Politweet.Org

Observing Malaysian Social Media

Posts Tagged ‘Merdeka55’

Evolution of Spammers in Malaysian Politics

During the #Merdeka55 event (which you can read about here and here), we discovered the use of Twitter accounts to repeatedly send the same tweet at the same moment. This practice is known as spamming. You can see this pattern demonstrated in the below screenshot of the Twitter stream during the event – large blocks of identical tweets were being sent at the same moment.

Merdeka55BotSample

Since that time we have had to continuously improve our spam detection methods to filter out the spammed tweets. Our priority in our reports is to share content that is genuinely popular, meaning content shared by the greatest number of people (not accounts).

Content that appears popular due to the use of automated accounts, or of people hired to spam the content across multiple accounts, needs to have its popularity rank adjusted. We do not censor spammed content or ban spammers; we only filter out the individual spammed tweets.


Written by politweet

November 28, 2013 at 12:55 pm

The Role of Clones in #Merdeka55

We have run multiple censuses on politicians’ followers since December 2011, and over time developed some basic categories to distinguish followers:

  • Active – shown signs of Twitter use in the last 1-2 months, has followers and/or tweeted
  • Observer – user with 0 tweets and 0 followers
  • Inactive – no change in statistics in the last 1-2 months (besides followers)
  • Suspended – account currently suspended by Twitter
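The four categories above can be sketched as a simple classifier. This is only an illustration, not Politweet's census code: the `FollowerSnapshot` fields, the `stats_changed_recently` flag (standing in for "signs of use in the last 1-2 months") and the order in which the rules are checked are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class FollowerSnapshot:
    """Hypothetical per-account record; field names are assumptions."""
    tweets: int
    followers: int
    suspended: bool
    # Assumed flag: True if any statistic other than the follower count
    # changed in the last 1-2 months.
    stats_changed_recently: bool


def categorise(account: FollowerSnapshot) -> str:
    """Assign one of the four basic follower categories."""
    if account.suspended:
        return "Suspended"
    if account.tweets == 0 and account.followers == 0:
        return "Observer"
    if account.stats_changed_recently:
        return "Active"
    return "Inactive"
```

Checking for suspension first is a design choice in this sketch, since a suspended account cannot show any other behaviour.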

We are going to add a new category called Clones, which will draw users from Active and Inactive categories. This will be for users found to be run by the same person or group of persons.

What are Clones?

When you create/manage multiple accounts on Twitter with intent to tweet the same type of content some/all of the time, we will call those clones. Not all clones are bad. There are good reasons to have multiple accounts:

  • For marketing purposes, you can post tweets targeted at different markets
  • For work, multiple accounts representing different departments/franchises/organisations
  • You can have one account for personal use, and one for professional use
  • You can use each account to tweet about different topics, and socialise with the different groups for each topic

As long as the number of accounts is kept small and tweeting frequency kept low, Twitter doesn’t seem to have a problem with it. It doesn’t change the fact that only one person is managing these accounts.

When multiple accounts are used to send the same tweet, they are in danger of being suspended or deleted. This is because such behaviour is a violation of Twitter’s Terms of Service (TOS):

You may not do any of the following while accessing or using the Services:  (v) interfere with, or disrupt, (or attempt to do so), the access of any user, host or network, including, without limitation, sending a virus, overloading, flooding, spamming, mail-bombing the Services, or by scripting the creation of Content in such a manner as to interfere with or create an undue burden on the Services.

Clones pose a problem when doing Twitter analytics because:

  • They inflate the @mention levels for an account
  • They artificially increase the follower count for other users
  • They affect sentiment analysis, because one person tweeting an opinion from 10 accounts gives the illusion of 10 users sharing the same opinion

Regular Twitter users call these ‘fake followers’. So there is a need to filter these accounts out to get a truer sense of how many people follow politicians.

Bad Clones (Bots)

Some people register multiple Twitter accounts with some/all of the following characteristics:

  • Pretending to be another person, or fake organisation
  • Scheduled or automated mass-tweeting
  • Following the same user
  • Similar follower/following relationships

These clones are bots. They are not real people. They do not socialise, except with other bots or their creator(s). They have automated behaviour. They were created to serve an agenda. Their creators maintain a real persona online and make use of the bots when needed.

Their effect is to increase follower counts and raise @mention levels. Bots may be terminated by Twitter if they start spamming, so many lie dormant. We will treat bots as a subset of clones. Our bot-detection methods will not be shared publicly, to avoid having the bots change tactics.

Catching the #Merdeka55 Clones

During the #Merdeka55 event, we noticed large blocks of identical tweets being sent at the same time. Further investigation into who sent the tweets revealed that many of these users had a lot in common:

  • Tweeted using Tweetdeck
  • Sending scheduled/automated tweets containing #Merdeka55 and @NajibRazak in sync with other bots
  • Fake-looking profile (based on personal details)
  • Similar follower/following relationships

The pattern seemed to be one real person controlling as many as several dozen additional Twitter accounts. The person may have used a ‘tweet to all’ feature or scheduled the tweets. Such tweets were sent from 31st August – 1st September, primarily during 8.15 PM – 9.15 PM on 31st August. Another possibility is that real users gave their login details to a 3rd party for use during #Merdeka55.
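The first signal described above, identical tweets sent at the same moment by many accounts, can be illustrated with a short sketch. This is not Politweet's detection method (which is not published); the tuple layout, the function name `find_identical_blocks` and the `min_users` threshold are assumptions made for the example.

```python
from collections import defaultdict


def find_identical_blocks(tweets, min_users=5):
    """Group tweets by (text, send time) and keep the large groups.

    tweets: iterable of (user, text, sent_at) tuples, where sent_at is
    already truncated to the second (any hashable value works).
    Returns a dict mapping (text, sent_at) to the set of users who sent it.
    """
    groups = defaultdict(set)
    for user, text, sent_at in tweets:
        groups[(text.strip(), sent_at)].add(user)
    return {key: users for key, users in groups.items()
            if len(users) >= min_users}


# Usage with made-up account names and a tweet text from this post.
text = "#Merdeka55 Rukun Negara 5 Kesopanan dan Kesusilaan @NajibRazak @relamalaysia"
sample = [(f"account_{i}", text, "20:17:16") for i in range(6)]
sample.append(("someone_else", "Selamat Hari Merdeka! #Merdeka55", "20:17:16"))
blocks = find_identical_blocks(sample)
```

A block found this way only marks accounts as candidates; the other traits in the list (client used, profile details, follower relationships) would still need to be checked.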

Some samples of bot account profile images are below. The bot account names are listed at the end of this post. Suspected non-bot user(s) that we have identified have been removed from the mosaic, though it is possible we missed some. It doesn’t change the fact that they are all clones, and each tweet was only tweeted by clones. See what they have in common.

Example 1

Tweet: #Merdeka55 Rukun Negara 5 Kesopanan dan Kesusilaan @NajibRazak @relamalaysia

Sent: 8:17:16 PM

Total users:  78

Bots: 77


Written by politweet

September 10, 2012 at 11:11 am

#Merdeka55 Twitter Report

On 28th August 2012, Datuk Seri Dr Rais Yatim (Minister of Information, Communications and Culture) announced that a world record of one million tweets was targeted for the Merdeka Day celebrations. To take part, Twitter users needed to send tweets from 8.15 PM – 9.15 PM (GMT +8) on 31st August 2012 using the hashtag #Merdeka55.

Politweet tracked mentions of the #Merdeka55 hashtag since the announcement. During the targeted hour, an odd pattern emerged during the live stream – large blocks of identical tweets were being sent at the same time.

Further investigation revealed that a small group of users were responsible for a large volume of tweets. These users had similar characteristics, e.g. account creation date, profile photos, location and follower/following relationships. All of their duplicate tweets were sent using Tweetdeck. We are going to call these users ‘Clones’ and expose their methods and impact on the stats in another blog post.

Stats for the hour follow.

Total for the hour

Tweets : 109,320

Users : 19,838

Tweets-per-minute (TPM)

This graph shows TPM from 8.10 PM – 9.20 PM. Tweets rose almost vertically at 8.15 PM. The highest peaks were 2,146 TPM at 8.24 PM and 2,104 TPM at 8.37 PM. Tweets started to decline at 9.11 PM, then spiked for one minute at 9.15 PM. After that, tweet levels increased as news of the 2.5 million tweet record broke.

Location of #Merdeka55 tweets

Tweets were coming in from all across the country. Globally there were only 5 tweets from outside Malaysia – from Myanmar, Switzerland, London, Indonesia and Singapore. It’s safe to say that the #Merdeka55 hashtag usage was almost entirely confined to Malaysia.

Popular Tweets

The most retweeted (RT) tweets of the hour are listed below, in order. We count the number of users who RT, not the number of times. This is to reduce the impact of spammers. RT counts shown only cover retweets made on August 31st, of tweets made between 8.15PM – 9.15PM.

1. WardinaSafiyyah, 412 RTs

2. KhairyKJ, 303 RTs

3. WardinaSafiyyah, 262 RTs

4. KhairyKJ, 206 RTs

5. KhairyKJ, 159 RTs
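Counting the users who retweet, rather than the number of retweets, can be sketched as below. This is a minimal illustration under assumed inputs: the `(retweeting_user, original_tweet_id)` pair layout and the function name are hypothetical, not Politweet's actual pipeline.

```python
from collections import defaultdict


def rank_by_distinct_retweeters(retweets):
    """Rank tweets by how many different users retweeted them.

    retweets: iterable of (retweeting_user, original_tweet_id) pairs.
    Repeated retweets by the same user are counted once.
    Returns a list of (tweet_id, distinct_user_count), highest first.
    """
    users_per_tweet = defaultdict(set)
    for user, tweet_id in retweets:
        users_per_tweet[tweet_id].add(user)
    ranked = sorted(users_per_tweet.items(),
                    key=lambda item: len(item[1]), reverse=True)
    return [(tweet_id, len(users)) for tweet_id, users in ranked]


# One account retweeting "t1" four times still counts as a single user.
sample = [("spammer", "t1")] * 4 + [("ali", "t2"), ("mei", "t2")]
ranking = rank_by_distinct_retweeters(sample)
```

In this sample "t2" ranks above "t1" even though "t1" was retweeted more times, which is the effect the counting rule is meant to have.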

This is one of the most popular tweets used by the clones. This message was sent by 178 users, scripted to go out at preset times that day.

How many #Merdeka55 tweets were really sent?

A total figure of 3,611,323 tweets was announced at 9.43 PM that night. But immediately after 9.15 PM the figure announced was 2.5 million. The announcement of the record was not accompanied by any source. No company or online tracking service was named.

It is not clear which figure is correct in reference to the one hour duration, but 3.6 million is what the organiser announced as the record so we will use that for our calculations.

This graph from downrightnow.com was screen-captured at midnight on August 31st. We marked the graph with lines indicating each hour from 6 PM – 10 PM. Based on this, there was fluctuation and lower quality of service from 8 – 10 PM.

Twitter’s performance drops when their system is under heavy load, which is to be expected if 3.6 million tweets were sent out. Based on this graph at the time, the announced figure seemed believable.

From our experience with the #GOP2012 and #DNC2012 conventions so far, our approach seems to be getting about 16% – 28% of the real total. However that estimate is influenced by the global tweets-per-minute (TPM). If global TPM is high, then we get significantly more. If we only used Twitter Search, we would have got an estimated 8% of the real total.

There were some comments online saying that the population of Malaysia needs to be taken into consideration when comparing to the USA. That does not really apply here, because we are not looking at how many people are talking about Merdeka Day. Instead we are looking at how many people are competing to set a record. There is the expectation that some users would tweet multiple times to contribute to the goal.

Estimating the real total

The convention totals announced by Twitter cover a period of hours, not one hour. The conventions’ tweets-per-hour were definitely lower than 3.6 million tweets. So if we make the assumption that we got 8% – 28% of the real total:

  • Estimated total (min) = 109,320 / 28 * 100
  • Estimated total (min) = 390,428
  • Estimated total (max) = 109,320 / 8 * 100
  • Estimated total (max) = 1,366,500

Based on our data, the estimated total #Merdeka55 tweets is 390,428 – 1,366,500 tweets.
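The arithmetic for the two bounds can be reproduced with a few lines of Python. The function name `estimate_total` is ours; the inputs are the captured tweet count and the assumed 8% – 28% coverage range from the text.

```python
def estimate_total(captured: int, coverage_percent: float) -> float:
    """Scale a captured sample up to an estimated real total."""
    return captured / coverage_percent * 100


captured = 109_320                      # tweets captured during the hour
low = estimate_total(captured, 28)      # best-case coverage, smallest total
high = estimate_total(captured, 8)      # worst-case coverage, largest total
print(int(low), int(high))              # 390428 1366500
```

The lower bound is truncated rather than rounded, which matches the figure in the list above.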

Estimating the tweets-per-minute (TPM)

Twitter’s system gives us a per-minute sample of what is tweeted. By taking the highest peak in our data, we can estimate the TPM of the real data.

  • Highest peak = 2,146 TPM
  • Percentage of our total = 2,146/109,320 * 100 = 1.963 %
  • Given total = 3,611,323
  • Estimated peak = 3,611,323 * 1.963 %
  • Estimated peak = 70,890 TPM
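The same calculation as a short script, for readers who want to check the numbers. Variable names are ours; the percentage is rounded to three decimal places before scaling, as in the steps above.

```python
peak_sample = 2_146          # highest TPM in our captured data
total_sample = 109_320       # tweets we captured during the hour
announced_total = 3_611_323  # total announced by the organiser

# Share of our captured tweets that fell in the peak minute, in percent.
share_percent = round(peak_sample / total_sample * 100, 3)

# Apply the same share to the announced total.
estimated_peak = announced_total * share_percent / 100
print(share_percent, int(estimated_peak))  # 1.963 70890
```

This assumes the real stream had the same minute-by-minute shape as our sample, which is the assumption the estimate rests on.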

During the Olympics, Twitter mentioned the biggest records as:

  1. Usain Bolt winning the gold in the 200m sprint (80,000+ TPM)
  2. Usain Bolt winning the gold in the 100m sprint (74,000+ TPM)
  3. Andy Murray winning the gold in men’s tennis singles (57,000+ TPM)

That puts the estimated #Merdeka55 peak just below Usain Bolt’s 100m record. It is surprising that such a record went unnoticed by Twitter.

Was #Merdeka55 a world record?

Only Twitter and their data provider partners (Gnip, Datasift) know the true number of tweets sent for any given topic. Other online systems only have access to a subset of tweets sent, using the same API that Politweet used.

Twitter tends to announce tweets-per-minute and tweets-per-second records, not tweets-per-hour. The closest record that seems relevant is the 2.7 million tweets about Spain during the #Euro2012 Final against Italy, which should cover about 2 hours or more (90 minute match + 15 minute halftime + post-match buzz).

So assuming the figure is true, it is possible that the 3.6 million tweets are a world record. However to date, Twitter has made no announcement on their blog about #Merdeka55. There is also no mention of the #Merdeka55 record online by other tracking websites. Without a third party to verify the data, the 3.6 million tweets figure is doubtful.

The presence of clones also reduces the quality of the record. If the person or organisation in charge of these clones hadn’t polluted the data, whatever record was achieved would have had more historical value.

Update #1 (7th September 2012)

Corrected a typo under ‘How many #Merdeka55 tweets were really sent’. Original text was “If global TPM is high, then we get significantly less“. Correct version is “If global TPM is high, then we get significantly more“. Twitter’s Streaming API offers access to a percentage of tweets based on how much is globally tweeted at the moment. It is stated to be 1%, but we found it to be more.

This does not mean we can only get 1% of tweets on any topic. Think of the limitation as a ceiling on how much data can be received per minute. For example, if our limit is 4000 TPM and the total tweets about @NajibRazak is 3000 TPM, we would then get 100% of all tweets. If we are tracking tweets about @NajibRazak (real total 3000 TPM) and tweets about @BarackObama (real total 3000 TPM), then we would lose 2000 TPM because our limit is 4000 TPM.
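The ceiling described above can be modelled in a few lines. This is a simplified sketch of the idea only, not a description of how Twitter's Streaming API is implemented; the function name and the 4000 TPM limit are taken from the example or assumed for illustration.

```python
def delivered_and_lost(limit_tpm: int, *topic_tpms: int) -> tuple[int, int]:
    """Return (tweets delivered, tweets lost) per minute under a ceiling.

    limit_tpm: the maximum tweets per minute we can receive.
    topic_tpms: the real tweets per minute for each tracked topic.
    """
    total = sum(topic_tpms)
    delivered = min(total, limit_tpm)
    return delivered, total - delivered


# Tracking @NajibRazak alone: everything fits under the ceiling.
one_topic = delivered_and_lost(4000, 3000)
# Tracking @NajibRazak and @BarackObama together: 2000 TPM are lost.
two_topics = delivered_and_lost(4000, 3000, 3000)
```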

Written by politweet

September 7, 2012 at 9:30 am