Observing Malaysian Social Media

Posts Tagged ‘Tweetdeck’

Emergence of Dedicated Spamming Apps in Malaysian Politics

In previous articles published on our blog we have described various spamming strategies used in Malaysian politics:

Some of these strategies are still in use today. Some key points about spam:

  • Spamming involves repeating the same tweet or retweeting another user’s tweet across one or more accounts within a certain time period.
  • Spamming is a violation of Twitter’s own rules for users (available at https://support.twitter.com/articles/18311-the-twitter-rules ). The rules list various factors that determine spamming behaviour, one of which is “If you post duplicate content over multiple accounts or multiple duplicate updates on one account”. ‘Updates’ refers to tweets and retweets.
  • Spammer accounts are identifiable via their behaviour on Twitter e.g. follower/following relationships; timeline content; tweet timestamp patterns. Tweet frequency, repetition and collaborative behaviour are the main traits we look for.
  • Some Twitter accounts are used for both personal tweets and spamming. This gives the account the appearance of a normal human user to anyone looking at its timeline, and gives its owner grounds for denial when accused of sending spam.
  • Some Twitter accounts are dedicated to spamming and have no personal messages of any kind.
  • Some Twitter accounts have a block of spam in their timeline but otherwise appear normal. This means they only spammed tweets briefly. It is possible their login credentials were being used without their knowledge.
  • Spamming does not always involve automation via applications. Humans using mobile devices to repeatedly send identical messages across multiple accounts can still be identified.
  • It takes a computerised system to analyse and identify these users and their tweets and categorise spam.
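
To illustrate the first point in the list above (the same tweet repeated across one or more accounts within a certain time period), here is a minimal sketch in Python. It is not our detection system. The function name, the input format of (account, text, timestamp) tuples, the 5-minute window and the threshold of 3 posts are all assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_repeated_tweets(tweets, window=timedelta(minutes=5), min_posts=3):
    """Flag tweet texts posted at least `min_posts` times within `window`.

    `tweets` is an iterable of (account, text, timestamp) tuples.
    Returns {normalised_text: sorted list of accounts involved}.
    The window and threshold are illustrative assumptions.
    """
    posts_by_text = defaultdict(list)
    for account, text, timestamp in tweets:
        # Normalise lightly so trivial differences do not hide a repeat.
        posts_by_text[text.strip().lower()].append((timestamp, account))

    flagged = {}
    for text, posts in posts_by_text.items():
        posts.sort()
        start = 0
        for end in range(len(posts)):
            # Slide the left edge until the span fits inside the window.
            while posts[end][0] - posts[start][0] > window:
                start += 1
            if end - start + 1 >= min_posts:
                accounts = {account for _, account in posts[start:end + 1]}
                flagged.setdefault(text, set()).update(accounts)
    return {text: sorted(accounts) for text, accounts in flagged.items()}


if __name__ == "__main__":
    t0 = datetime(2015, 6, 1, 12, 0, 0)
    sample = [
        ("user_a", "Hello #tag", t0),
        ("user_b", "hello #tag", t0 + timedelta(minutes=1)),
        ("user_c", "hello #tag ", t0 + timedelta(minutes=2)),
        ("user_d", "Something else", t0),
    ]
    print(find_repeated_tweets(sample))
```

A real system would also need to look at retweets, follower/following relationships and timestamp patterns, as described in the other points.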

Developing spam detection systems is necessary for us due to the frequent use of spam in Malaysian politics. Spammers who retweet other tweets are problematic for 4 main reasons:

  • They increase the retweet counter for the tweet, making people believe that tweet was popular
  • By retweeting instead of tweeting, users who are searching on a keyword or hashtag won’t see the spamming accounts. This means people who use Twitter won’t discover this activity unless they happen to find the spammer in the list of recent retweeting users for the tweet.
  • The retweet counter is not guaranteed to decrease if the spammer is suspended or deleted.
  • It is harder to prove to non-technical users that the account is a spammer. Direct links to the spammer’s tweet will redirect to the tweet that they retweeted, which is a detail they may overlook. The best way to show evidence to the public is for them to visit the spammer’s timeline and judge for themselves.

All timestamps used in this article have been adjusted for the UTC+8 time zone.


Written by politweet

June 1, 2015 at 12:51 pm

The Role of Clones in #Merdeka55

We have run multiple censuses on politicians’ followers since December 2011, and over time developed some basic categories to distinguish followers:

  • Active – shown signs of Twitter use in the last 1-2 months, has followers and/or tweeted
  • Observer – user with 0 tweets and 0 followers
  • Inactive – no change in statistics in the last 1-2 months (besides followers)
  • Suspended – account currently suspended by Twitter
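
The four categories above can be sketched as a simple rule check. This is only an illustration, not the code behind our census. The `Snapshot` fields, the choice of tweet and following counts as the "statistics" to compare, and the order in which the rules are applied are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical record of one follower's statistics at census time."""
    tweets: int
    followers: int
    following: int
    suspended: bool = False


def categorise(previous: Snapshot, current: Snapshot) -> str:
    """Assign a category by comparing two censuses taken 1-2 months apart."""
    if current.suspended:
        return "Suspended"
    if current.tweets == 0 and current.followers == 0:
        return "Observer"
    # "No change in statistics (besides followers)": follower count is ignored.
    unchanged = (
        current.tweets == previous.tweets
        and current.following == previous.following
    )
    return "Inactive" if unchanged else "Active"


if __name__ == "__main__":
    before = Snapshot(tweets=120, followers=40, following=80)
    after = Snapshot(tweets=150, followers=42, following=80)
    print(categorise(before, after))
```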

We are going to add a new category called Clones, which will draw users from Active and Inactive categories. This will be for users found to be run by the same person or group of persons.

What are Clones?

When you create/manage multiple accounts on Twitter with intent to tweet the same type of content some/all of the time, we will call those clones. Not all clones are bad. There are good reasons to have multiple accounts:

  • For marketing purposes, you can post tweets targeted at different markets
  • For work, multiple accounts representing different departments/franchises/organisations
  • You can have one account for personal use, and one for professional use
  • You can use each account to tweet about different topics, and socialise with the different groups for each topic

As long as the number of accounts is kept small and tweeting frequency kept low, Twitter doesn’t seem to have a problem with it. It doesn’t change the fact that only one person is managing these accounts.

When multiple accounts are used to send the same tweet, they are in danger of being suspended or deleted. This is because such behaviour is a violation of Twitter’s Terms of Service (TOS):

You may not do any of the following while accessing or using the Services:  (v) interfere with, or disrupt, (or attempt to do so), the access of any user, host or network, including, without limitation, sending a virus, overloading, flooding, spamming, mail-bombing the Services, or by scripting the creation of Content in such a manner as to interfere with or create an undue burden on the Services.

Clones pose a problem when doing Twitter analytics because:

  • They inflate the @mention levels for an account
  • They artificially increase the follower count for other users
  • They affect sentiment analysis, because one person tweeting an opinion from 10 accounts gives the illusion of 10 users sharing the same opinion

Regular Twitter users call these ‘fake followers’. So there is a need to filter these accounts out to get a truer sense of how many people follow politicians.

Bad Clones (Bots)

Some people register multiple Twitter accounts with some/all of the following characteristics:

  • Pretending to be another person, or fake organisation
  • Scheduled or automated mass-tweeting
  • Following the same user
  • Similar follower/following relationships

These clones are bots. They are not real people. They do not socialise, except with other bots or their creator(s). They have automated behaviour. They were created to serve an agenda. Their creators maintain a real persona online and make use of the bots when needed.

Their effect is to increase follower counts and raise @mention levels. Bots may be terminated by Twitter if they start spamming, so many lie dormant. We will treat bots as a subset of clones. Our bot-detection methods will not be shared publicly, to avoid having the bots change tactics.

Catching the #Merdeka55 Clones

During the #Merdeka55 event, we noticed large blocks of identical tweets being sent at the same time. Further investigation into who sent the tweets revealed that many of these users had a lot in common:

  • Tweeted using Tweetdeck
  • Sending scheduled/automated tweets containing #Merdeka55 and @NajibRazak in sync with other bots
  • Fake-looking profile (based on personal details)
  • Similar follower/following relationships

The pattern seemed to be one real person controlling as many as several dozen additional Twitter accounts. The person may have used ‘tweet to all’ or scheduled the tweets. Such tweets were sent from 31st August – 1st September, primarily during 8.15 PM – 9.15 PM on 31st August. Another possibility is that real users gave their login details to a 3rd party for use during #Merdeka55.

Some samples of bot account profile images are below. The bot account names are listed at the end of this post. Suspected non-bot user(s) that we have identified have been removed from the mosaic, though it is possible we missed some. It doesn’t change the fact that they are all clones, and each tweet was only tweeted by clones. See what they have in common.

Example 1

Tweet: #Merdeka55 Rukun Negara 5 Kesopanan dan Kesusilaan @NajibRazak @relamalaysia

Sent: 8:17:16 PM

Total users:  78

Bots: 77


Written by politweet

September 10, 2012 at 11:11 am

#Merdeka55 Twitter Report

On 28th August 2012, Datuk Seri Dr Rais Yatim (Minister of Information, Communications and Culture) announced that a world record of one million tweets was targeted for the Merdeka Day celebrations. To take part, Twitter users needed to send tweets from 8.15 PM – 9.15 PM (GMT +8) on 31st August 2012 using the hashtag #Merdeka55.

Politweet tracked mentions of the #Merdeka55 hashtag since the announcement. During the targeted hour, an odd pattern emerged during the live stream – large blocks of identical tweets were being sent at the same time.

Further investigation revealed that a small group of users were responsible for a large volume of tweets. These users had similar characteristics, e.g. account creation date, profile photos, location and follower/following relationships. All of their duplicate tweets were sent using Tweetdeck. We are going to call these users ‘Clones’ and expose their methods and impact on the stats in another blog post.

Stats for the hour follow.

Total for the hour

Tweets : 109,320

Users : 19,838

Tweets-per-minute (TPM)

This graph shows TPM from 8.10 PM – 9.20 PM. Tweets rose almost vertically at 8.15 PM. The highest peaks were 2,146 TPM at 8.24 PM and 2,104 TPM at 8.37 PM. Tweets started to decline at 9.11 PM, then spiked for one minute at 9.15 PM. After that, tweet levels increased as news of the 2.5 million tweet record broke.
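
For readers who want to reproduce a TPM series from their own data, a minimal Python sketch follows. It assumes you already have a list of tweet timestamps as `datetime` objects. The function names are ours for this example and are not part of any Twitter API.

```python
from collections import Counter
from datetime import datetime


def tweets_per_minute(timestamps):
    """Count tweets in each one-minute bucket."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)


def peak_minute(counts):
    """Return (minute, count) for the busiest minute."""
    return max(counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    sample = [
        datetime(2012, 8, 31, 20, 24, 5),
        datetime(2012, 8, 31, 20, 24, 50),
        datetime(2012, 8, 31, 20, 25, 1),
    ]
    counts = tweets_per_minute(sample)
    print(peak_minute(counts))
```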

Location of #Merdeka55 tweets

Tweets were coming in from all across the country. Globally there were only 5 tweets from outside Malaysia: Myanmar, Switzerland, London, Indonesia and Singapore. It’s safe to say that the #Merdeka55 hashtag usage was almost entirely confined to Malaysia.

Popular Tweets

The most retweeted (RT) tweets of the hour are listed below, in order. We count the number of users who RT, not the number of times. This is to reduce the impact of spammers. RT counts shown only cover retweets made on August 31st, of tweets made between 8.15PM – 9.15PM.

1. WardinaSafiyyah, 412 RTs

2. KhairyKJ, 303 RTs

3. WardinaSafiyyah, 262 RTs

4. KhairyKJ, 206 RTs

5. KhairyKJ, 159 RTs
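
Counting the users who RT, rather than the number of RTs, can be sketched as follows. This is an illustration only. The input format of (retweeting user, original tweet ID) pairs and the function name are assumptions for the example.

```python
from collections import defaultdict


def count_retweeting_users(retweets):
    """Count distinct retweeting users per original tweet.

    `retweets` is an iterable of (retweeting_user, original_tweet_id) pairs.
    A user who retweets the same tweet repeatedly is counted once.
    """
    users_by_tweet = defaultdict(set)
    for user, tweet_id in retweets:
        users_by_tweet[tweet_id].add(user)
    return {tweet_id: len(users) for tweet_id, users in users_by_tweet.items()}


if __name__ == "__main__":
    sample = [
        ("user_a", "tweet_1"),
        ("user_a", "tweet_1"),  # repeat by the same user, ignored
        ("user_b", "tweet_1"),
        ("user_c", "tweet_2"),
    ]
    print(count_retweeting_users(sample))
```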

This is one of the most popular tweets used by the clones. This message was sent by 178 users, scripted to go out at preset times that day.

How many #Merdeka55 tweets were really sent?

A total figure of 3,611,323 tweets was announced at 9.43 PM that night. But immediately after 9.15 PM the figure announced was 2.5 million. The announcement of the record was not accompanied by any source. No company or online tracking service was named.

It is not clear which figure is correct in reference to the one hour duration, but 3.6 million is what the organiser announced as the record so we will use that for our calculations.

This graph from downrightnow.com was screen-captured at midnight on August 31st. We marked the graph with lines indicating each hour from 6 PM – 10 PM. Based on this, there was fluctuation and lower quality of service from 8 – 10 PM.

Twitter’s performance drops when their system is under heavy load, which is to be expected if 3.6 million tweets were sent out. Based on this graph at the time, the announced figure seemed believable.

From our experience with the #GOP2012 and #DNC2012 conventions so far, our approach seems to be getting about 16% – 28% of the real total. However that estimate is influenced by the global tweets-per-minute (TPM). If global TPM is high, then we get significantly more. If we only used Twitter Search, we would have got an estimated 8% of the real total.

There were some comments online saying that the population of Malaysia needs to be taken into consideration when comparing to the USA. That does not really apply here, because we are not looking at how many people are talking about Merdeka Day. Instead we are looking at how many people are competing to set a record. There is the expectation that some users would tweet multiple times to contribute to the goal.

Estimating the real total

The convention totals announced by Twitter cover a period of hours, not one hour. The conventions’ tweets-per-hour were definitely lower than 3.6 million tweets. So if we make the assumption that we got 8% – 28% of the real total:

  • Estimated total (min) = 109,320 / 28 * 100
  • Estimated total (min) = 390,428
  • Estimated total (max) = 109,320 / 8 * 100
  • Estimated total (max) = 1,366,500

Based on our data, the estimated total number of #Merdeka55 tweets is 390,428 – 1,366,500.
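
The arithmetic above can be checked with a few lines of Python. The function name is ours for this example. The 8% and 28% coverage figures are the assumptions stated above, not measured values.

```python
def estimate_total(sample_count, coverage_percent):
    """Scale a sample up to a full total, given the assumed coverage."""
    return sample_count / coverage_percent * 100


if __name__ == "__main__":
    collected = 109_320  # tweets we collected during the hour

    # Higher coverage means a smaller real total, and vice versa.
    estimated_min = estimate_total(collected, 28)
    estimated_max = estimate_total(collected, 8)

    print(f"min: {int(estimated_min):,}")  # min: 390,428
    print(f"max: {int(estimated_max):,}")  # max: 1,366,500
```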

Estimating the tweets-per-minute (TPM)

Twitter’s system gives us a per-minute sample of what is tweeted. By taking the highest peak in our data, we can estimate the TPM of the real data.

  • Highest peak = 2,146 TPM
  • Percentage of our total = 2,146/109,320 * 100 = 1.963 %
  • Given total = 3,611,323
  • Estimated peak = 3,611,323 * 1.963 %
  • Estimated peak = 70,890 TPM
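
The same calculation as a short Python sketch. The function name is ours for this example. The percentage is rounded to three decimal places, as in the steps above. The key assumption is that the peak minute's share of our sample equals its share of the announced total.

```python
def estimate_peak_tpm(sample_peak, sample_total, announced_total):
    """Scale our peak minute up to the announced total."""
    share_percent = round(sample_peak / sample_total * 100, 3)
    return announced_total * share_percent / 100


if __name__ == "__main__":
    peak = estimate_peak_tpm(2_146, 109_320, 3_611_323)
    print(f"{int(peak):,} TPM")  # 70,890 TPM
```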

During the Olympics, Twitter mentioned the biggest records as:

  1. Usain Bolt winning the gold in the 200m sprint (80,000+ TPM)
  2. Usain Bolt winning the gold in the 100m sprint (74,000+ TPM)
  3. Andy Murray winning the gold in men’s tennis singles (57,000+ TPM)

That puts the estimated #Merdeka55 peak just below Usain Bolt’s two records and above Andy Murray’s. It is surprising that such a record went unnoticed by Twitter.

Was the #Merdeka55 a world record?

Only Twitter and their data provider partners (Gnip, Datasift) know the true number of tweets sent for any given topic. Other online systems only have access to a subset of tweets sent, using the same API that Politweet used.

Twitter tends to announce tweets-per-minute and tweets-per-second records, not tweets-per-hour. The closest record that seems relevant is the 2.7 million tweets about Spain during the #Euro2012 Final against Italy, which should cover about 2 hours or more (90 minute match + 15 minute halftime + post-match buzz).

So assuming the figure is true, it is possible that the 3.6 million tweets are a world record. However to date, Twitter has made no announcement on their blog about #Merdeka55. There is also no mention of the #Merdeka55 record online by other tracking websites. Without a third party to verify the data, the 3.6 million tweets figure is doubtful.

The presence of clones also reduces the quality of the record. If the person or organisation in charge of these clones hadn’t polluted the data, whatever record was achieved would have had more historical value.

Update #1 (7th September 2012)

Corrected a typo under ‘How many #Merdeka55 tweets were really sent’. Original text was “If global TPM is high, then we get significantly less“. Correct version is “If global TPM is high, then we get significantly more“. Twitter’s Streaming API offers access to a percentage of tweets based on how much is globally tweeted at the moment. It is stated to be 1%, but we found it to be more.

This does not mean we can only get 1% of tweets on any topic. Think of the limitation as a ceiling on how much data can be received per minute. For example, if our limit is 4000 TPM and the total tweets about @NajibRazak is 3000 TPM, we would then get 100% of all tweets. If we are tracking tweets about @NajibRazak (real total 3000 TPM) and tweets about @BarackObama (real total 3000 TPM), then we would lose 2000 TPM because our limit is 4000 TPM.
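
The ceiling described above can be written as a small Python sketch. The function name and the dictionary input are assumptions for the example. The 4000 TPM limit is the illustrative figure used above, not a documented Twitter value.

```python
def delivered_tpm(tracked_totals, limit_tpm):
    """Return (tweets received, tweets lost) per minute under a ceiling.

    `tracked_totals` maps each tracked topic to its real TPM.
    """
    real_total = sum(tracked_totals.values())
    received = min(real_total, limit_tpm)
    return received, real_total - received


if __name__ == "__main__":
    # One topic under the ceiling: everything is received.
    print(delivered_tpm({"@NajibRazak": 3000}, 4000))  # (3000, 0)

    # Two topics together exceed the ceiling: 2000 TPM are lost.
    print(delivered_tpm({"@NajibRazak": 3000, "@BarackObama": 3000}, 4000))  # (4000, 2000)
```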

Written by politweet

September 7, 2012 at 9:30 am