Politweet.Org

Observing Malaysian Social Media

Archive for September 2012

Twitter Stats on Pakatan Rakyat’s Alternative Budget 2013

Pakatan Rakyat (PR) revealed its alternative budget yesterday. Compared to last year there seems to be a slight increase in users tweeting about the budget.

On September 26th 2012, 1440 users wrote 2663 tweets about PR and Federal budget-related terms.

On October 4th 2011 when Pakatan Rakyat’s alternative budget was released, 1046 users wrote 2124 tweets about PR alternative budget and (Barisan Nasional) Federal budget-related terms.

From 2011 to 2012, that is an increase of 37.67% in users and 25.38% in terms of tweets. The average tweets/user decreased from 2.03 to 1.85 tweets/user.
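The percentages above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic using the figures quoted in this post; the helper name `pct_change` is a hypothetical one chosen for illustration.

```python
def pct_change(old, new):
    """Percentage change going from old to new."""
    return (new - old) / old * 100

# Figures quoted in the post
users_2011, tweets_2011 = 1046, 2124
users_2012, tweets_2012 = 1440, 2663

user_growth = pct_change(users_2011, users_2012)
tweet_growth = pct_change(tweets_2011, tweets_2012)
avg_2011 = tweets_2011 / users_2011
avg_2012 = tweets_2012 / users_2012

print(f"Users:  {user_growth:+.2f}%")
print(f"Tweets: {tweet_growth:+.2f}%")
print(f"Tweets/user: {avg_2011:.2f} -> {avg_2012:.2f}")
```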

One of the hashtags used by promoters of PR’s alternative budget, #BelanjawanPR, trended in Malaysia for a cumulative total of 340 minutes (5 hours 40 mins). It started at 6th place at 4.30pm for 10 minutes before quickly dropping to 8th place by 4.50pm. It continued to trend on/off at 7th and 8th places before ending at 8th place at 12.10am the next day.

Please note that in both years a significant number of people were tweeting about the upcoming BN budget and not the PR budget. An initial sampling of the tweets indicated that for 2012, an estimated 610 users (out of 1440) were referring to the PR budget. So it is possible that PR’s popularity when it comes to its budget has declined. A more detailed analysis will have to wait.

On October 7th 2011 when the Federal budget was tabled, 5972 users wrote 13961 tweets about PR and Federal budget-related terms. That’s almost six times the number of users talking about the PR alternative budget. It should be interesting to see if the gap widens this Friday.


Written by politweet

September 27, 2012 at 2:01 am

Estimating Malaysia’s Twitter Population, Part 1

This map shows geo-located tweets covering a 24 hour period, obtained using Twitter’s Streaming API. Each blue dot represents one tweet. The red line represents the KL-Selangor border. Click here to view the full-size image on Flickr.

Previous maps that we released were based on cumulative data from 2010, from tweets on socio-political topics. This map is based on all tweets regardless of topic. We were planning to collect a month’s worth of tweets before sharing the results, but the significant difference in just 24 hours makes it worth sharing now.

Total

Tweets : 145145
Users : 24894

This can be broken down into:

Malaysia Total

Tweets : 103303
Users : 17414

Singapore

Tweets : 24974
Users : 4800

Borneo

Tweets : 16868

Users : 2680

KL/Selangor

Tweets : 61680 (59.71% of Malaysia)
Users: 10902 (62.60% of Malaysia)

Rest of Malaysia

Tweets : 41623 (40.29% of Malaysia)
Users : 6512 (37.40% of Malaysia)

Past experience has shown us that only 5-10% of tweets on a given topic are geo-tagged. However, it is difficult to estimate the number of users based on tweets, and 24 hours’ worth of data is too small a sample. Still, it’s worth calculating daily estimates at this point to see what sample size is best.

Based on the Malaysia Total stats, and assuming an average of 6 tweets/user we can calculate a simple estimate of the size of the population:

Min = (103303 / 10 * 100)/6 = 172172
Max = (103303 / 5 * 100)/6 = 344343

Therefore the estimated size of Malaysia’s active Twitter population is 172,172 – 344,343 users. Our last census of politicians’ followers in August found 594,893 active users, but that includes foreigners and Clones. We will continue to collect data and recalculate this estimate in 1-2 weeks’ time.
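The Min/Max calculation can be written as a small function, which makes it easy to rerun as more data comes in. This is a sketch only: the function name `estimate_users` and its parameter names are hypothetical, and the 5-10% geo-tagged share and 6 tweets/user are the assumptions stated in this post.

```python
def estimate_users(geo_tweets, geo_share_pct, tweets_per_user):
    """Scale geo-tagged tweets up to all tweets, then divide by tweets per user."""
    total_tweets = geo_tweets / geo_share_pct * 100
    return round(total_tweets / tweets_per_user)

MALAYSIA_GEO_TWEETS = 103303  # Malaysia Total from the 24-hour sample
TWEETS_PER_USER = 6           # assumed average, as stated in the post

low = estimate_users(MALAYSIA_GEO_TWEETS, 10, TWEETS_PER_USER)  # 10% geo-tagged
high = estimate_users(MALAYSIA_GEO_TWEETS, 5, TWEETS_PER_USER)  # 5% geo-tagged

print(f"Estimated active users: {low:,} - {high:,}")
```

The lower geo-tagged share produces the higher estimate, because each geo-tagged tweet then stands in for more untagged tweets.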

Update #1 (September 14)

Found a large number of Borneo tweets were included in the sample obtained from Twitter. This is due to an issue with their Streaming API. All calculations have been revised to reflect the change.

Written by politweet

September 14, 2012 at 5:02 am

Posted in Analyses, Social Media


The Role of Clones in #Merdeka55

We have run multiple censuses on politicians’ followers since December 2011, and over time developed some basic categories to distinguish followers:

  • Active – shown signs of Twitter use in the last 1-2 months, has followers and/or tweeted
  • Observer – user with 0 tweets and 0 followers
  • Inactive – no change in statistics in the last 1-2 months (besides followers)
  • Suspended – account currently suspended by Twitter

We are going to add a new category called Clones, which will draw users from Active and Inactive categories. This will be for users found to be run by the same person or group of persons.

What are Clones?

When you create/manage multiple accounts on Twitter with intent to tweet the same type of content some/all of the time, we will call those clones. Not all clones are bad. There are good reasons to have multiple accounts:

  • For marketing purposes, you can post tweets targeted at different markets
  • For work, multiple accounts representing different departments/franchises/organisations
  • You can have one account for personal use, and one for professional use
  • You can use each account to tweet about different topics, and socialise with the different groups for each topic

As long as the number of accounts is kept small and tweeting frequency kept low, Twitter doesn’t seem to have a problem with it. It doesn’t change the fact that only one person is managing these accounts.

When multiple accounts are used to send the same tweet, they are in danger of being suspended or deleted. This is because such behaviour is a violation of Twitter’s Terms of Service (TOS):

You may not do any of the following while accessing or using the Services:  (v) interfere with, or disrupt, (or attempt to do so), the access of any user, host or network, including, without limitation, sending a virus, overloading, flooding, spamming, mail-bombing the Services, or by scripting the creation of Content in such a manner as to interfere with or create an undue burden on the Services.

Clones pose a problem when doing Twitter analytics because:

  • They inflate the @mention levels for an account
  • They artificially increase the follower count for other users
  • They affect sentiment analysis, because one person tweeting an opinion from 10 accounts gives the illusion of 10 users sharing the same opinion

Regular Twitter users call these ‘fake followers’. So there is a need to filter these accounts out to get a truer sense of how many people follow politicians.

Bad Clones (Bots)

Some people register multiple Twitter accounts with some/all of the following characteristics:

  • Pretending to be another person, or fake organisation
  • Scheduled or automated mass-tweeting
  • Following the same user
  • Similar follower/following relationships

These clones are bots. They are not real people. They do not socialise, except with other bots or their creator(s). They have automated behaviour. They were created to serve an agenda. Their creators maintain a real persona online and make use of the bots when needed.

Their effect is to increase follower counts and raise @mention levels. Bots may be terminated by Twitter if they start spamming, so many lie dormant. We will treat bots as a subset of clones. Our bot-detection methods will not be shared publicly, to avoid having the bots change tactics.

Catching the #Merdeka55 Clones

During the #Merdeka55 event, we noticed large blocks of identical tweets being sent at the same time. Further investigation into who sent the tweets revealed that many of these users had a lot in common:

  • Tweeted using Tweetdeck
  • Sending scheduled/automated tweets containing #Merdeka55 and @NajibRazak in sync with other bots
  • Fake-looking profile (based on personal details)
  • Similar follower/following relationships

The pattern seemed to be one real person controlling as many as several dozen additional Twitter accounts. The person may have used a ‘tweet to all’ feature or scheduled the tweets. Such tweets were sent from 31st August – 1st September, primarily during 8.15 PM – 9.15 PM on 31st August. Another possibility is that real users gave their login details to a 3rd party for use during #Merdeka55.
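To illustrate the first signal only (identical tweets sent at the same moment by many accounts), here is a minimal sketch. It is not the detection method used for this report, which is not being published. The record layout `(user, text, sent_at)`, the function name and the `min_users` threshold are all assumptions made for the example, and the sample data is invented.

```python
from collections import defaultdict

def find_synchronised_blocks(tweets, min_users=3):
    """Return {(text, sent_at): users} for identical tweets sent at the
    same timestamp by at least min_users distinct accounts."""
    groups = defaultdict(set)
    for user, text, sent_at in tweets:
        groups[(text, sent_at)].add(user)
    return {key: users for key, users in groups.items() if len(users) >= min_users}

# Invented sample records: (user, text, sent_at)
sample = [
    ("user_a", "#Merdeka55 example text", "20:17:16"),
    ("user_b", "#Merdeka55 example text", "20:17:16"),
    ("user_c", "#Merdeka55 example text", "20:17:16"),
    ("user_d", "Selamat Hari Merdeka! #Merdeka55", "20:17:16"),
]

blocks = find_synchronised_blocks(sample)
for (text, sent_at), users in blocks.items():
    print(sent_at, len(users), "users:", text)
```

A block found this way is only a starting point. Genuine users can also send the same text at the same second, so the other characteristics listed above would still need checking.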

Some samples of bot account profile images are below. The bot account names are listed at the end of this post. Suspected non-bot user(s) that we have identified have been removed from the mosaic, though it is possible we missed some. It doesn’t change the fact that they are all clones, and each tweet was only tweeted by clones. See what they have in common.

Example 1

Tweet: #Merdeka55 Rukun Negara 5 Kesopanan dan Kesusilaan @NajibRazak @relamalaysia

Sent: 8:17:16 PM

Total users:  78

Bots: 77


Written by politweet

September 10, 2012 at 11:11 am

#Merdeka55 Twitter Report

On 28th August 2012, Datuk Seri Dr Rais Yatim (Minister of Information, Communications and Culture) announced that a world record of one million tweets was targeted for the Merdeka Day celebrations. To take part, Twitter users needed to send tweets from 8.15 PM – 9.15 PM (GMT +8) on 31st August 2012 using the hashtag #Merdeka55.

Politweet tracked mentions of the #Merdeka55 hashtag since the announcement. During the targeted hour, an odd pattern emerged during the live stream – large blocks of identical tweets were being sent at the same time.

Further investigation revealed that a small group of users were responsible for a large volume of tweets. These users had similar characteristics, e.g. account creation date, profile photos, location and follower/following relationships. All of their duplicate tweets were sent using Tweetdeck. We are going to call these users ‘Clones’ and expose their methods and impact on the stats in another blog post.

Stats for the hour follow.

Total for the hour

Tweets : 109,320

Users : 19,838

Tweets-per-minute (TPM)

This graph shows TPM from 8.10 PM – 9.20 PM. Tweets rose almost vertically at 8.15 PM. The highest peaks were 2,146 TPM at 8.24 PM and 2,104 TPM at 8.37 PM. Tweets started to decline at 9.11 PM, then spiked for one minute at 9.15 PM. After that, tweet levels increased as news of the 2.5 million tweet record broke.
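For readers who want to build a similar TPM series from their own collected tweets, a minimal sketch follows. It assumes each tweet's creation time is already available as a Python `datetime`; the function name and the three sample timestamps are invented for illustration.

```python
from collections import Counter
from datetime import datetime

def tweets_per_minute(timestamps):
    """Count tweets in each one-minute bucket."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)

# Invented sample timestamps
sample = [
    datetime(2012, 8, 31, 20, 24, 5),
    datetime(2012, 8, 31, 20, 24, 40),
    datetime(2012, 8, 31, 20, 25, 1),
]

tpm = tweets_per_minute(sample)
peak_minute, peak_count = tpm.most_common(1)[0]
print(f"Peak: {peak_count} TPM at {peak_minute:%H:%M}")
```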

Location of #Merdeka55 tweets

Tweets were coming in from all across the country. Globally there were only 5 tweets from outside Malaysia – Myanmar, Switzerland, London, Indonesia and Singapore. It’s safe to say that the #Merdeka55 hashtag usage was almost entirely confined to Malaysia.

Popular Tweets

The most retweeted (RT) tweets of the hour are listed below, in order. We count the number of users who RT, not the number of times. This is to reduce the impact of spammers. RT counts shown only cover retweets made on August 31st, of tweets made between 8.15PM – 9.15PM.

1. WardinaSafiyyah, 412 RTs

2. KhairyKJ, 303 RTs

3. WardinaSafiyyah, 262 RTs

4. KhairyKJ, 206 RTs

5. KhairyKJ, 159 RTs
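Counting users rather than retweets means a single account retweeting the same tweet many times is counted once. A minimal sketch of that counting rule is below; the `(retweeting_user, original_tweet_id)` record layout, the function name and the sample data are assumptions for illustration.

```python
from collections import defaultdict

def rank_by_unique_retweeters(retweets):
    """Rank tweets by the number of distinct users who retweeted them."""
    retweeters = defaultdict(set)
    for user, tweet_id in retweets:
        retweeters[tweet_id].add(user)
    counts = [(tweet_id, len(users)) for tweet_id, users in retweeters.items()]
    return sorted(counts, key=lambda item: item[1], reverse=True)

# Invented sample: u1 retweets t1 three times, two different users retweet t2
sample = [("u1", "t1"), ("u1", "t1"), ("u1", "t1"), ("u2", "t2"), ("u3", "t2")]

ranking = rank_by_unique_retweeters(sample)
print(ranking)
```

In the sample, t1 has more retweets in total but ranks below t2, because all of its retweets came from one account.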

This is one of the most popular tweets used by the clones. This message was sent by 178 users, scripted to go out at preset times that day.

How many #Merdeka55 tweets were really sent?

A total figure of 3,611,323 tweets was announced at 9.43 PM that night. But immediately after 9.15 PM the figure announced was 2.5 million. The announcement of the record was not accompanied by any source. No company or online tracking service was named.

It is not clear which figure is correct in reference to the one hour duration, but 3.6 million is what the organiser announced as the record so we will use that for our calculations.

This graph from downrightnow.com was screen-captured at midnight on August 31st. We marked the graph with lines indicating each hour from 6 PM – 10 PM. Based on this, there was fluctuation and lower quality of service from 8 – 10 PM.

Twitter’s performance drops when their system is under heavy load, which is to be expected if 3.6 million tweets were sent out. Based on this graph at the time, the announced figure seemed believable.

From our experience with the #GOP2012 and #DNC2012 conventions so far, our approach seems to be getting about 16% – 28% of the real total. However that estimate is influenced by the global tweets-per-minute (TPM). If global TPM is high, then we get significantly more. If we only used Twitter Search, we would have got an estimated 8% of the real total.

There were some comments online saying that the population of Malaysia needs to be taken into consideration when comparing to the USA. That does not really apply here, because we are not looking at how many people are talking about Merdeka Day. Instead we are looking at how many people are competing to set a record. There is the expectation that some users would tweet multiple times to contribute to the goal.

Estimating the real total

The convention totals announced by Twitter cover a period of hours, not one hour. The conventions’ tweets-per-hour were definitely lower than 3.6 million tweets. So if we make the assumption that we got 8% – 28% of the real total:

  • Estimated total (min) = 109,320 / 28 * 100
  • Estimated total (min) = 390,428
  • Estimated total (max) = 109,320 / 8 * 100
  • Estimated total (max) = 1,366,500

Based on our data, the estimated total #Merdeka55 tweets is 390,428 – 1,366,500 tweets.
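The same range can be checked in code. This sketch assumes, as stated above, that the collected sample represents between 8% and 28% of the real total; the function name `estimate_total` is hypothetical, and results are truncated to whole tweets to match the figures in the list.

```python
def estimate_total(collected, capture_pct):
    """Scale a collected sample up to a full total, truncated to whole tweets."""
    return collected * 100 // capture_pct

COLLECTED = 109320  # tweets collected during the hour

est_min = estimate_total(COLLECTED, 28)  # we captured as much as 28%
est_max = estimate_total(COLLECTED, 8)   # we captured as little as 8%

print(f"Estimated total: {est_min:,} - {est_max:,} tweets")
```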

Estimating the tweets-per-minute (TPM)

Twitter’s system gives us a per-minute sample of what is tweeted. By taking the highest peak in our data, we can estimate the TPM of the real data.

  • Highest peak = 2,146 TPM
  • Percentage of our total = 2,146/109,320 * 100 = 1.963 %
  • Given total = 3,611,323
  • Estimated peak = 3,611,323 * 1.963 %
  • Estimated peak = 70,890 TPM
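The steps above can be sketched in Python as follows. The code rounds the peak's share to three decimal places, as the list does, and also shows the unrounded result, which differs by a tweet or two. It rests on the same assumption as the list: that the busiest minute holds the same share of the announced total as it does of our sample. Variable names are illustrative.

```python
PEAK_TPM = 2146             # highest peak in our data
OUR_TOTAL = 109320          # tweets we collected during the hour
ANNOUNCED_TOTAL = 3611323   # figure announced by the organiser

# Share of our hourly total that fell in the peak minute, in percent
peak_share_pct = round(PEAK_TPM / OUR_TOTAL * 100, 3)

# Apply the same share to the announced total
estimated_peak = int(ANNOUNCED_TOTAL * peak_share_pct / 100)

# Same calculation without rounding the share first
estimated_peak_exact = ANNOUNCED_TOTAL * PEAK_TPM / OUR_TOTAL

print(f"Peak share: {peak_share_pct}%")
print(f"Estimated peak: {estimated_peak:,} TPM")
print(f"Without rounding the share: {estimated_peak_exact:,.0f} TPM")
```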

During the Olympics, Twitter mentioned the biggest records as:

  1. Usain Bolt winning the gold in the 200m sprint (80,000+ TPM)
  2. Usain Bolt winning the gold in the 100m sprint (74,000+ TPM)
  3. Andy Murray winning the gold in men’s tennis singles (57,000+ TPM)

That puts the estimated #Merdeka55 peak just below Usain Bolt’s 100m sprint (74,000+ TPM). It is surprising that such a record went unnoticed by Twitter.

Was the #Merdeka55 a world record?

Only Twitter and their data provider partners (Gnip, Datasift) know the true number of tweets sent for any given topic. Other online systems only have access to a subset of tweets sent, using the same API that Politweet used.

Twitter tends to announce tweets-per-minute and tweets-per-second records, not tweets-per-hour. The closest record that seems relevant is the 2.7 million tweets about Spain during the #Euro2012 Final against Italy, which should cover about 2 hours or more (90 minute match + 15 minute halftime + post-match buzz).

So assuming the figure is true, it is possible that the 3.6 million tweets are a world record. However to date, Twitter has made no announcement on their blog about #Merdeka55. There is also no mention of the #Merdeka55 record online by other tracking websites. Without a third party to verify the data, the 3.6 million tweets figure is doubtful.

The presence of clones also reduces the quality of the record. If the person or organisation in charge of these clones hadn’t polluted the data, whatever record was achieved would have had more historical value.

Update #1 (7th September 2012)

Corrected a typo under ‘How many #Merdeka55 tweets were really sent’. Original text was “If global TPM is high, then we get significantly less“. Correct version is “If global TPM is high, then we get significantly more“. Twitter’s Streaming API offers access to a percentage of tweets based on how much is globally tweeted at the moment. It is stated to be 1%, but we found it to be more.

This does not mean we can only get 1% of tweets on any topic. Think of the limitation as a ceiling on how much data can be received per minute. For example, if our limit is 4000 TPM and the total tweets about @NajibRazak is 3000 TPM, we would then get 100% of all tweets. If we are tracking tweets about @NajibRazak (real total 3000 TPM) and tweets about @BarackObama (real total 3000 TPM), then we would lose 2000 TPM because our limit is 4000 TPM.
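The ceiling can be modelled as a simple cap on the combined volume of everything being tracked. The sketch below uses the illustrative numbers from the paragraph above (a 4000 TPM limit and 3000 TPM per account); it is a simplified model of the behaviour described here, not Twitter's documented algorithm, and the function name is hypothetical.

```python
def delivered_and_lost(limit_tpm, tracked_tpm):
    """Return (delivered, lost) tweets per minute under a fixed ceiling.

    tracked_tpm maps each tracked term to its real tweets-per-minute.
    """
    total = sum(tracked_tpm.values())
    delivered = min(total, limit_tpm)
    return delivered, total - delivered

LIMIT = 4000  # example ceiling from the text

one_term = delivered_and_lost(LIMIT, {"@NajibRazak": 3000})
two_terms = delivered_and_lost(LIMIT, {"@NajibRazak": 3000, "@BarackObama": 3000})

print("Tracking one account: ", one_term)
print("Tracking two accounts:", two_terms)
```

The model says nothing about which tweets are dropped once the ceiling is reached, only how many.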

Written by politweet

September 7, 2012 at 9:30 am

#GOP2012 Twitter Report Day 4

#GOP2012 is the official hashtag of the Republican National Convention currently being held at Tampa Bay, Florida from August 27th – August 30th. During this period we tracked mentions of #GOP2012, #RNC, #RNC2012, #RomneyRyan2012, #tcot, @GOPConvention,  @MittRomney and @PaulRyanVP on Twitter. Starting Day 3 we also tracked #WeBuiltIt.

The highest levels of interest were for Clint Eastwood (3954 tweets-per-minute), Mitt Romney (3852 tweets-per-minute) and the balloon drop/end of Romney’s speech (4125 tweets-per-minute).

Times shown are in Eastern Daylight Time (UTC -4 due to Daylight Saving Time). Stats in this report cover Day 4 (August 30th). Click the images to view full-size.

Day Total

Tweets : 606,074

Users : 213,882

Mentions by the minute

This graph shows tweet levels (mentions) per minute, from 6 PM – 12 AM on August 30th. Significant peaks are labeled on the graph. The peak times along with the main topic being tweeted about are listed below:

Format [Minute = Main speaker/topic being tweeted about (x users, y mentions)]

  • 19:55 = Newt & Callista Gingrich (734 users, 775 mentions)
  • 20:15 = Jeb Bush (915 users, 968 mentions)
  • 21:06 = Tom Stemberg (817 users, 860 mentions)
  • 21:35 = US Olympians (1035 users, 1083 mentions)
  • 22:03 = Clint Eastwood (2395 users, 2504 mentions)
  • 22:15 = End of Clint Eastwood’s speech (3749 users, 3954 mentions)
  • 22:21 = Marco Rubio (3008 users, 3189 mentions)
  • 22:36 = Mitt Romney gets on stage (3152 users, 3276 mentions)
  • 22:45 = Mitt Romney (3678 users, 3852 mentions)
  • 23:00 = Mitt Romney (3604 users, 3785 mentions)
  • 23:15 = Buzz after end of Mitt Romney’s speech; Balloon drop (3962 users, 4125 mentions)

Mentions by the hour

 

This graph shows tweet levels (mentions) and users tweeting per hour, from 00:00 – 23:59 on August 30th. The gap between the users and mentions indicates how hot the tweeting activity was. Mentions peaked at 185,165 tweets-per-hour at 10 PM.

The data is shown in the table below:

Hour Users Mentions
0 9279 14827
1 4661 7318
2 2665 4490
3 1892 3325
4 1217 2047
5 1190 2098
6 1857 2966
7 3042 4873
8 4441 6997
9 6235 9422
10 6714 10615
11 7223 10857
12 7636 11438
13 7432 11031
14 8217 12109
15 7916 11611
16 9086 13402
17 8247 12191
18 8773 13268
19 12335 21500
20 21264 42676
21 25962 50997
22 79982 185165
23 78043 140851

 

Location of Tweeple

This map shows where geo-located tweets on the convention were coming from in the United States. Each blue dot represents one tweet. Unlike previous days we couldn’t show English/Non-English tweets. This is because the Streaming API does not include Twitter’s auto-detected language. Our own language detection system is optimised for distinguishing English from Malay/Indonesian, so it is not appropriate to apply to this set of tweets.

Compared with yesterday, many more tweets are visible on the map, expanding on existing clusters.

Popularity of Searchterms

The list below is a breakdown of how many users wrote tweets containing each searchterm,  from 00:00 – 23:59 August 30th. It is ordered by the number of users.

  • #RNC = 123032 users, 299770 mentions
  • #GOP2012 = 69412 users, 178614 mentions
  • #RNC2012 = 53265 users, 127807 mentions
  • @MittRomney = 49836 users, 80497 mentions
  • #RomneyRyan2012 = 39154 users, 62978 mentions
  • @PaulRyanVP = 20622 users, 31021 mentions
  • #tcot = 16315 users, 58920 mentions
  • @GOPConvention = 3185 users, 5233 mentions
  • #WeBuiltIt = 1308 users, 1690 mentions

It’s interesting that #RNC was more widely used than #GOP2012, the official hashtag for the convention. Mentions of @GOPConvention and #WeBuiltIt remained low, as in the previous days. We missed out on tracking #BelieveInAmerica.

Popular tweets

The most retweeted (RT) tweets of the day are listed below, in order. We count the number of users who RT, not the number of times. This is to reduce the impact of spammers. RTs shown only cover retweets made on August 30th. We will recalculate the popular tweets for the whole convention once it’s over.

1. Mitt Romney, 3212 RTs

2. Mitt Romney, 2811 RTs

3. Paul Ryan, 2114 RTs

4. Mitt Romney, 1822 RTs

5. Mitt Romney, 1739 RTs

6. Marco Rubio, 1548 RTs

Note

We collected 606,074 tweets from 213,882 users. Our system was modified to collect tweets from both the Streaming API and Search API. This helped avoid any dips in the graph or ‘ceiling’ issues like in Day 3. Twitter wrote a blog post giving the #GOP2012 stats as 2 million, with the peak at 14289 tweets-per-minute at the end of Mitt Romney’s speech. Our peak was 4125 tweets-per-minute. Their second peak was at 11.09 PM EDT with 13267 tweets-per-minute. Our stats for that time show 3097 tweets-per-minute.

Using our combined API approach, we seem to be getting about 20-28% of the real total. But without more details on what searchterms Twitter is using, or similar graphs to ours, we can’t be sure. 95782 tweets about the convention came exclusively from Twitter Search, and were not obtained using the Streaming API. So for the sake of completeness, both APIs need to be used for major events.
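As a rough check on that range, the two peaks quoted in this note can be compared directly. This is a sketch of the arithmetic only, and it assumes our peak minutes line up with Twitter's, which we cannot confirm without knowing their searchterms.

```python
def capture_pct(ours, theirs):
    """Our count as a percentage of Twitter's published count."""
    return ours / theirs * 100

# (our TPM, Twitter's TPM) for the two peaks quoted in this note
first_peak = capture_pct(4125, 14289)   # end of Mitt Romney's speech
second_peak = capture_pct(3097, 13267)  # 11.09 PM

print(f"First peak:  {first_peak:.1f}% of Twitter's figure")
print(f"Second peak: {second_peak:.1f}% of Twitter's figure")
```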

Previous Reports

Day 1 & 2 (August 28th)

Day 3 (August 29th)

Written by politweet

September 5, 2012 at 6:25 pm