Politweet.Org

Observing Malaysian Social Media

Posts Tagged ‘GOP2012’

#GOP2012 Twitter Report Day 4

#GOP2012 is the official hashtag of the Republican National Convention currently being held at Tampa Bay, Florida from August 27th – August 30th. During this period we tracked mentions of #GOP2012, #RNC, #RNC2012, #RomneyRyan2012, #tcot, @GOPConvention,  @MittRomney and @PaulRyanVP on Twitter. Starting Day 3 we also tracked #WeBuiltIt.

The highest levels of interest were for Clint Eastwood (3954 tweets-per-minute), Mitt Romney (3852 tweets-per-minute) and the balloon drop/end of Romney’s speech (4125 tweets-per-minute).

Times shown are in Eastern Daylight Time (UTC -4 due to Daylight Saving Time). Stats in this report cover Day 4 (August 30th). Click the images to view full-size.

Day Total

Tweets : 606,074

Users : 213,882

Mentions by the minute

This graph shows tweet levels (mentions) per minute, from 6 PM – 12 AM on August 30th. Significant peaks are labeled on the graph. The peak times along with the main topic being tweeted about are listed below:

Format [Minute = Main speaker/topic being tweeted about (x users, y mentions)]

  • 19:55 = Newt & Callista Gingrich (734 users, 775 mentions)
  • 20:15 = Jeb Bush (915 users, 968 mentions)
  • 21:06 = Tom Stemberg (817 users, 860 mentions)
  • 21:35 = US Olympians (1035 users, 1083 mentions)
  • 22:03 = Clint Eastwood (2395 users, 2504 mentions)
  • 22:15 = End of Clint Eastwood’s speech (3749 users, 3954 mentions)
  • 22:21 = Marco Rubio (3008 users, 3189 mentions)
  • 22:36 = Mitt Romney gets on stage (3152 users, 3276 mentions)
  • 22:45 = Mitt Romney (3678 users, 3852 mentions)
  • 23:00 = Mitt Romney (3604 users, 3785 mentions)
  • 23:15 = Buzz after end of Mitt Romney’s speech; Balloon drop (3962 users, 4125 mentions)
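
Per-minute figures like the ones above can be derived from raw tweets by bucketing them on the minute and counting both tweets and distinct users. Here is a minimal sketch, assuming each tweet is available as a (timestamp, user) pair. The record layout and the name `per_minute_stats` are illustrative and not taken from the Politweet system:

```python
from collections import defaultdict
from datetime import datetime


def per_minute_stats(tweets):
    """Return {minute: (distinct_users, mentions)} for (timestamp, user) pairs."""
    users = defaultdict(set)
    mentions = defaultdict(int)
    for ts, user in tweets:
        minute = ts.replace(second=0, microsecond=0)  # truncate to the minute
        users[minute].add(user)
        mentions[minute] += 1
    return {m: (len(users[m]), mentions[m]) for m in mentions}


# Hypothetical sample data, not taken from the report.
sample = [
    (datetime(2012, 8, 30, 22, 15, 3), "alice"),
    (datetime(2012, 8, 30, 22, 15, 40), "alice"),
    (datetime(2012, 8, 30, 22, 15, 59), "bob"),
    (datetime(2012, 8, 30, 22, 16, 1), "carol"),
]
stats = per_minute_stats(sample)
```

In this sample, the 22:15 bucket holds three mentions from two users, which is why the user count in each peak is always at or below the mention count.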

Mentions by the hour

 

This graph shows tweet levels (mentions) and users tweeting per hour, from 00:00 – 23:59 on August 30th. The gap between the users and mentions indicates how hot the tweeting activity was. Mentions peaked at 185,165 tweets-per-hour at 10 PM.

The data is shown in the table below:

Hour Users Mentions
0 9279 14827
1 4661 7318
2 2665 4490
3 1892 3325
4 1217 2047
5 1190 2098
6 1857 2966
7 3042 4873
8 4441 6997
9 6235 9422
10 6714 10615
11 7223 10857
12 7636 11438
13 7432 11031
14 8217 12109
15 7916 11611
16 9086 13402
17 8247 12191
18 8773 13268
19 12335 21500
20 21264 42676
21 25962 50997
22 79982 185165
23 78043 140851
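
One way to put a number on the gap between users and mentions is the average number of mentions per user in each hour. The sketch below uses three rows copied from the table above. The choice of a simple ratio is our assumption for illustration, not a metric defined in the report:

```python
# (users, mentions) for selected hours, copied from the Day 4 table above.
hourly = {5: (1190, 2098), 12: (7636, 11438), 22: (79982, 185165)}

# Average mentions per user; a higher value means each user tweeted more.
ratios = {hour: mentions / users for hour, (users, mentions) in hourly.items()}

for hour, ratio in sorted(ratios.items()):
    print(f"{hour:02d}:00  {ratio:.2f} mentions per user")
```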

 

Location of Tweeple

This map shows where geo-located tweets on the convention were coming from in the United States. Each blue dot represents one tweet. Unlike previous days we couldn’t show English/Non-English tweets. This is because the Streaming API does not include Twitter’s auto-detected language. Our own language detection system is optimised for distinguishing English from Malay/Indonesian, so it is not appropriate to apply to this set of tweets.

Compared to yesterday, many more tweets are visible on the map, expanding on existing clusters.

Popularity of Searchterms

The list below is a breakdown of how many users wrote tweets containing each searchterm,  from 00:00 – 23:59 August 30th. It is ordered by the number of users.

  • #RNC = 123032 users, 299770 mentions
  • #GOP2012 = 69412 users, 178614 mentions
  • #RNC2012 = 53265 users, 127807 mentions
  • @MittRomney = 49836 users, 80497 mentions
  • #RomneyRyan2012 = 39154 users, 62978 mentions
  • @PaulRyanVP = 20622 users, 31021 mentions
  • #tcot = 16315 users, 58920 mentions
  • @GOPConvention = 3185 users, 5233 mentions
  • #WeBuiltIt = 1308 users, 1690 mentions

It’s interesting that #RNC was more widely used than #GOP2012, the official hashtag for the convention. Mentions of @GOPConvention and #WeBuiltIt remained low, as in the previous days. We missed out on tracking #BelieveInAmerica.

Popular tweets

The most retweeted (RT) tweets of the day are listed below, in order. We count the number of users who RT, not the number of times. This is to reduce the impact of spammers. RTs shown only cover retweets made on August 30th. We will recalculate the popular tweets for the whole convention once it’s over.

1. Mitt Romney, 3212 RTs

2. Mitt Romney, 2811 RTs

3. Paul Ryan, 2114 RTs

4. Mitt Romney, 1822 RTs

5. Mitt Romney, 1739 RTs

6. Marco Rubio, 1548 RTs
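
Counting retweeting users rather than retweets, as described above, can be sketched as follows. This assumes retweets are available as (original tweet ID, retweeting user) pairs. The function name and data layout are hypothetical:

```python
from collections import defaultdict


def rank_by_retweeting_users(retweets):
    """Rank original tweets by how many distinct users retweeted them.

    `retweets` is an iterable of (original_tweet_id, retweeting_user) pairs.
    """
    users_per_tweet = defaultdict(set)
    for tweet_id, user in retweets:
        users_per_tweet[tweet_id].add(user)  # repeat RTs by one user count once
    return sorted(
        ((tweet_id, len(users)) for tweet_id, users in users_per_tweet.items()),
        key=lambda item: item[1],
        reverse=True,
    )


# Hypothetical data: "spammer" retweets tweet "A" three times.
sample = [("A", "spammer"), ("A", "spammer"), ("A", "spammer"),
          ("B", "u1"), ("B", "u2")]
ranking = rank_by_retweeting_users(sample)
```

With raw retweet counts, tweet "A" would rank first with three retweets. Counting distinct users puts "B" ahead, since two different users retweeted it.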

Note

We collected 606,074 tweets from 213,882 users. Our system was modified to collect tweets from both the Streaming API and Search API. This helped avoid any dips in the graph or ‘ceiling’ issues like those on Day 3. Twitter wrote a blog post giving the #GOP2012 stats as 2 million, with the peak at 14289 tweets per minute at the end of Mitt Romney’s speech. Our peak was 4125 tweets per minute. Their second peak was at 11:09 PM EST with 13267 tweets-per-minute. Our stats for that time show 3097 tweets-per-minute.

Using our combined API approach, we seem to be getting about 20-28% of the real total. But without more details on what searchterms Twitter is using, or similar graphs to ours, we can’t be sure. 95782 tweets about the convention came exclusively from Twitter Search, and were not obtained using the Streaming API. So for the sake of completeness, both APIs need to be used for major events.
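
Merging the two sources comes down to a set union on tweet IDs, so a tweet seen by both APIs is counted once. The following is a minimal sketch under the assumption that both collections expose the tweet ID. The function name `merge_sources` and the sample IDs are made up for illustration:

```python
def merge_sources(streaming_ids, search_ids):
    """Merge tweet IDs from both APIs, counting each tweet once."""
    streaming = set(streaming_ids)
    search = set(search_ids)
    return {
        "total": len(streaming | search),          # union: every distinct tweet
        "search_only": len(search - streaming),    # missed by the Streaming API
        "streaming_only": len(streaming - search),
        "both": len(streaming & search),
    }


# Hypothetical IDs for illustration only.
summary = merge_sources([1, 2, 3, 4], [3, 4, 5])
```

The `search_only` figure corresponds to the tweets that came exclusively from Twitter Search.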

Previous Reports

Day 1 & 2 (August 28th)

Day 3 (August 29th)

Written by politweet

September 5, 2012 at 6:25 pm

#GOP2012 Twitter Report Day 3

#GOP2012 is the official hashtag of the Republican National Convention currently being held at Tampa Bay, Florida from August 27th – August 30th. During this period we tracked mentions of #GOP2012, #RNC, #RNC2012, #RomneyRyan2012, #tcot, @GOPConvention,  @MittRomney and @PaulRyanVP on Twitter. Starting Day 3 we also tracked #WeBuiltIt.

The highest levels of interest were for Condoleezza Rice (1232 tweets-per-minute), Susana Martinez (1353 tweets-per-minute) and Paul Ryan (1405 tweets-per-minute).

Times shown are in Eastern Daylight Time (UTC -4 due to Daylight Saving Time). Stats in this report cover Day 3 (August 29th). Click the images to view full-size.

Day Total

Tweets : 322,162

Users : 121,009

Mentions by the minute

This graph shows tweet levels (mentions) per minute, from 6 PM – 12 AM on August 29th. Significant peaks are labeled on the graph. The peak times along with the main topic being tweeted about are listed below:

Format [Minute = Main speaker/topic being tweeted about (x users, y mentions)]

  • 20:14 = John McCain (692 users, 718 mentions)
  • 20:33 = Pam Bondi and Sam Olens (578 users, 599 mentions)
  • 21:09 = Rob Portman (589 users, 620 mentions)
  • 21:34 = Tim Pawlenty (1023 users, 1090 mentions)
  • 21:54 = Mike Huckabee (979 users, 1014 mentions)
  • 21:57 = Condoleezza Rice (1023 users, 1059 mentions)
  • 22:01 = Condoleezza Rice (1051 users, 1099 mentions)
  • 22:07 = Condoleezza Rice (1189 users, 1232 mentions)
  • 22:17 = Susana Martinez and post-speech praise for Condoleezza Rice (1307 users, 1364 mentions)
  • 22:25 = Susana Martinez (1282 users, 1353 mentions)
  • 22:35 = Paul Ryan (1223 users, 1269 mentions)
  • 22:45 = Paul Ryan (1230 users, 1306 mentions)
  • 22:59 = Paul Ryan (1380 users, 1405 mentions)
  • 23:07 = Post-speech commentary (1357 users, 1423 mentions)

Our system had trouble keeping up with the tweet volume starting from the time Condoleezza Rice spoke. This is why the graph has many dips from that point on.

Mentions by the hour

This graph shows tweet levels (mentions) and users tweeting per hour, from 00:00 – 23:59 on August 29th. The gap between the users and mentions indicates how hot the tweeting activity was. The almost horizontal level of tweets indicates a ‘ceiling’ that our system hit, because we used the Twitter Search API instead of the Streaming API. A discussion of that can be found in this blog post.

The data is shown in the table below:

Hour Users Mentions
0 9393 14485
1 4428 6715
2 2430 3821
3 1579 2343
4 1129 1802
5 1088 1733
6 1816 2623
7 2838 4086
8 3861 5670
9 6015 9035
10 6461 9700
11 6375 9893
12 6563 9855
13 6296 9643
14 6154 9218
15 6451 9440
16 6126 9167
17 6232 9056
18 6219 9407
19 10516 19245
20 14460 27037
21 19868 39471
22 28113 52754
23 28574 45963

Location of Tweeple

This map shows where geo-located tweets on the convention were coming from in the United States. Blue tweets are English, red tweets are non-English. Compared to yesterday, there are more tweets from Vancouver and more tweets distributed across America, noticeably from Missouri to the east coast.

Popularity of Searchterms

The list below is a breakdown of how many users wrote tweets containing each searchterm,  from 00:00 – 23:59 August 29th. It is ordered by the number of users.

  • #GOP2012 = 42049 users, 89308 mentions
  • #RNC = 45455 users, 80869 mentions
  • #RNC2012 = 32022 users, 65012 mentions
  • @MittRomney = 24588 users, 36093 mentions
  • #tcot = 15304 users, 55481 mentions
  • @PaulRyanVP = 14898 users, 21945 mentions
  • #RomneyRyan2012 = 10618 users, 17055 mentions
  • @GOPConvention = 3350 users, 5482 mentions
  • #WeBuiltIt = 2272 users, 2949 mentions

Compared to yesterday, the relative ordering remains the same except for mentions of @PaulRyanVP, which moved up a place. The new hashtag we tracked, #WeBuiltIt, had relatively few mentions.

Popular tweets

The most retweeted (RT) tweets of the day are listed below, in order. We count the number of users who RT, not the number of times. This is to reduce the impact of spammers. RTs shown only cover retweets made on August 29th. We will recalculate the popular tweets for the whole convention once it’s over.

1. Paul Ryan, 1289 RTs

2. Paul Ryan, 1167 RTs

3. Mitt Romney, 838 RTs

4. Mitt Romney, 831 RTs

5. Paul Ryan, 828 RTs

Note

We collected 322,162 tweets from 121,009 users. Politweet uses Twitter’s Search API to track tweets for two reasons – to take advantage of Twitter’s spam-filtering; and because the Search API is closer to the end-user experience. Twitter wrote a blog post giving the #GOP2012 stats as over 2 million, with the peak at 6669 tweets per minute during Paul Ryan’s speech. Our peak was 1405 tweets per minute. Given this big discrepancy, for Day 4 of the convention we will be using both the Search API and Streaming API.

Written by politweet

August 31, 2012 at 6:07 am

Twitter Search and the Streaming API for Research

In the last 3 days we tracked mentions of the GOP convention currently being held in Tampa Bay, Florida. One oversight we made was in estimating the number of tweets. In Malaysia, the volume of tweets for topics we track is relatively low, so some missing tweets are acceptable. We had never tracked an American event before, and underestimated the discrepancy between the number of tweets provided by Twitter’s public API system and the real number of tweets.

Yesterday, Twitter’s official blog put up some statistics on the GOP convention, so we have a good comparison to make.

The Republican VP nominee also drove the top three peaks tonight in Tweets-per-minute, the highest coming at the conclusion of his speech: 6,669.

It is not clear if Twitter is only looking at #GOP2012 or taking other related tweets into account. From the data we collected yesterday, the highest peak was 1423 Tweets-per-minute. For #GOP2012 alone the peak was 550 tweets-per-minute. We can assume the amount of tweets collected from the Search API is an estimated (550/6669 *100), or 8.2% of the real total.

Twitter also stated:

Tweets about the #GOP2012 convention topped two million as Ryan took the stage—six times the Tweets sent about the 2008 conventions combined.

Do they mean two million on August 29th, or two million since August 28th, or since the weekend? Without a frame of reference we can’t make good use of that figure. We collected 212,810 mentions of #GOP2012 since Monday (August 27th). For August 29th alone, we have 89,308 tweets. We can assume the worst case is we got (89308/2000000*100), or 4.5% of the real total.
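
The two percentages above come from the same simple ratio of collected volume to reported volume. A short sketch of the arithmetic, with `capture_rate` as an illustrative helper name:

```python
def capture_rate(collected, reported):
    """Share of the reported volume that was collected, as a percentage."""
    return collected / reported * 100


# Peak tweets-per-minute for #GOP2012 alone vs. Twitter's reported peak.
peak_rate = capture_rate(550, 6669)

# #GOP2012 tweets on August 29th vs. the two million total, assuming
# (worst case) that all two million were tweeted on that one day.
daily_rate = capture_rate(89308, 2_000_000)

print(f"peak: {peak_rate:.1f}%, daily: {daily_rate:.1f}%")
```

The daily figure is a lower bound only under the stated worst-case assumption. If the two million covers several days, the real share for August 29th would be higher.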

Reasons for Discrepancy

Public access to Twitter is limited to two application programming interfaces (APIs):

Twitter Search  

  1. Limited to tweets that Twitter considers ‘relevant’. Tweets and users that are considered spam are filtered out.
  2. What is obtained with this API is less than the real amount of tweets, and the discrepancy increases with how ‘hot’ the topic is.
  3. This API limits you to a maximum of 1,500 tweets for a search query, broken up into pages of up to 100 tweets each. In practice we find the only stable way of working with the API was to limit pages to 70 tweets, giving us a maximum of 1,050 tweets per query. Requesting more resulted in timeouts or zero tweets returned.
  4. Search results go back in time for up to 6 days. This is good for tracking tweets after finding out about an event.
  5. Works by posting a search request to Twitter and parsing the results, then submitting more requests if there is paging to be done. Susceptible to network latency issues.
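
The paging behaviour described in the list above can be sketched as a loop over pages. This is an illustration only: `fetch_page` is a hypothetical callable standing in for the real HTTP request, and the 15-page cap is inferred from the figures in the text (1,500 / 100, which also matches 1,050 / 70).

```python
def collect_query(fetch_page, page_size=70, max_tweets=1500):
    """Collect results for one search query, page by page.

    `fetch_page(page, page_size)` is a hypothetical callable that returns a
    list of tweets for the given 1-based page number.
    """
    max_pages = max_tweets // 100  # 1,500-tweet cap at 100 per page = 15 pages
    collected = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, page_size)
        if not batch:
            break  # no more results, or the API returned nothing
        collected.extend(batch)
        if len(batch) < page_size:
            break  # a short page means this was the last page
    return collected


def fake_fetch(page, page_size):
    """Stand-in for the network call; always returns a full page."""
    return [f"tweet-{page}-{i}" for i in range(page_size)]


tweets = collect_query(fake_fetch)
```

With 70 tweets per page, the loop stops at 1,050 tweets even though more are available, which is the per-query maximum mentioned in point 3.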

The Streaming API

  1. Gives a small percentage of all tweets. Twitter has previously stated it gives 1% of all tweets, and this scales with the amount of tweets. So if there are 5 million tweets about all topics at the moment, the upper limit would be 50,000 tweets.
  2. Any search request will have this limit applied to it. If what we are searching for has only 10,000 tweets, we will get 100% of all tweets because 10,000<50,000.
  3. It is meant for real-time use. There was a back-fill option to get old tweets but it is missing from the documentation so it may have been removed.
  4. No filtering is done. Spam and spammers will be included.
  5. A connection is made to Twitter, after which tweets come in on a constant stream. Less susceptible to network latency issues.
  6. Search queries need to be defined before connecting to Twitter. If there is a need to add/remove searchterms, you have to terminate the connection and reconnect. There is a limit of 400 keywords, 5000 ‘follow’ users and 25 geo-located areas. The ‘follow’ users return tweets written by those users but not all @mentions of the user.
  7. The downside is if you are tracking something very popular that takes up >1% of all tweets, you will run into the 1% limit. What you get with the Streaming API also includes spam, so you would need your own tweet spam-detection technology.
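
Points 1, 2 and 7 describe a cap that scales with overall Twitter volume. A minimal sketch of that rule, assuming the cap is exactly 1% of all tweets in the same period (the function name and the second example figure are hypothetical):

```python
def streaming_yield(topic_tweets, all_tweets, sample_percent=1):
    """Estimate how many topic tweets the Streaming API would deliver.

    Assumes the cap is `sample_percent` percent of all tweets in the period.
    """
    cap = all_tweets * sample_percent // 100
    return min(topic_tweets, cap)


# Example from the text: 10,000 topic tweets out of 5 million overall.
cap_example = streaming_yield(10_000, 5_000_000)

# Hypothetical hot topic that exceeds the 50,000 cap.
busy_example = streaming_yield(200_000, 5_000_000)
```

The first call returns all 10,000 tweets because the topic fits under the 50,000 cap. The second is cut off at 50,000, so a quarter of the topic's tweets would be delivered.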

When the real volume of tweets is high, then the rate of tweets coming in is faster than what gets indexed under Twitter Search, and what we end up collecting is only a sample of what was tweeted.  We only experienced this before during earthquake monitoring, when the pattern of data showed we were hitting a ceiling. This ceiling became more obvious during Day #3 of the GOP Convention (report pending). Based on this experience we can now expect results to be doubtful if they breach 400 tweets-per-minute. In this situation we need to look at tweets-per-second and find gaps (and there were such gaps during #GOP2012). From the size and frequency of the gaps we can try to deduce the real volume of tweets.
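
As a rough illustration of the gap analysis, the sketch below finds runs of empty seconds in a tweets-per-second series and fills them with the average rate of the remaining seconds. The fill rule and the 3-second threshold are our assumptions for the example. The text only says that the real volume can be deduced from the size and frequency of the gaps, not how.

```python
def find_gaps(per_second_counts, min_gap=3):
    """Return (start_index, length) for runs of empty seconds >= min_gap."""
    gaps = []
    run_start = None
    for i, count in enumerate(per_second_counts):
        if count == 0:
            if run_start is None:
                run_start = i  # a new run of empty seconds begins
        elif run_start is not None:
            if i - run_start >= min_gap:
                gaps.append((run_start, i - run_start))
            run_start = None
    # Handle a run of empty seconds that reaches the end of the series.
    if run_start is not None and len(per_second_counts) - run_start >= min_gap:
        gaps.append((run_start, len(per_second_counts) - run_start))
    return gaps


def estimate_real_volume(per_second_counts, min_gap=3):
    """Fill each gap with the mean rate of the seconds outside the gaps."""
    gaps = find_gaps(per_second_counts, min_gap)
    gap_seconds = sum(length for _, length in gaps)
    observed = sum(per_second_counts)
    covered_seconds = len(per_second_counts) - gap_seconds
    if covered_seconds == 0:
        return 0.0
    return observed + gap_seconds * (observed / covered_seconds)


# Hypothetical tweets-per-second series with a 4-second gap.
series = [20, 18, 22, 0, 0, 0, 0, 20, 20]
gaps = find_gaps(series)
estimate = estimate_real_volume(series)
```

For this series, 100 tweets were observed over 5 covered seconds, so the 4-second gap is filled at 20 tweets per second and the estimate is 180.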

After 3 years of usage, we find our system can collect up to 1050 tweets-per-minute, per-searchterm. But the collection rate is influenced by how overloaded Twitter is. When it is under heavy load it tends to return only 0-100 tweets-per-minute. Within seconds a thousand tweets go by and that data is lost forever due to the 1,500 tweet limit. However, with the Streaming API there is the 1% limit and the spam to filter out. Developing our own spam filter is not worthwhile.

Note that Twitter does not state how many of the 2 million tweets include spammers/bots that would have been filtered out by Twitter Search. Using Twitter Search we obtained 4.5 – 8.2% of the total tweets, which is a decent sample considering we have data for every minute.

Moving Forward

There is no way to use Twitter’s public API to track popular searchterms and expect to get 100% of all tweets. The only way to get complete data (such as the 2 million #GOP2012 tweets) is to use paid services such as Gnip and Datasift. Datasift charges based on the amount of data collected, which can be very costly. Gnip does not have a pricing plan listed.

It is not practical for political campaigners, event managers or researchers to pay-per-tweet for 2 million tweets for one event. So working with a sample is the only way to get things done, and this is likely what many online paid tracking systems are using.

For research purposes, using the Search API is still acceptable, though it’s preferable to use both APIs when it comes to events. So tonight for #GOP2012 Day 4 (August 30th) and tomorrow for Malaysia’s Independence Day (August 31st), we will use both and merge the data to get the total tweet count. For determining context we will use the Twitter Search data, to take advantage of Twitter’s spam filtering.

Update #1 (31st August 2012)

Found the reference to back-fill in the ‘count’ parameter in Twitter Streaming API docs. So it is possible to request historical tweets, though how far back in time is unknown. The stated limit is 150,000 tweets and it’s possible the 3-6 day limit from the Search API applies. But it is not available to the public as it requires elevated access.

Written by politweet

August 31, 2012 at 1:26 am

#GOP2012 Twitter Report Day 1 & 2

#GOP2012 is the official hashtag of the Republican National Convention currently being held at Tampa Bay, Florida from August 27th – August 30th. During this period we tracked mentions of #GOP2012, #RNC, #RNC2012, #RomneyRyan2012, #tcot, @GOPConvention,  @MittRomney and @PaulRyanVP on Twitter.

The highest levels of interest were for Ann Romney (1422 tweets-per-minute) and Chris Christie (1491 tweets-per-minute).

Most Day 1 events were postponed due to weather conditions related to Hurricane Isaac, and the schedule was merged into Day 2. Stats shown start from Day 2 (August 28th 2012). Times shown are in Eastern Daylight Time (UTC -4 due to Daylight Saving Time). Click the images to view full-size.

Day Total

Tweets : 287,226

Users : 103,932

Mentions by the minute

This graph shows tweet levels (mentions) per minute, from 1 PM – 12 AM on August 28th. Significant peaks are labeled on the graph. The peak times along with the main topic being tweeted about are listed below:

Format [Minute = Main speaker/topic being tweeted about (x users, y mentions)]

  • 2:01 PM = GOP 2012 starts (200 users, 215 mentions)
  • 5:43 PM = Mitt Romney wins the nomination (600 users, 640 mentions)
  • 7:46 PM = Mia Love and Janine Turner (428 users, 444 mentions)
  • 7:47 PM = Mia Love and Janine Turner (487 users, 517 mentions)
  • 8:33 PM = John Kasich (675 users, 695 mentions)
  • 8:59 PM = Gov. Scott Walker (665 users, 702 mentions)
  • 9:33 PM = Rick Santorum (1029 users, 1076 mentions)
  • 9:35 PM = Rick Santorum (1054 users, 1093 mentions)
  • 10:17 PM = Ann Romney (1360 users, 1422 mentions)
  • 10:41 PM = Chris Christie (1427 users, 1484 mentions)
  • 10:50 PM = Chris Christie (teacher’s unions) (1230 users, 1262 mentions)
  • 11:00 PM = Chris Christie (asking everyone to stand up for Mitt Romney, also praise for Chris Christie’s speech) (1466 users, 1491 mentions)
  • 11:20 PM = Post-convention commentary (916 users, 961 mentions)

Mentions by the hour

This graph shows tweet levels (mentions) and users tweeting per hour, from 00:00 – 23:59 on August 28th. The gap between the users and mentions indicates how hot the tweeting activity was. The data is shown in the table below:

Hour Users Mentions
0 1032 1380
1 645 903
2 432 615
3 320 487
4 246 410
5 265 384
6 388 549
7 889 1239
8 1334 1871
9 2291 3060
10 2956 4130
11 3044 4084
12 4509 6514
13 6708 10594
14 6846 11374
15 7636 12649
16 8553 14825
17 11219 20529
18 9939 16099
19 10621 18871
20 15748 29539
21 20789 42260
22 26218 48011
23 24096 36849

Location of Tweeple

This map shows where geo-located tweets on the convention were coming from in the United States. Blue tweets are English, red tweets are non-English.

Popularity of Searchterms

The list below is a breakdown of how many users wrote tweets containing each searchterm,  from 00:00 – 23:59 August 28th. It is ordered by the number of users.

  • #GOP2012 = 39644 users, 92329 mentions
  • #RNC = 38240 users, 71349 mentions
  • #RNC2012 = 27557 users, 57593 mentions
  • @MittRomney = 24638 users, 39126 mentions
  • #tcot = 10625 users, 34678 mentions
  • #RomneyRyan2012 = 8030 users, 14230 mentions
  • @PaulRyanVP = 5230 users, 6750 mentions
  • @GOPConvention = 4548 users, 7839 mentions

Popular tweets

The most retweeted (RT) tweets of the day are listed below, in order. We count the number of users who RT, not the number of times. This is to reduce the impact of spammers. RTs shown only cover retweets made on August 28th.

1. Sarah Silverman, 1514 RTs

2. Mitt Romney, 1011 RTs

3. Ann Romney, 952 RTs

4. Mitt Romney, 877 RTs

5. Paul Ryan, 780 RTs

Note

At least 287,226 tweets about the GOP convention were tweeted by 103,932 users. Politweet uses Twitter’s Search API to track tweets for two reasons – to take advantage of Twitter’s spam-filtering; and because the Search API is closer to the end-user experience. However this means some genuine users are sometimes filtered out and their tweets are not obtained. Sometimes live tweets are missing from search results and only appear later, which causes them to be missed by our search.

For example despite the #RNC tag, Sarah Silverman’s original tweet was not found in our data, but all the retweets were. This is not to imply that Sarah Silverman was blocked, but a little missing data is just the nature of the Twitter search engine during live events. Journalists should note that the real totals are actually slightly higher.

Chris Christie’s speech drew the highest number of tweets. For an idea of how big the crowd was while he was speaking, here is a screen capture from YouTube:

Update #1 (31st August 2012): Stats on this page are not ‘slightly lower’ as reported, but significantly lower. We caught on to the size of the discrepancy after Twitter wrote a blog post about the event. An explanation of the reason for this discrepancy is in this blog post. However, the peaks on the graph are still relevant, and do indicate which speakers were most popular.

Written by politweet

August 29, 2012 at 10:13 pm