Twitter, the Warmest 100 and was there enough data?

Posted on January 28, 2013 by Anthony

Note: I know and have worked briefly with Nick Drewe. He is rather good at what he does, and it usually involves numbers, the Internet and spreadsheets.

Spoiler Alert: The Warmest 100

Every year Australia's major youth-focused radio station holds a poll to pick the 100 most popular tracks of the last twelve months. This poll is called the Hottest 100; you may have heard of it. A reasonable proportion of the voting happens online, and unsurprisingly, the site gives voters the option of sharing their responses online.

A few Brisbane locals, @nickdrewe (http://nickdrewe.com), @jacktmurphy (http://jacktmurphy.com) and @andythelander (http://thelanded.com), built a site that tracked this activity on Twitter to show which songs might end up on top: http://warmest100.com.au.

It was interesting watching the commentary surrounding the site, ranging from discussion of its impact on betting for the winners to the creators being accused of ruining all the things and generally being 'UnAustralian'. Some journalists even described deriving insights from a set of just over 35,000 data points as using 'Big Data'. The dominant narrative, though, focused on the role social media had in providing information and its potential use in predicting trends. At least until the countdown started; then the conversation alternated between where the Warmest 100 was wrong as much as where it was right.

When is Enough Data Enough?

"Late on Sunday night, just two hours before the close of voting, Drewe conducted his last collection of data, pushing his sample up to a massive 35,081 votes – or roughly 2.7% of the anticipated final vote – further refining his countdown's final tally."

(From "Hottest 100 cracked as Triple J outsmarted")

A 2.7% sample does not sound like a lot, until you consider that Gallup frequently uses far smaller samples in its polls, such as its survey on attitudes towards gun law with just over 1,000 respondents. Gallup goes to great lengths to ensure that its sample is random. Its methodology is interesting: respondents are chosen randomly by phone number so that all groups within the population are represented, and the responses are weighted appropriately. While Gallup polls are subject to some biases (they won't ever represent groups without a phone, for example), they are acknowledged as generally being representative of the population. It is hard to say the same for people choosing to post their vote to Twitter.

The Warmest 100 was not using a random sample. Out of the population of people who voted in the Hottest 100, it was by design selecting those who had a Twitter account and decided to share their vote. This biases the information towards one group who share a behaviour that may not be present evenly across the whole population. The data scraped from Twitter is a convenience sample with a strong self-selection bias.

These biases do not automatically make the data worthless, though. If adoption of Twitter and its use in this way is uniform enough among the population, the sample could still be representative.
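To put those sample sizes in perspective, here is a minimal sketch of the usual margin-of-error arithmetic for an estimated proportion. It assumes simple random sampling, which strictly speaking neither Gallup's weighted design nor a self-selected Twitter scrape satisfies, and the function name and example proportions are mine, so treat the numbers as purely illustrative.

```python
# Margin of error (~2 standard errors) for an estimated proportion under
# simple random sampling. Illustrative only: neither a weighted phone poll
# nor a self-selected Twitter sample is a simple random sample.
import math

def margin_of_error(p, n):
    """Approximate 95% margin of error for a proportion p estimated from n responses."""
    return 2 * math.sqrt(p * (1 - p) / n)

# A 50% split is the worst case for a Gallup-style attitude question.
print(f"n = 1,000:  +/- {margin_of_error(0.5, 1_000):.1%}")

# A track holding roughly 2.6% of the vote, estimated from 35,081 scraped votes.
print(f"n = 35,081: +/- {margin_of_error(0.026, 35_081):.2%}")
```

With 35,081 responses the statistical uncertainty on any single track's overall share is tiny. The real questions, taken up below, are whether a self-selected sample points at the right shares in the first place, and whether those shares sit far enough apart for the ranking to hold.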
How Right Should it Have Been?

[Figure: The percentage of the vote each rank received in the sample]

When the Hottest 100 countdown began, it quickly emerged that not every result matched the Warmest 100's list. This isn't surprising. Looking at the distribution of the votes for the top 200 responses captured by the Warmest 100 shows that while the top three tracks were well ahead of the rest, the remaining results tend to group together. The distance between each point in the scatter plot shrinks as you move further away from number one, and outside the top fifty there is no reason to assume that a track's rank in the sample would match its rank in the population.

Track | Hottest 100 | Warmest 100 | Vote % | Plus 2 SE | Minus 2 SE | Difference
--- | --- | --- | --- | --- | --- | ---
Macklemore & Ryan Lewis – Thrift Shop (Ft. Wanz) | 1 | 1 | 2.60% | 2.80% | 2.40% | 0
Of Monsters And Men – Little Talks | 2 | 2 | 2.46% | 2.66% | 2.26% | 0
Alt-J – Breezeblocks | 3 | 3 | 2.24% | 2.42% | 2.05% | 0
Flume – Holdin On | 4 | 6 | 1.68% | 1.85% | 1.52% | 2
Mumford & Sons – I Will Wait | 5 | 7 | 1.67% | 1.84% | 1.51% | 2
Major Lazer – Get Free (Ft. Amber Coffman) | 6 | 8 | 1.62% | 1.78% | 1.46% | 2
Tame Impala – Elephant | 7 | 5 | 1.70% | 1.87% | 1.54% | -2
Frank Ocean – Lost | 8 | 4 | 1.86% | 2.03% | 1.69% | -4
Tame Impala – Feels Like We Only Go Backwards | 9 | 9 | 1.50% | 1.65% | 1.34% | 0
The Rubens – My Gun | 10 | 10 | 1.40% | 1.55% | 1.25% | 0

The top ten from the Warmest 100, while not spot on, were still very accurate, with no track more than four spots out of place compared to the Hottest 100. While "Frank Ocean – Lost" actually ranked 8th in the Hottest 100, not 4th as predicted, its lowest predicted share of the vote fell between the sampled shares of the tracks that placed 9th and 7th in the Hottest 100, meaning the result still fell within two standard errors of the predicted position.

[Figure: The difference between the real rank and its position in the sample]

Ideally we would want the total votes each track received in the Hottest 100 so we could compare each track's share of the vote with the figures from the sample. This information would make it easier to establish how representative the sample was of the voting behaviour of the population as a whole. However, reality is rarely so accommodating. What is clear is that the further down the Warmest 100's list you go, the more likely a track's rank differs from its rank in the Hottest 100, and the greater the difference between the placements in both.

[Figure: Sample rank with +/- 2 Standard Errors. A Confidence Interval (CI) of +/- 2 Standard Errors (SE) for each track's votes, determined using the bootstrap method.]

Considering the range that each track's vote could fall within in comparison to others, the unreliability of the tail end of the sample makes sense. For example, it isn't until above rank 57 in the sample that the upper limit of track 100's Confidence Interval no longer overlaps with another track's lower limit. The total number of votes each track has in the sample also falls dramatically as you move down the list. While the first three results are clearly separated from each other, those further down the list sit much closer together, meaning even a small amount of error could alter their order, and the chance that their rank in the Warmest 100 matches the Hottest 100 will be low.

[Figure: Ranks in the sample with equal, or otherwise overlapping, shares of the responses]

As the actual number of votes each track received in the Hottest 100 is not available, it is not possible to see how close the actual results came to those from the sample in terms of share of the whole vote. Without this information, the expectation that the Warmest 100 would have predicted each song's share of the vote to within +/- 2 SE at least 95% of the time cannot be tested.
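The +/- 2 SE intervals above were produced with the bootstrap. As a rough illustration of the idea, here is a minimal sketch that resamples individual votes with replacement; the vote counts are simulated rather than taken from the scrape, and the Warmest 100's own implementation (which may, for instance, have resampled whole ballots) could produce somewhat wider intervals.

```python
# Minimal bootstrap sketch of a +/- 2 SE interval for one track's vote share.
# The sample below is simulated: roughly 2.6% of 35,081 votes name this track.
import numpy as np

rng = np.random.default_rng(1)

N = 35_081                       # size of the scraped sample (simulated here)
votes = np.zeros(N, dtype=bool)
votes[:912] = True               # hypothetical count for the leading track

boot_shares = np.array([
    rng.choice(votes, size=N, replace=True).mean()   # resample votes with replacement
    for _ in range(2_000)
])

se = boot_shares.std(ddof=1)
print(f"share ~ {votes.mean():.2%}, bootstrap 2 SE ~ +/- {2 * se:.2%}")
```

For a single track's share this reduces to roughly the binomial standard error from the earlier sketch; the appeal of the bootstrap is that the same resampling loop extends naturally to quantities like ranks, where a closed-form standard error is awkward to write down.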
It does seem that the Warmest 100 is mostly correct for the results with a large number of responses, with the accuracy declining as the number of observed votes falls.

Data Works*

*But sometimes your sample doesn't.

[Figure: The positive and negative difference between the real rank of a track and its position in the sample]

The Warmest 100 got the top three tracks right. The top ten were never off by more than four spots, and the distance between what Twitter said and what the general population said grows wider from there. The further you move down the list, the greater the impact an error of only a few votes will have. Even in the top ten, a handful of additional votes for many of the tracks would have changed their positions. As for the lower-ranked tracks, as the number of votes in the sample gets smaller, any difference in the votes recorded, even within the 95% CI range, can change a track's position significantly. (A small simulation of this effect closes the post.)

[Figure: Conservative estimate]

Even with the potential bias issues mentioned above, it seems that Twitter is fairly accurate for picking large trends for this population. As inaccurate as it was for most of the Hottest 100, it did pick the top three. There is more than one reason why the sample did not accurately represent the other 90, and it isn't limited to sampling error. The small amount of data for the tracks further down the list is also a problem, exaggerating the effect of any error on their order. In this case, Twitter provided enough information to predict the top three, but once the number of responses per track fell out of the hundreds, its accuracy predictably declined.

The data used for this analysis and for generating the graphs can be found here.
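To make the rank-instability point concrete, here is a toy simulation. The shares are invented and only mimic the shape of the real list (three clear leaders, a tightly packed tail, and a catch-all bucket for everything else), so the exact percentages mean nothing; the point is how often each rank survives a sample of the Warmest 100's size.

```python
# Toy simulation of why ranks near the tail of the list are unstable.
# All vote shares are invented; they only mimic the shape of the real list.
import numpy as np

rng = np.random.default_rng(0)

shares = {"#1": 0.0260, "#2": 0.0246, "#3": 0.0224,
          "#97": 0.0031, "#98": 0.0030, "#99": 0.0029, "#100": 0.0028}
labels = list(shares)
probs = np.array(list(shares.values()) + [1 - sum(shares.values())])

SAMPLE_SIZE = 35_081   # votes in the scraped sample
RUNS = 1_000
correct = np.zeros(len(labels))

for _ in range(RUNS):
    counts = rng.multinomial(SAMPLE_SIZE, probs)[:len(labels)]
    # The tracks are listed in true descending order, so the sample gets a
    # rank "right" when position i of the sampled ordering is track i.
    order = np.argsort(-counts)
    correct += order == np.arange(len(labels))

for label, rate in zip(labels, correct / RUNS):
    print(f"{label}: sampled rank correct in {rate:.0%} of runs")
```

In runs like this the leaders hold their order far more often than the tightly packed tail tracks do, which is essentially the pattern the countdown revealed.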