Thursday, October 8, 2015

Royals Playoff Power Rankings!

Hey, you know that baseball team that I liked when I was a kid, but that didn't make the playoffs from the time I was 5 until I was 33?  They're in the playoffs again!

In the past I have used sentiment analysis and topic modeling to analyze tweets on the #ksleg hashtag on Twitter, and also to create "power rankings."  Why not do that for the Royals too?  Especially in the playoffs? Let's kick the playoffs off in style.


So my power ranking system is based on the reach an individual account has through tweets, retweets, and favorites.  I don't disclose the system, because I know that being on my power rankings is highly coveted, and accounts would just start gaming my system.  I can say that I downloaded the last 48 hours of tweets using the hashtag #Royals.  Here are the top #Royals accounts over the past 48 hours:

The top-ranked accounts here actually make a lot of sense: the @Royals official account, some players, a few local media people.  I'll check in tomorrow to see who is top at live-tweeting the game.
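To be clear, the sketch below is not my actual ranking system (that stays secret). It's just a minimal Python illustration of what a reach-based score built from tweets, retweets, and favorites could look like. The weights, the field names (`user`, `retweets`, `favorites`), and the sample tweets are all made up for the example.

```python
from collections import defaultdict

def rank_accounts(tweets, w_tweet=1.0, w_retweet=2.0, w_favorite=1.0):
    """Rank accounts by a hypothetical reach score.

    Each tweet is a dict with 'user', 'retweets', and 'favorites' keys.
    The weights are illustrative assumptions, not a disclosed formula.
    """
    scores = defaultdict(float)
    for tweet in tweets:
        scores[tweet["user"]] += (
            w_tweet
            + w_retweet * tweet["retweets"]
            + w_favorite * tweet["favorites"]
        )
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up tweets, only to show the shape of the input.
sample_tweets = [
    {"user": "team_account", "retweets": 120, "favorites": 300},
    {"user": "local_reporter", "retweets": 15, "favorites": 40},
    {"user": "team_account", "retweets": 80, "favorites": 150},
    {"user": "fan_account", "retweets": 2, "favorites": 5},
]

for user, score in rank_accounts(sample_tweets):
    print(user, score)
```

Weighting retweets more heavily than favorites is one way to reward content that actually spreads to new timelines, but any real system would need tuning against the accounts you expect to see on top.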


So what topics are people talking about when using the #Royals hashtag?  Here's a wordcloud of common terms.
Quite a few things you would expect in there, including Astro (their opponent), baseball, MLB, takethecrown (obvious Royals pun), and... Emma Watson?  What?
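If you're curious how the term weights behind a word cloud like this get computed, here's a minimal sketch in Python. It isn't the code I used for the cloud above. The stopword list, the tokenizing rules, and the example tweets are assumptions for illustration only.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "are", "with", "this", "that", "you"}

def term_counts(tweets, hashtag="royals"):
    """Count terms across tweets, skipping URLs, stopwords, and the hashtag itself."""
    counts = Counter()
    for text in tweets:
        text = re.sub(r"https?://\S+", " ", text.lower())  # drop links
        tokens = re.findall(r"[a-z']+", text)  # also strips '#' and '@'
        counts.update(
            t for t in tokens
            if len(t) > 2 and t not in STOPWORDS and t != hashtag
        )
    return counts

# Made-up tweets, only to show the shape of the input.
sample = [
    "Go #Royals! #TakeTheCrown vs the Astros",
    "Royals baseball tonight #TakeTheCrown http://example.com/x",
]
print(term_counts(sample).most_common(5))
```

The counts are what a word cloud tool scales font sizes by. Dropping the search hashtag itself keeps one giant "royals" from swamping everything else.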

Ok, so can we break this into topics using a simple low-n topic model algorithm?  Answer: Yes.  I used the correlated topic model algorithm described here.

Topic 1: Topic mainly about naked Emma Watson pictures, and people selling stuff, some of it Royals related like this.  (I've written about this before, but high-popularity hashtags get a lot of spam traffic, and this hashtag is no exception.)

Topic 2: Topic about the other Royals.  You know, Kate, William, their babies.  Pictures of them.  Here's an example.

Topic 3: A topic cheering on the Royals.  This one is largely talking about the playoffs, and being positive towards the Royals.

Topic 4: A topic about tickets.  A lot of people looking for tickets, or regretting not having tickets.  Or gloating about having tickets.
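For a feel of how a low-n topic model sorts tweets into topics like the four above, here's a small Python sketch. Important caveat: this is plain LDA fit with collapsed Gibbs sampling, not the correlated topic model I actually used, and it's swapped in only because it fits in a few lines of standard-library code. The function name `fit_lda`, the hyperparameters, and the toy documents are all assumptions.

```python
import random
from collections import Counter

def fit_lda(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (not a correlated topic model).

    docs is a list of token lists. Returns per-document topic
    probabilities and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    doc_probs = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    return doc_probs, topic_word

# Made-up, pre-tokenized "tweets" for illustration.
toy_docs = [
    ["tickets", "game", "need", "tickets"],
    ["kate", "william", "baby", "pictures"],
    ["tickets", "sold", "out", "game"],
    ["william", "kate", "pictures"],
]
probs, words = fit_lda(toy_docs, n_topics=2)
for row in probs:
    print([round(p, 2) for p in row])
```

Each row of the output is one tweet's probability of belonging to each topic. Naming the topics ("tickets", "the other Royals") is still a human job, done by reading the top words and a few example tweets.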


Sentiment mining is effectively mining text for feelings.  The model above was a topic model, which just separates tweets into broad correlated categories by "what" they are talking about.  Sentiment mining, however, looks at tweets and determines what emotion is being expressed (joy, sadness, anger, etc.) and how positive or negative they are.

There are a lot of different ways to go with this, but I used our above topics (renamed appropriately) and looked at the probability associated with two sentiments, joy and sadness.

What did I find?  Cheering on the Royals was the most joyful, while not having tickets was the saddest.
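The per-topic comparison boils down to averaging emotion probabilities within each topic. Here's a minimal Python sketch of that step, assuming each tweet already has a topic label and joy/sadness probabilities from some sentiment classifier. The field names and the numbers are invented for illustration and are not my actual results.

```python
from collections import defaultdict

def mean_emotion_by_topic(rows, emotions=("joy", "sadness")):
    """Average each emotion probability over the tweets in each topic."""
    sums = defaultdict(lambda: dict.fromkeys(emotions, 0.0))
    counts = defaultdict(int)
    for row in rows:
        counts[row["topic"]] += 1
        for emotion in emotions:
            sums[row["topic"]][emotion] += row[emotion]
    return {
        topic: {e: sums[topic][e] / counts[topic] for e in emotions}
        for topic in counts
    }

# Invented scores, only to show the shape of the input.
scored_tweets = [
    {"topic": "cheering", "joy": 0.75, "sadness": 0.25},
    {"topic": "cheering", "joy": 0.5, "sadness": 0.25},
    {"topic": "tickets", "joy": 0.25, "sadness": 0.75},
    {"topic": "tickets", "joy": 0.25, "sadness": 0.5},
]

for topic, means in mean_emotion_by_topic(scored_tweets).items():
    print(topic, means)
```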
