Monday, August 24, 2015

Sentiment Mining #Royals Over Time

I have been analyzing political/policy data over the past few weeks using my sentiment mining algorithm, and thought it would be interesting to look at some sports Twitter data. Fortunately, the Royals are good again this year and the season is nearing its end, so I have a good local target. A few findings of interest up front:

  • Royals tweets are much more positive than political tweets in general.
  • Royals fans are most negative DURING games, and more positive when games are not being played.
  • Royals fans are also less "sad" during games, though more likely to be angry or disgusted when tweeting about the team.


I used the same methodology as before, downloading tweets from approximately the past week that used the #Royals hashtag. Next came a round of preprocessing, including word removal, stemming, etc., as discussed in prior posts. Finally, I classified each tweet by emotion and polarity using a naive Bayes algorithm and analyzed the results with a focus on trends over time.
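The post doesn't include code, so here is a minimal Python sketch of what a pipeline like this might look like, assuming a standard multinomial naive Bayes with add-one (Laplace) smoothing. The stop-word list, the `crude_stem` suffix stripper (a stand-in for a real stemmer such as Porter), the class and function names, and the four training tweets are all hypothetical illustrations, not the actual algorithm or training data behind these results.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real pipeline would use a much larger one.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "at", "for", "on"}


def crude_stem(word):
    """Very rough suffix stripping, standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def preprocess(tweet):
    """Lowercase, drop URLs/mentions/hashtag symbols, remove stop words, stem."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = text.replace("#", " ")
    tokens = re.findall(r"[a-z']+", text)
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = preprocess(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        tokens = preprocess(doc)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            # Log prior plus summed log likelihoods; the +1 keeps unseen words from zeroing a class.
            score = math.log(count / n_docs)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny made-up training set, for illustration only.
train = [
    ("What a win tonight, love this team", "positive"),
    ("Great pitching, so happy", "positive"),
    ("Terrible call, awful inning", "negative"),
    ("Another error, so frustrating", "negative"),
]
clf = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])
print(clf.predict("love the win"))  # positive
print(clf.predict("awful error"))  # negative
```

With real data you would train on a much larger labeled corpus; classifying by emotion works the same way, just with emotion labels in place of positive/negative.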


First, our initial descriptive data: what is the general polarity of sports tweets? The data is generally pretty positive, though this may have more to do with the Royals having one of the best records in Major League Baseball right now. It is certainly much more positive than political data (a future post on that).

What about the emotions of Royals tweets? See chart below. Joy is number one by far, but keep in mind it is the only *real* positive emotion, whereas the negativity is spread across four negative emotions.

There is a lot of sadness in our sample too, with a little anger and a bit of surprise. (Given the last twenty years of Royals seasons, I am full of surprise every time they win.)
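To make the joy-versus-negativity point concrete, here is a small Python sketch that compares joy against all negative emotions combined. It assumes six emotion labels (anger, disgust, fear, joy, sadness, surprise); the post names five of these and mentions four negative emotions, so treating fear as the fourth is an assumption. The helper name `emotion_shares` is hypothetical, and the labels in the example are made up for illustration, not counts from this sample.

```python
from collections import Counter

# Assumed label set: the post names joy, sadness, anger, disgust and surprise,
# and mentions four negative emotions; fear is assumed to be the fourth.
NEGATIVE_EMOTIONS = {"anger", "disgust", "fear", "sadness"}


def emotion_shares(labels):
    """Return (share per emotion, joy share, combined negative share)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if not total:
        return {}, 0.0, 0.0
    shares = {emotion: n / total for emotion, n in counts.items()}
    negative_total = sum(shares.get(e, 0.0) for e in NEGATIVE_EMOTIONS)
    return shares, shares.get("joy", 0.0), negative_total


# Made-up labels: joy is the single largest category,
# yet the negative emotions together outweigh it.
labels = ["joy", "joy", "joy", "sadness", "sadness", "anger", "surprise", "disgust"]
shares, joy, negative = emotion_shares(labels)
print(f"joy: {joy:.1%}, all negative: {negative:.1%}")  # joy: 37.5%, all negative: 50.0%
```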


The #Royals hashtag has quite a bit of volume throughout the day, so we can analyze trends over time. Here is the hourly data over the past four days, with the volume of tweets in blue and the % positive tweets in orange.
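Producing an hourly series like this amounts to bucketing classified tweets by hour. Here is a minimal Python sketch, assuming each tweet has already been reduced to a (timestamp, polarity) pair; `hourly_summary` is a hypothetical helper name, and the timestamps in the example are invented.

```python
from collections import defaultdict
from datetime import datetime


def hourly_summary(tweets):
    """Bucket (timestamp, polarity) pairs by hour; return {hour: (volume, pct_positive)}."""
    buckets = defaultdict(lambda: [0, 0])  # hour -> [total, positive]
    for ts, polarity in tweets:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour
        buckets[hour][0] += 1
        if polarity == "positive":
            buckets[hour][1] += 1
    return {h: (total, 100.0 * pos / total) for h, (total, pos) in sorted(buckets.items())}


# Invented example tweets.
tweets = [
    (datetime(2015, 8, 20, 19, 5), "positive"),
    (datetime(2015, 8, 20, 19, 40), "negative"),
    (datetime(2015, 8, 20, 20, 10), "positive"),
]
for hour, (volume, pct) in hourly_summary(tweets).items():
    print(hour.strftime("%Y-%m-%d %H:00"), volume, f"{pct:.1f}%")
# 2015-08-20 19:00 2 50.0%
# 2015-08-20 20:00 1 100.0%
```

Volume and % positive come out of the same pass, which makes it easy to plot them on a shared time axis.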

A few things stand out in this graph:

  • Regular daily spikes of activity, though not always at the same time. Oh, yeah, because the Royals play games most days in summer, and that's when most people tweet about them. I matched these up manually against the Royals schedule.
  • The share of positive tweets is volatile, but appears to hit a "low point" each day while the Royals play. I tested this and found that fans are significantly more negative when the team is actually playing. Here are the actual results of that:
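The post doesn't say which significance test was used. One reasonable choice, sketched below in Python, is a two-proportion z-test comparing the share of positive tweets during games with the share outside games. The game windows are hypothetical inputs you would fill in from the schedule, the function names are illustrative, and the counts in the example are made up.

```python
import math
from datetime import datetime


def in_game(ts, game_windows):
    """True if the timestamp falls inside any (start, end) game window."""
    return any(start <= ts <= end for start, end in game_windows)


def two_proportion_z_test(pos_a, n_a, pos_b, n_b):
    """Two-sided z-test for a difference between two proportions; returns (z, p_value)."""
    p_a, p_b = pos_a / n_a, pos_b / n_b
    pooled = (pos_a + pos_b) / (n_a + n_b)  # shared proportion under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def compare_game_vs_off(tweets, game_windows):
    """Split (timestamp, polarity) pairs by game status and test the % positive."""
    counts = {True: [0, 0], False: [0, 0]}  # in_game flag -> [positive, total]
    for ts, polarity in tweets:
        bucket = counts[in_game(ts, game_windows)]
        bucket[1] += 1
        if polarity == "positive":
            bucket[0] += 1
    (pos_g, n_g), (pos_o, n_o) = counts[True], counts[False]
    return two_proportion_z_test(pos_g, n_g, pos_o, n_o)


# Hypothetical game window and made-up counts, for illustration only.
games = [(datetime(2015, 8, 20, 19, 10), datetime(2015, 8, 20, 22, 15))]
print(in_game(datetime(2015, 8, 20, 20, 0), games))  # True
z, p = two_proportion_z_test(50, 100, 70, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

This assumes both groups are non-empty and not all one polarity. A chi-squared test (without continuity correction) on the 2x2 table of game status versus polarity gives an equivalent result.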

What about emotion? Looking at emotion hourly doesn't make a ton of sense, because there are multiple categories to track at once (dimensionality). We can, however, summarize emotions by whether or not a game was being played. During games, Royals fans are generally less sad, but much more likely to be angry or disgusted. There is also a slight uptick in "joy."
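The during/not-during summary can be sketched as a simple cross-tabulation in Python. This assumes each tweet carries an in-game flag (for example, from schedule windows) and a single emotion label; the helper name `emotion_by_game_status` and the example records are hypothetical.

```python
from collections import Counter


def emotion_by_game_status(records):
    """records: iterable of (in_game, emotion). Returns {in_game: {emotion: share}}."""
    counts = {True: Counter(), False: Counter()}
    for playing, emotion in records:
        counts[playing][emotion] += 1
    result = {}
    for playing, c in counts.items():
        total = sum(c.values())
        # Normalize within each group so the two distributions are comparable.
        result[playing] = {e: n / total for e, n in c.items()} if total else {}
    return result


# Made-up records, for illustration only.
records = [
    (True, "joy"), (True, "anger"), (True, "disgust"), (True, "joy"),
    (False, "sadness"), (False, "sadness"), (False, "joy"), (False, "anger"),
]
summary = emotion_by_game_status(records)
print("during games:", summary[True])
print("outside games:", summary[False])
```

Normalizing within each group matters because there are far fewer game hours than non-game hours, so raw counts would mostly reflect volume.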


There are a few moderately interesting findings in the data, and I think some of them make sense.

The spike in negativity while games are being played: This is likely because the Royals are having a great season, so when fans tweet about the Royals outside of game times, they are likely talking about the team's record, playoff chances, etc. During games, fans are more likely to be responding to an acute event, such as a bad at-bat or a missed throw.

The spike in disgust and anger during games: The shift during games from sadness to disgust and anger likely reflects a move from a generalized feeling about the team (sad about a player being on the DL, last night's game, etc.) to an acute emotion (disgusted at a sloppy play, angry at Ned Yost for being no better than a random number generator as a manager).

Sports tweets more positive than political tweets: This one is fairly obvious. Sports are what we do in our spare time in order to enjoy ourselves; by definition, they exist for fun. Politics is what we do to resolve disagreements about how the world should work, so by definition it involves negativity and conflict.
