Thursday, August 13, 2015

Sentiment Mining and #ksleg Power Rankings

Here we go again.  Once again mining one of my favorite Twitter hashtags for the last week (ish).  First, a wordcloud to see what people are talking about.
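(For the curious, the wordcloud itself is easy to reproduce.  Below is a minimal sketch using the tm and wordcloud packages, assuming the week's #ksleg tweets have already been pulled into a data frame called tweets with a text column; those names are just for illustration, not the exact code behind the chart.)

library(tm)
library(wordcloud)

# Build a cleaned corpus from the tweet text
corpus <- Corpus(VectorSource(tweets$text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, c(stopwords("english"), "ksleg", "amp", "rt"))

# Term frequencies drive the size of each word in the cloud
tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
wordcloud(names(freq), freq, max.words = 100, random.order = FALSE)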


This is generally the same as we've seen before: people talking about the governor and education, though a few new words like "brother" show up too.


TOPIC MINING AND POWER RANKINGS



And a summary of the topics with relevant news links:



And once again, the (somewhat humorous) by-user power rankings.  Nothing too surprising here; I'm still working on developing a one-metric reach index.


SENTIMENT MINING

(non-nerds skip to the RESULTS below)

So I've been playing with sentiment mining a little bit, and thought of a few applications here. Sentiment mining is different from the text modeling I've used in prior analyses.  Those analyses focused on Topic Modeling: using an algorithm to determine the topics that exist in a set of documents (tweets) and categorize each tweet by topic.

Sentiment mining focuses on using an algorithm (in this case a Naive Bayes classifier) to determine the "emotion" communicated by a document (tweet).  For this purpose I used the now deprecated "sentiment" R library.  It includes a pre-trained Naive Bayes classifier that gives me two outputs when I run tweets through it:

  • Emotion: Categorizes tweets by the general emotion being communicated.  A lot of tweets fall out of this step because they don't match a clear "emotion."
  • Polarity: Determines whether a tweet is generally positive, negative, or neutral.

Aside from installing a deprecated R library, the process for sentiment mining is fairly straightforward.  I used the same tweets from above and ran them through the classifier.  (ask me for code if you want it)
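(If you'd rather not ask, here's a rough sketch of what that step looks like, assuming the same hypothetical tweets data frame as above.  The archived "sentiment" package, and its Rstem dependency, has to be installed from the CRAN archive by hand, and the column names I add below are just for illustration.)

# "sentiment" is no longer on CRAN; something like this pulls the archived
# source (the version number may differ):
# install.packages("https://cran.r-project.org/src/contrib/Archive/sentiment/sentiment_0.2.tar.gz",
#                  repos = NULL, type = "source")
library(sentiment)

# Naive Bayes classification; BEST_FIT is NA when no clear emotion is found
emotion  <- classify_emotion(tweets$text, algorithm = "bayes", prior = 1.0)
polarity <- classify_polarity(tweets$text, algorithm = "bayes")

tweets$emotion  <- emotion[, "BEST_FIT"]
tweets$polarity <- polarity[, "BEST_FIT"]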


RESULTS

Mining text like this allows us to look at a few things.  First, the polarity (negative/positive/neutral) of all #ksleg tweets.  This graph shows that positive tweets have a slight edge over negative tweets.

Side note: a positive tweet wouldn't necessarily be saying positive things about the State; it could just be saying things of a positive nature.  Example: "We should pay teachers more!" is actually negative towards the State of Kansas, but a "positive" statement in the algorithm's terms.


What about the emotions in the tweets?  See the graph below.  Some may think there's too much "joy" being categorized, given current attitudes in Kansas.  On further analysis, "joy" is the only clearly positive emotion in the classifier's set, and it's possible to express joy over a negative news story.  Further reading.
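(The charts above boil down to simple counts of those two new columns.  A minimal sketch with ggplot2, using the hypothetical tweets data frame from the earlier sketch, would look something like this; the actual graphs may have been built differently.)

library(ggplot2)

# Polarity: positive / negative / neutral counts
ggplot(tweets, aes(x = polarity)) +
  geom_bar() +
  labs(title = "Polarity of #ksleg tweets", x = "Polarity", y = "Tweets")

# Emotion: drop tweets with no clear emotion, then count what's left
ggplot(subset(tweets, !is.na(emotion)), aes(x = emotion)) +
  geom_bar() +
  labs(title = "Emotion in #ksleg tweets", x = "Emotion", y = "Tweets")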


APPLICATIONS

There are two predictive applications of sentiment mining here: classifying people, and measuring the emotions directed between people.

First, when you look at the types of people that tweet on the hashtag #ksleg, there are two main types:
  • Pundits: People who tweet their opinions on policies they like or dislike.
  • Newsies: People who report the news.  Supposedly in a neutral way.
You would assume you can classify users by the polarity of their tweets into Newsies (more objective) and Pundits (more opinionated).

First, I took our top 20 from the list above and classified them manually based on my knowledge of whether they work for a newspaper, are a fake account, etc.  Then I calculated the percent of their tweets with negative polarity and reclassified them using a 33% cutoff point (fewer negative tweets = newsie, more negative tweets = pundit).
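(A sketch of that cutoff rule, assuming the hypothetical tweets data frame has a screen_name column and top20 is a hand-labeled data frame of the top users with screen_name and actual columns; those names are mine, not from the original analysis.)

library(dplyr)

# Share of each top user's tweets classified as negative
neg_share <- tweets %>%
  filter(screen_name %in% top20$screen_name) %>%
  group_by(screen_name) %>%
  summarise(pct_negative = mean(polarity == "negative", na.rm = TRUE))

# 33% cutoff: mostly non-negative = newsie, more negative = pundit
scored <- top20 %>%
  left_join(neg_share, by = "screen_name") %>%
  mutate(predicted = ifelse(pct_negative > 1/3, "pundit", "newsie"))

# Accuracy against the hand labels
mean(scored$predicted == scored$actual)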

Using this method, I was able to successfully parse out Newsies from Pundits 85% of the time.  Two of those failures were conservatives who are just more positive about the current administration.  The algorithm could be refined further, but it generally works.



Second, what about emotional tweets: is there anything they can tell us?  For each tweet I know who was being "replied to," so what kinds of emotions are being directed towards which users?

From the emotional categories above, I can set a statistical (Bayesian) prior and then determine which tweeters are obvious outliers in the emotions directed towards them. Two accounts clearly create statistically different reactions (p < 0.01).
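(The exact test isn't shown here, but a simple version of the idea looks like the sketch below: treat the overall rate of each emotion among replies as the prior, then flag replied-to users whose replies depart from it at p < 0.01.  The reply_to column, the minimum-replies threshold, and the plain binomial test are assumptions for illustration.)

# Emotion-tagged tweets that reply to someone
replies <- subset(tweets, !is.na(reply_to) & !is.na(emotion))

flag_outliers <- function(target_emotion, min_replies = 10) {
  prior   <- mean(replies$emotion == target_emotion)   # overall rate as prior
  by_user <- split(replies$emotion, replies$reply_to)
  pvals   <- sapply(by_user, function(e) {
    if (length(e) < min_replies) return(NA)
    binom.test(sum(e == target_emotion), length(e), p = prior)$p.value
  })
  sort(pvals[!is.na(pvals) & pvals < 0.01])
}

flag_outliers("joy")    # most joyously received accounts
flag_outliers("anger")  # most angrily received accounts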

First the most joyously received tweeter on the #ksleg hashtag (Bryan Lowry of the Wichita Eagle):



And the most angrily received (Michael Austin, works for KDOR):


(A quick editorial note: I disagree with the above Twitter user quite often, but agree with him sometimes too.  He's in a rough spot, defending policies that aren't working out.)

CONCLUSION

You can see the Twitter power rankings and topic modeling above for an update on how things are going.  The biggest takeaway from a data perspective is that we can successfully sentiment-model tweets.  More importantly, we can use the polarity and emotion outputs of sentiment mining both to categorize users and to measure the emotion directed towards them.
