Friday, November 13, 2015

Mizzou Protest Mining

I don't want to offer an opinion on the situation at Mizzou on this blog. However, interest in the situation sent me down the path of text mining tweets today. I downloaded the last two days of tweets with the hashtag #mizzou and used my normal methodology (download tweets, sentiment, topic mining). Some takeaways (I'm just going to show data, not too many words after this):

  • Negative Polarity: By tweet polarity, this is the most negative set of tweets I've ever mined. It is more negative than government and education in Kansas, and much more negative than the Royals.
  • Recently Negative: The tweets I analyzed and the topics I found are negative towards the protesters, but that could be because I pulled only the last two days. If I looked earlier in the week, I would likely see different results. Also, if I used other hashtags (e.g. #concernedstudent1950, #blacklivesmatter, #millionstudentmarch), I would likely see very different results.
  • #TCOT Presence: I noted a large presence of #TCOT (Top Conservatives on Twitter) in the tweets I downloaded, especially in a couple of the discovered topics. The high frequency of this hashtag tells me that conservatives have been widely using the #mizzou hashtag.


The rest is largely without comment. First, I searched the hashtag #Mizzou, and these were the top two results (for flavor).

Next, a word cloud of all the tweets. The words center on "students," with a few hashtags like #blacklivesmatter and #millionstudentmarch. The rest of the terms are general race-related words, with "college" terms also appearing frequently.
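
A word cloud is essentially a picture of term frequencies, so the counting step behind it can be sketched in a few lines. This is a minimal illustration in Python, not the R code I actually ran: the function name `term_frequencies`, the stop-word list, and the sample tweets are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "at", "on", "for", "rt"}


def term_frequencies(tweets):
    """Count how often each word or hashtag appears across all tweets."""
    counts = Counter()
    for tweet in tweets:
        # Lowercase and drop URLs, then keep words and #hashtags.
        text = re.sub(r"https?://\S+", "", tweet.lower())
        tokens = re.findall(r"#?\w+", text)
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


# Made-up placeholder tweets, not taken from the downloaded data.
sample_tweets = [
    "Students at the rally #mizzou",
    "RT students and faculty meet on campus #mizzou http://example.com/x",
    "College students discuss the news",
]

freqs = term_frequencies(sample_tweets)
print(freqs.most_common(3))
```

The larger a term's count, the larger it would be drawn in the cloud.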


A few months ago I compared polarity of three hashtags I commonly analyze (Kansas Government, Kansas Education, and the Royals).

Today I did the same analysis for #mizzou. The percentage of negative tweets is much higher than even Kansas Government.
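
For anyone curious what "percentage negative" means mechanically, here is a rough sketch, assuming a simple word-list approach to polarity. The tiny lexicon, the function names, and the sample tweets are hypothetical; this is not the sentiment method or lexicon I used in R, just the general idea of labeling each tweet and then taking shares.

```python
import re

# Tiny hypothetical lexicon for illustration only.
POSITIVE = {"good", "great", "support", "proud", "win"}
NEGATIVE = {"bad", "lies", "awful", "wrong", "lose"}


def polarity(tweet):
    """Label a tweet by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", tweet.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def polarity_shares(tweets):
    """Return the percentage of tweets in each polarity class."""
    labels = [polarity(t) for t in tweets]
    total = len(labels)
    classes = ("positive", "negative", "neutral")
    if total == 0:
        return {c: 0.0 for c in classes}
    return {c: 100.0 * labels.count(c) / total for c in classes}


# Made-up placeholder tweets, not taken from the downloaded data.
sample_tweets = [
    "Great win tonight",
    "This is bad and wrong",
    "Awful call",
    "Game starts at noon",
]

shares = polarity_shares(sample_tweets)
print(shares)
```

Running the same function over each hashtag's tweets gives comparable percentages across data sets.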

Finally, I created a topic model. Generally speaking, each topic contained both positive and negative comments about the protesters, with the majority leaning negative. The topic most positive towards the protesters was topic three. If I had to name the discovered topics, they would be:

  1. Truth/Lies of the protesters.
  2. TCOT.
  3. Support of protesters.
  4. Free Speech and Football.
  5. Generally making fun of the protesters.

Here's my term printout from R:

And each topic with its top correlated tweet:

TOPIC 1: "Lies" of protesters


TOPIC 3: Support of protesters.

TOPIC 4: Free Speech and Football

TOPIC 5: Generally making fun of protesters.
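
One simple way to pick a "top" tweet for each topic is to score every tweet against the topic's term weights and keep the highest scorer. The sketch below assumes the term weights already exist from a fitted topic model. The summed-weight scoring is a stand-in for illustration, not necessarily how my R output was produced, and the topic names, weights, and tweets are invented.

```python
import re


def top_tweet_per_topic(tweets, topic_terms):
    """For each topic, return the tweet whose words carry the most total weight.

    `tweets` must be a non-empty list; `topic_terms` maps a topic name to a
    dict of {term: weight}.
    """
    best = {}
    for topic, weights in topic_terms.items():
        def score(tweet):
            # Sum the topic's weights over the words in this tweet.
            words = re.findall(r"#?\w+", tweet.lower())
            return sum(weights.get(w, 0.0) for w in words)

        best[topic] = max(tweets, key=score)
    return best


# Hypothetical topic-term weights and placeholder tweets.
topic_terms = {
    "football": {"football": 0.5, "team": 0.3},
    "speech": {"speech": 0.6, "free": 0.2},
}
sample_tweets = [
    "The football team plays Saturday",
    "A panel on free speech meets today",
    "Campus is quiet",
]

top = top_tweet_per_topic(sample_tweets, topic_terms)
for topic, tweet in top.items():
    print(topic, "->", tweet)
```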
