Friday, September 18, 2015

GOP Primary Debate Number Two: CNN Summary And Text Mining

A three-hour debate?  Seriously?  After my wife and I realized it was going to be three hours, we started recording the debate and watched something else.  I intended to watch the rest of the debate later, but why would I do that when I can just run an algorithm and review the results?  Here's a summary of what I found:
  • The candidates drawing the most tweet volume were, in order, Trump, Fiorina, and Bush (with the others coming in much lower).
  • The winner of the debate was likely Fiorina, while Twitter most associated the term "loser" with Kasich.
  • Donald Trump dominated the social media conversations following the debate, but conversations about him were focused on how "funny" he was and his insults of other candidates, not issues or policy stances.


I downloaded about 100,000 tweets with the hashtag #GOPDebate the morning after the debate.  I used normal text scrubbing, stemming, and data-izing methodologies.  Then I went to work analyzing the data, including a wordcloud, topic modeling, some basic sentiment modeling, and summarizing of data. 
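For readers who want to see what the scrubbing and stemming step might look like, here is a minimal sketch in Python. It is not the code I ran for this post: the stopword list is a tiny illustrative one, `crude_stem` is a rough stand-in for a real stemmer such as Porter, and the function names are made up for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in",
             "is", "was", "it", "rt", "for", "on"}

def crude_stem(token):
    """Rough suffix stripping, a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def clean_tweet(text):
    """Lowercase, drop links and @mentions, remove stopwords, stem the rest."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop @mentions
    tokens = re.findall(r"[a-z]+", text)       # keep alphabetic runs only
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]

def document_term_counts(tweets):
    """One term-count dictionary per tweet (a sparse document-term matrix)."""
    return [Counter(clean_tweet(t)) for t in tweets]
```

Calling `document_term_counts(tweets)` on a list of raw tweet strings gives one `Counter` per tweet, which is enough to feed a wordcloud or a simple association analysis.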

Here's what everyone wants, our resulting wordcloud.  Note that Trump dominated the text, with "Trump" being even more common than the word "debate."

I also created a few topic models to try to parse out what topics were being discussed during the debate.

It turned out the topic models didn't converge as well as I had hoped.  Why?  A few reasons:

  • Almost every topic we pulled out would be dominated by Trump, because Trump dominated the conversation.  Documents that are all about the same thing don't lead to very good topic models.
  • Even the non-Trump conversations were dominated by Trump's interactions with other candidates.  For instance, Topic 4 focuses on Jeb Bush, but still has "Donald" and "Trump" in its top 5 associated terms.  The same goes for Topic 2 and Carly Fiorina.


If a debate is a competition to generate Twitter traffic, then Trump clearly won.  Here's a graph of the top candidates and their respective Twitter volume, with some sentiment data overlaid.  Trump wins with nearly double the Twitter volume of any other candidate.  He also had a lower negative tweet percentage than any other candidate, but there's a reason for that, which I will address later.
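As a rough illustration of how volume and negative-tweet percentage can be tallied per candidate, here is a small Python sketch. The negative word list is hypothetical and far too short for real use, the function name is invented for the example, and this post doesn't describe the sentiment model in enough detail to reproduce it, so treat this as one simple way to do it rather than the method behind the graph.

```python
import re

# Hypothetical, very small negative lexicon (an assumption for illustration).
NEGATIVE_WORDS = {"bad", "awful", "liar", "terrible", "worst"}

def volume_and_negative_share(tweets, candidates=("trump", "fiorina", "bush")):
    """Return {candidate: (tweet_count, share_of_those_tweets_flagged_negative)}."""
    stats = {name: {"tweets": 0, "negative": 0} for name in candidates}
    for text in tweets:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        # A tweet is flagged negative if it contains any lexicon word.
        is_negative = bool(tokens & NEGATIVE_WORDS)
        for name in candidates:
            if name in tokens:
                stats[name]["tweets"] += 1
                stats[name]["negative"] += int(is_negative)
    return {
        name: (s["tweets"], s["negative"] / s["tweets"] if s["tweets"] else 0.0)
        for name, s in stats.items()
    }
```

A tweet that names two candidates counts toward both, which is one reason a lexicon approach like this only gives a coarse picture.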

What about a more objective view of who won, given that debates aren't really Twitter volume competitions?  We can look at the terms that are most associated with the words "winner" and "loser".

Notice that a lot of reporter names and other references end up in this list of most associated terms, largely because reporters are the ones discussing winner/loser outcomes.  However, in our winner list Fiorina is the only candidate name to show up.  In our loser list we see John Kasich's name.  From the little bit of the debate I saw, Kasich wasn't necessarily a loser; it was just very unclear why he is still in the race.  This isn't a perfect methodology, but it does speak to a Twitter consensus:  when discussing winners of the CNN GOP Debate, Carly Fiorina was the most frequent topic of conversation.
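The "most associated terms" idea can be sketched in a few lines of Python. This version scores each term by the correlation between "this term appears in the tweet" and "the target word appears in the tweet" (the phi coefficient). That choice of measure is an assumption for the example, as are the function names; other association measures would work too.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Set of lowercase alphabetic tokens in one tweet."""
    return set(re.findall(r"[a-z]+", text.lower()))

def associated_terms(tweets, target, top_n=5):
    """Rank terms by correlation of their presence with the target's presence."""
    docs = [tokenize(t) for t in tweets]
    n = len(docs)
    target_count = sum(1 for d in docs if target in d)
    if target_count in (0, n):
        return []  # correlation is undefined if the target never or always appears
    term_count = Counter()   # tweets containing each term
    joint_count = Counter()  # tweets containing both the term and the target
    for d in docs:
        term_count.update(d)
        if target in d:
            joint_count.update(d)
    p_t = target_count / n
    scores = []
    for term, count in term_count.items():
        if term == target or count == n:
            continue  # skip the target itself and terms present in every tweet
        p_x = count / n
        p_xt = joint_count[term] / n
        phi = (p_xt - p_x * p_t) / math.sqrt(p_x * (1 - p_x) * p_t * (1 - p_t))
        scores.append((term, phi))
    # Highest correlation first; ties broken alphabetically.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]
```

Running `associated_terms(tweets, "winner")` and `associated_terms(tweets, "loser")` on cleaned tweets would produce lists comparable in spirit to the ones discussed above, reporter names included.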


Looking at the issues and topics statistically associated with candidates has a couple of applications.

  • We can determine which issues are each candidate's signature issue.  For instance, last debate we recognized that Huckabee is effectively a social-conservative issues candidate, as his top associations were abortion and gay marriage.  
  • We can identify the most prevalent negative issue for each candidate.  For instance, Trump's comment on Fiorina's face comes through as a top issue.

Here are the related terms for some top candidates:

And some quick analysis:

  • Fiorina: Focuses on her comments on Planned Parenthood, related videos, and relationship to other candidates.  
  • Trump: Focuses on words like "funniest", "reaction", "msm", and "drudge", a lot of terms related to the spectacle of Trump's candidacy.  This also relates to the reason Trump has a low negative-tweet ratio: the conversation regarding his candidacy is not issue-based, it's largely talking about how hilarious he is.
  • Huckabee: So apparently there's an actor with that last name on The Walking Dead.  It's not good that this association is popping to the top; it means that Huckabee is not a high-popularity candidate.  The weird words that keep popping up for Huckabee are "soulless", "hell", "heathen", and "atheist".  Also "epicfail."
  • Bush: Focuses on his family (George, Barbara) and whether his brother kept us safe.  Also a reference to pot (likely regarding Fiorina's joke about Jeb smoking weed).


A few bullet points to close this out:
  • Trump is still dominating the social media and conversation around the Republican Presidential primary, though more out of spectacle than actual issues.
  • Fiorina appears to have won the debate from analysis of twitter conversations.  
  • Bush, who is still favored by many real analysts to win the debate, is being talked about in the larger context of his family and his brother's presidency.
