Sunday, December 3, 2017

Twitter Data Related to the Passing of the Senate Tax Bill

Over the past few weeks I have been heavily analyzing Twitter data, looking for methods to find mass-blocking patterns (I've tweeted on this extensively at @leviabx and may write that analysis up in the future).  A few nights ago I saw some users on Twitter asking people to save the tweets of Senators leading up to the passing of controversial tax legislation. I thought: hey, that's actually super easy.

I downloaded that data and plan on making it available here on the blog for other analysts to look at.  If this is popular, I will consider posting more generated datasets to this site. I also think this could give some readers of this blog a taste of what social media data looks like, and what it's like to work with it.


I used the Twitter API to pull down the most recent 500 tweets for each current sitting US Senator on Saturday night, December 2nd 2017.  For this I used a list of Twitter handles.  A couple of notes:
  1. I found this list on the internet, and made the obvious changes, so if you find any errors please let me know.
  2. Senators often have multiple Twitter handles, so I'm hoping I have the right one for this type of policy discussion.

If you find any issues with this post or the data, you can either comment on this blog (open comments) or hit me up directly on Twitter at @leviabx.


The data I pulled down is a list of tweets found here.  This data was pulled mid-day on December 2nd so it includes tweets from immediately after tax bill passage. A few notes on (some of) the data fields:
  1. text: This is a cleaned version of the text from the original tweet.
  2. original_text: this is the original text from the tweet.
  3. created_at: the UTC timestamp of when the tweet was sent.  UTC is a standardized time that is five hours ahead of US Eastern time in December, so subtract 5 hours to get DC time.
  4. emotions (anger, anticipation, positive, negative): for the user's convenience I ran this data through a sentiment algorithm (see Plutchik's 8 emotions).
  5. tax: a TRUE/FALSE indicator of whether "tax" was mentioned in the tweet.
  6. tl: this is a link to go look at the tweet directly in browser (just copypasta it to the browser). I wrote this piece of code 3 years ago and have no clue what I meant by "tl".
  7. screen_name: the screen name of the senator who sent the tweet.
  8. geo fields: There are a ton of geolocation fields in Twitter data. They can mostly be ignored, as they are only filled out when the user opts in.
  9. retweet/fav counts: the number of times an individual tweet was retweeted or favorited.

WordCloud of tax-related tweets from Senators during the week of tax reform.
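
If you want to do the "subtract 5 hours" conversion on created_at in code rather than by hand, here is a minimal Python sketch. It assumes the timestamps are stored as "YYYY-MM-DD HH:MM:SS" (check the file, since the real format may differ), and the function name and the example timestamp are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: created_at looks like "2017-12-02 06:51:00" and is in UTC.
CREATED_AT_FORMAT = "%Y-%m-%d %H:%M:%S"

# Fixed UTC-5 offset, which matches US Eastern Standard Time in December.
EASTERN_STANDARD = timezone(timedelta(hours=-5), name="EST")


def to_dc_time(created_at: str) -> datetime:
    """Convert a UTC created_at string to DC (Eastern Standard) time."""
    utc_time = datetime.strptime(created_at, CREATED_AT_FORMAT)
    utc_time = utc_time.replace(tzinfo=timezone.utc)
    return utc_time.astimezone(EASTERN_STANDARD)


# Hypothetical timestamp, not taken from the dataset.
dc_time = to_dc_time("2017-12-02 06:51:00")
print(dc_time.strftime("%Y-%m-%d %H:%M %Z"))
```

A fixed offset is fine for this dataset because all of the tweets fall in standard time; for data that spans a daylight saving change, a real time zone database would be the safer choice.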


Playing with this data can be interesting and somewhat fun.  Here are some things you can do with it, from least to most technical:
  1. Find your Senators and see what they Tweeted this week.
  2. Sort the spreadsheet by "created_at" and follow the tweets along the timeline, from before the bill passed to after.
  3. Find tweets you like/dislike (search, emotion, names), then use the "tl" field to go to the Tweet directly and react.
  4. By sorting the emotion fields, find the Senators who were the most happy (trust, joy) versus least happy (anger, disgust) about the bill.
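
For the more technical end of that list, the sorting and emotion ranking can be sketched in a few lines of Python using only the standard library. Everything specific here is an assumption: the emotion column names (joy, trust, anger, disgust), the "happy minus unhappy" score, and the sample rows, which are invented stand-ins rather than real tweets. Swap the in-memory sample for the real file once you have checked its column names.

```python
import csv
import io
from collections import defaultdict

# Invented rows standing in for the real file; names and scores are not real data.
SAMPLE_CSV = """screen_name,created_at,tax,joy,trust,anger,disgust
SenatorA,2017-12-02 07:10:00,TRUE,2,3,0,0
SenatorB,2017-12-02 06:55:00,TRUE,0,0,3,2
SenatorA,2017-12-01 15:00:00,FALSE,1,0,0,0
SenatorB,2017-12-02 12:30:00,TRUE,0,1,2,1
"""


def load_tax_tweets(handle):
    """Keep rows whose tax flag is TRUE, ordered by created_at."""
    rows = [row for row in csv.DictReader(handle) if row["tax"].upper() == "TRUE"]
    # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
    return sorted(rows, key=lambda row: row["created_at"])


def net_happiness_by_senator(rows):
    """Sum (joy + trust) - (anger + disgust) per senator, happiest first."""
    totals = defaultdict(int)
    for row in rows:
        happy = int(row["joy"]) + int(row["trust"])
        unhappy = int(row["anger"]) + int(row["disgust"])
        totals[row["screen_name"]] += happy - unhappy
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


tax_tweets = load_tax_tweets(io.StringIO(SAMPLE_CSV))
ranking = net_happiness_by_senator(tax_tweets)
for screen_name, score in ranking:
    print(screen_name, score)
```

To run this against the downloaded data, pass an open file instead of the sample, for example `open("tweets.csv", newline="", encoding="utf-8")`, where the file name is a placeholder.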


I'm not going to work on this dataset extensively, but I did pull together the happiest and angriest tweets regarding tax reform:

First angriest:

Now happiest: