Tuesday, December 22, 2015

First Look: Mass Shooting Data

I've wanted to take a deep dive into mass shooting data for quite a while, but I didn't want to do it in the heat of the moment following another mass shooting.  Over the next few days I am going to take a close look at the mass shooting data we have available, what it means, and why the numbers differ between sources.


There are two main datasets with mass shooting data: the Mother Jones data and the Shooting Tracker data.  Here is a brief summary of each dataset:

  • Mother Jones Data: Mother Jones focuses on multiple-death, non-gang public mass shootings, essentially the kind of events we see on the news.
  • Shooting Tracker Data: Shooting Tracker counts any event where multiple people are shot, a very basic definition of a mass shooting.


For this post I created an initial comparison of the data, just to get a sense of the differences between the two sources.  The first issue is that the Shooting Tracker data only goes back three years, while Mother Jones goes back to 1982.  We can generally work around this, but any longitudinal analysis will have to be based on the Mother Jones data.

As one might expect, the Shooting Tracker data tells us the number of mass shootings in the United States is much higher than Mother Jones does.  In fact, Shooting Tracker tells us that we average about one mass shooting a day, whereas Mother Jones tells us we average about one a quarter.
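To put the two averages on the same scale: one event a day is a rate of 1.0 per day, while one a quarter works out to roughly 4/365, or about 0.011 per day. Below is a minimal sketch of how such a rate could be computed from a list of event dates. The function name `events_per_day` and the sample dates are my own placeholders for illustration, not values from either dataset.

```python
from datetime import date


def events_per_day(event_dates, start, end):
    """Average number of events per calendar day in [start, end], inclusive."""
    days = (end - start).days + 1
    in_window = [d for d in event_dates if start <= d <= end]
    return len(in_window) / days


# Made-up dates for illustration only; NOT taken from either dataset.
sample = [date(2015, 1, 3), date(2015, 1, 4), date(2015, 1, 10), date(2015, 2, 1)]

rate = events_per_day(sample, date(2015, 1, 1), date(2015, 1, 10))
print(f"{rate:.2f} events per day")
```

Restricting both sources to the same date window before computing the rate is one way to work around the difference in how far back each one goes.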

Another difference is when mass shootings occur: Mother Jones shows most shootings occurring during the week, whereas Shooting Tracker shows shootings occurring disproportionately on weekends.
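A useful baseline for "disproportionately" is that if events were spread evenly across the week, about 2/7 (roughly 29%) would land on a Saturday or Sunday. Here is a minimal sketch of the weekend share calculation, assuming the events are available as Python `date` objects. The name `weekend_share` and the sample dates are hypothetical and only illustrate the arithmetic.

```python
from datetime import date


def weekend_share(event_dates):
    """Fraction of events that fall on a Saturday or Sunday."""
    if not event_dates:
        return 0.0
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    weekend = sum(1 for d in event_dates if d.weekday() >= 5)
    return weekend / len(event_dates)


# Made-up dates for illustration only; NOT taken from either dataset.
sample_days = [
    date(2015, 1, 3),   # Saturday
    date(2015, 1, 5),   # Monday
    date(2015, 1, 7),   # Wednesday
    date(2015, 1, 11),  # Sunday
]

share = weekend_share(sample_days)
print(f"weekend share: {share:.0%} (even-spread baseline: {2 / 7:.0%})")
```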

There's even disagreement on the seasonality of mass shootings.  In the Mother Jones data, mass shootings are scattered fairly evenly throughout the year, whereas Shooting Tracker shows a strong summer bias.
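One simple way to check for a seasonal pattern is to compute the share of events in each calendar month and compare the summer months against the rest. The sketch below assumes "summer" means June through August, which is my own choice for illustration; the names `monthly_shares` and `summer_share` and the sample dates are also hypothetical.

```python
from collections import Counter
from datetime import date

SUMMER_MONTHS = (6, 7, 8)  # assumption: June, July, August


def monthly_shares(event_dates):
    """Map each month number (1-12) to its share of all events."""
    total = len(event_dates)
    if total == 0:
        return {}
    counts = Counter(d.month for d in event_dates)
    return {m: counts.get(m, 0) / total for m in range(1, 13)}


def summer_share(event_dates):
    """Fraction of events that fall in the assumed summer months."""
    shares = monthly_shares(event_dates)
    return sum(shares.get(m, 0.0) for m in SUMMER_MONTHS)


# Made-up dates for illustration only; NOT taken from either dataset.
sample_months = [
    date(2015, 6, 14),
    date(2015, 7, 4),
    date(2015, 7, 19),
    date(2015, 12, 2),
]

print(f"summer share: {summer_share(sample_months):.0%}")
```

With events spread evenly over the year, the three summer months would hold about a quarter of them, so a share well above that would point to the kind of summer bias described above.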


This is just a first look at mass shooting statistics, but it shows an important divergence in the way we talk about mass shootings.  I will dig into these datasets more over the next few days, attempting to answer the following questions:
  • Why are the datasets so different?
  • What would make a researcher choose one dataset over the other?
  • Which dataset is a more accurate presentation of "risk"?
