Sunday, November 20, 2016

Supreme Court Death-Loss Simulations

Since the election of Donald Trump as President on Nov. 8th, the media has featured many narratives on the negative impacts of the future administration.  These stories vary widely in scope and impact, and while there are likely some legitimate fears given the rhetoric of the Trump campaign, there's also a fair chance that some narratives are simply fear mongering.  One fear that seems valid is Trump's impact on the Supreme Court.  It goes something like this:

In the next four to eight years, there is a reasonable chance that Donald Trump will be able to replace at least one liberal or moderate Justice of the Supreme Court due to death.  If Trump decides to replace that Justice with an ultra-conservative, it could change the majority that has held in recent decisions (e.g. gay marriage; abortion) and impact case law for generations to come.

When my friends (both liberal and conservative), especially those who have a vested interest in gay marriage, abortion, or the Affordable Care Act, hear this, they respond in very emotional ways.  I certainly understand this, as these issues cut to the heart of people's identities, livelihoods and health.  As a data scientist though, my reaction is to simply ask the question: What likely outcome is indicated by the data, and how might that impact the future political landscape of the Supreme Court?


The political background of this situation is complex, but I will stick to the points needed to derive the assumptions for this analysis:
  • Relevant Cases: For people in my generation, two very recent cases seem to have the most impact on their outlook on the Supreme Court.
    • Whole Woman's Health v. Hellerstedt: Case regarding what kind of additional restrictions states can place on women seeking an abortion.  The court found 5-3 that states cannot place restrictions that create an "undue burden."
    • Obergefell v. Hodges: This is the infamous gay marriage case in which the court held that same-sex couples have a right to marry.  This was a 5-4 decision.
  • Current Court Dynamics: Obviously justices vote differently on different issues, but for these two key cases, the majority was the same.  Here's how it lays out:
    • Liberals: Kagan, Ginsburg, Sotomayor
    • Moderates (joined liberals in the majority): Breyer, Kennedy
    • Conservatives: Thomas, Alito, Roberts, Scalia (now deceased)
  • Presidential Politics: The progressive theory that underlies the fear of a Trump Court is that Donald Trump will nominate ultra-conservatives to the court, and they will be confirmed.  There are two main issues with this theory:
    • Trump Conservative?:  There is still an open question regarding how conservatively Trump will govern, and what his *real* opinion of social issues like abortion and gay marriage may be.  Beyond this, there is an open question of how much influence Vice President Mike Pence will have, and we're slightly more certain of Pence's conservative agenda.
    • Merrick Garland Stall: Following Scalia's death earlier this year, President Obama nominated Merrick Garland as a replacement.  Congressional Republicans have stalled on confirming that nomination, with the assumption that Trump will replace Scalia in January with another conservative.  This raises a further question for Trump's nominees and this analysis: If a liberal justice dies during the last year of a Trump presidency, will congressional Democrats consider this Garland incident a precedent?

What does all of this mean?  For the remainder of this analysis we'll refer to Breyer and Kennedy as liberals, because of their impact on these two social cases.  And obviously the Supreme Court is complex, but generally if Trump can replace one member of the liberal wing of the court with an ultra-conservative, we may see very different future court decisions on social issues like gay marriage and abortion.


General Assumptions: This methodology assumes that no member of the liberal wing of the court will resign their position to be replaced by a Trump nominee.  Given the politics and recent history of the court this seems like a reasonably safe assumption.  As such, the calculations and simulation engine I describe below are based upon actuarial probabilities of death.

Nerdy Methodology (feel free to skip): I use annualized mortality risk by age and gender for US citizens, and then use a Kaplan-Meier estimator to determine survival probability rates over an assumed four year and eight year administration.  Whereas a parametric Weibull-based survival model may have been a more elegant solution, the sample on which mortality estimates are based was sufficiently large, and curve fitting in Weibull may introduce other types of error.  For straightforward probability estimates, this methodology is sufficient; however, some complex scenarios require a simulation-based solution.  I created a Supreme Court Survival Simulation Engine (SCSSE, I guess) which simulates who will survive the next eight years to answer these more complex questions.
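
The core of that survival calculation can be sketched in a few lines of R.  This is a minimal illustration, not the code behind the charts: the annual death probabilities are made-up placeholder values (not figures from any actuarial table), and the variable names are hypothetical.

```r
# Hypothetical annual death probabilities for one person over eight years
# (illustrative placeholder values only, not taken from an actuarial table)
annual_mortality <- c(0.050, 0.055, 0.060, 0.066, 0.073, 0.080, 0.088, 0.097)

# Probability of surviving through the end of each year:
# the running product of the yearly survival probabilities (1 - q)
cumulative_survival <- cumprod(1 - annual_mortality)

# Survival through a four year and an eight year administration
four_year_survival  <- cumulative_survival[4]
eight_year_survival <- cumulative_survival[8]

round(cumulative_survival, 3)
```

Each entry of cumulative_survival corresponds to one point on a by-justice survival line: it can only fall or stay flat from year to year, and it falls faster when the annual risk is higher.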


Before I go into estimates of mortality/survival, we should first cover what makes an individual more likely to die.  Without knowing health risk from detailed medical histories, the three most predictive factors in short-term mortality are age (older = more likely to die), gender (male = more likely to die), and affluence, measured in various ways (we'll think in terms of income percentiles).  Adding whatever is publicly known about health history, this means we can get general survival odds by looking at four general things:
  • Age
  • Gender
  • Any Known Public Health History
  • Affluence
I will ignore affluence for now, because mid-life affluence seems to be predictive, and all Supreme Court Justices are similarly affluent.  But here's a summary of what we know about the current Supreme Court Justices' risk:

To summarize:
  • Conservatives: Rather boring, three men, all in their 60s, and only Roberts has even a rumored health issue.
  • Liberals: The news is relatively bad for liberals.  First is an 83-year-old woman who has had cancer twice, which is risky enough.  Then add two men (aged 78 and 80) with similarly high mortality risk levels.

I used this data to create cumulative mortality risk estimates for each Supreme Court Justice, for the Supreme Court as a whole (what is the probability that all 8 current justices will survive), and for liberals and conservatives separately.

First, a chart for each justice separately.  The numbers and line on the chart represent the probability of surviving through each year of the administration.  You will see that relatively healthy, young justices (e.g. Kagan) have relatively low chances of dying even over the second term of Trump (flatter lines).  Older justices (e.g. Kennedy, Ginsburg), however, have a lower shot at surviving, some with less than a 50% chance of surviving a second Trump term.

We can then aggregate these yearly by-justice probabilities into survival numbers for the Supreme Court as a whole.  Below I've created annual survival probabilities for three groups of Supreme Court Justices: 1. All Supreme Court Justices, 2. Conservative justices (defined as the three living dissenters in the two cases above), 3. Liberal justices (defined as those in the majority in the cases above).  The Y-axis is the probability that all members of each sub-group will survive through each year of the Trump administration (years are represented on the X-axis).  Here's what that looks like:

Survival charts can be a bit complex to read, so here is a summary:

  • First Term: At the end of the first term, there is only a 34% chance that all justices will survive, 42% that all five liberal justices will survive, and 80% chance that all conservative justices will survive.  This means there is 58% chance that at least one liberal justice will die, to be replaced by Trump.
  • Second Term: At the end of a potential Trump second term, there is only a 6% chance that all justices will survive, 11% that all five liberal justices will survive, and 68% chance that all conservative justices will survive. (i.e. an 89% chance that Trump will have opportunity to replace a liberal on the court)
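
The first-term figures above can be checked with simple arithmetic.  A minimal sketch, assuming (my assumption for illustration) that deaths in the two groups are independent; the variable names are hypothetical and the two inputs are the first-term percentages quoted above.

```r
# First-term group survival probabilities quoted in the summary above
p_all_liberals_survive      <- 0.42
p_all_conservatives_survive <- 0.80

# Assuming the two groups survive independently, the chance that the
# whole court survives is the product of the group probabilities
p_all_justices_survive <- p_all_liberals_survive * p_all_conservatives_survive

# "At least one liberal justice dies" is the complement of "all survive"
p_any_liberal_dies <- 1 - p_all_liberals_survive

round(p_all_justices_survive, 2)  # 0.34
round(p_any_liberal_dies, 2)      # 0.58
```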
In essence, the probability is better than 50/50 that Trump will get to replace at least one liberal justice in his first term, and nearly 90% that he will be able to shift the balance of power by the end of his second term (if he so chooses).  But what about more complex scenarios, what are the odds that Trump gets to replace two liberal justices in four or eight years?  Enter a simulation engine.


My prior analysis showed a high probability that Trump may be able to shift the current balance of power in the Supreme Court, but can we predict the odds for replacing more than one liberal justice?  To do this we need a simulation engine with some robust matrix-algebra/storage capabilities, which I designed in R.  The simulation engine is somewhat novel in its ability to calculate the number of survivors for heterogeneous groups of people, and could, theoretically, be applied to any group of people, including families.  Moving on.

The first simulation involved all eight current Supreme Court Justices and calculated the number (independent of ideology) that would survive the first and (potential) second term of the Trump administration.  I ran one million simulations, and output the graph below, (two term on left, one term on right).  The Y axis and bar label represent the number of simulations that ended in this outcome, the X axis is the number of justices surviving in the simulation.  Essentially: the height of the bar/1,000,000 is the probability of each outcome.
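
A simulation of this kind can be sketched as follows.  This is not the SCSSE itself, only a minimal illustration of the idea under stated assumptions: the eight-year survival probabilities are hypothetical placeholder values, deaths are treated as independent, and I use 100,000 runs rather than one million to keep it quick.

```r
set.seed(2016)

# Hypothetical eight-year survival probabilities for eight justices
# (illustrative placeholder values, not the estimates behind the charts)
p_survive  <- c(0.55, 0.60, 0.62, 0.85, 0.90, 0.88, 0.86, 0.92)
n_justices <- length(p_survive)
n_sims     <- 100000

# One row per justice, one column per simulated future.
# A justice survives a run when their uniform draw falls below p_survive;
# p_survive is recycled down each column, so row i is compared to p_survive[i].
draws    <- matrix(runif(n_justices * n_sims), nrow = n_justices)
survived <- draws < p_survive

# Count survivors in each run, then convert the counts to proportions
n_survivors   <- colSums(survived)
outcome_probs <- table(factor(n_survivors, levels = 0:n_justices)) / n_sims

round(outcome_probs, 3)
```

The proportions in outcome_probs play the role of the bar heights divided by the number of simulations.  Restricting p_survive to a subset (say, only the liberal justices) gives the narrower scenarios.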

What do these simulations reveal?
  • First term: About a 35% chance that all justices survive, 41% that all but one survive, and 20% chance that two die.
  • Second term: Only a 6% chance that all justices survive, 20% chance that all but one survive, 35% chance of two dying, and 25% chance that three die in that time.  And a marginal, yet real, 3 in 1,000,000 chance that all eight Supreme Court Justices die in the next 8 years.
That's interesting, but shifting the balance of power involves specifically replacing liberal justices.  Let's re-simulate and only analyze the five liberal justices.  Here's what that looks like:

And these simulations show:
  • First Term: About a 40% chance of all liberal justices surviving, 42% of one liberal justice dying, and 15% chance of two liberal justices dying in that time.
  • Two Term: 10% chance of all liberal justices surviving, 34% chance of one liberal justice dying, 38% chance of two dying, and 16% of three dying.  Also, again a minute but real 8 in 10,000 chance of Trump being able to replace all liberal justices by the end of 8 years.
Another interesting scenario is just looking at the three most liberal justices (coincidentally? all female).  Here's an analysis of the death probabilities for just Kagan, Sotomayor, and Ginsburg:

  • One Term: There is a 68% chance of all three liberal justices surviving the first term, 30% chance of one dying (most likely Ginsburg).
  • Two Term: There is a 39% chance of the three justices surviving the second term, 53% chance of one dying, and 8% chance of losing two of the three most liberal justices.  There's also a roughly 0.3% chance of all three female justices dying by the end of Trump's second term.


I've taken reasonable steps to create accurate estimates in terms of calculating mortality risk for each Supreme Court Justice and pools of justices.  There are a couple of potential sources of bias, which may or may not be adequately controlled for:

  • Health - Generally speaking, Supreme Court Justices are fairly healthy despite their age.  The types of illnesses we see among Supreme Court Justices seem fairly normal for a cohort of their age, if not slightly healthier than average for similar American groups (the Supreme Court is, after all, largely a group of still-able-to-work senior citizens).  The outlier here is Ruth Bader Ginsburg, who has survived cancer.  Twice.  Those cancers (colon 1999 and pancreatic 2009) both carry very high mortality risk, so it's difficult to acquire accurate post-seven-year mortality multipliers.  Since she has survived seven years, I make the likely assumption that her pancreatic cancer was caught in time, and is no longer a risk.

  • Affluence - We know that affluence, and specifically income level at middle age, tends to impact mortality risk later in life.  Supreme Court Justices are likely in the top 2-3 percentiles of Americans in terms of income and education (have law degrees, make $244K annually).  This means that our mortality estimates may over-estimate the death probabilities for Justices who may, ceteris paribus, live longer due to income, affluence and privilege.  Reviewing relevant data, and available information on the relationship between mortality rates at the median versus top percentiles, it is likely that annualized mortality risk for Supreme Court Justices is 25-40% lower than the median.
As I pointed out, there isn't a good case to make in-line adjustments to mortality estimates based on health, but affluence seems a different matter.  Using the estimates developed above, I re-ran the annualized probability of all cohort justices surviving, results below.

To summarize, if we account for the impacts of income and affluence, the four and eight year risk of replacing at least one liberal justice falls to 46% and 81% (from 58% and 89% respectively).  If we make an affluence assumption these values may be more accurate; however, it's difficult to know definitively the impact that affluence has had on each Supreme Court Justice.
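
An adjustment of this kind can be sketched as below.  This is a minimal illustration under my own assumptions, not the original calculation: the annual death probabilities are hypothetical placeholder values, the helper p_any_death is a made-up name, and the affluence effect is modeled as a uniform proportional cut to every annual rate.  The 25% and 40% reductions are the range discussed above.

```r
# Hypothetical annual death probabilities over four years,
# one row per justice (illustrative placeholder values only)
baseline <- rbind(
  justice_a = c(0.060, 0.066, 0.073, 0.080),
  justice_b = c(0.045, 0.050, 0.055, 0.061),
  justice_c = c(0.012, 0.013, 0.014, 0.015)
)

# Probability that at least one member of the group dies over the period,
# after scaling every annual death probability by (1 - reduction)
p_any_death <- function(mortality, reduction = 0) {
  adjusted <- mortality * (1 - reduction)
  # Each justice's survival over the period: product of yearly survival
  p_each_survives <- apply(1 - adjusted, 1, prod)
  # Assuming independence, all survive with the product of those values
  1 - prod(p_each_survives)
}

# Unadjusted risk versus the low and high ends of the affluence range
sapply(c(none = 0, low = 0.25, high = 0.40),
       function(r) p_any_death(baseline, r))
```

The larger the assumed reduction, the lower the group's risk, which is the direction of the shift described above.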


The results of this analysis are fairly straightforward:
  • There is a 58% chance that at least one liberal justice will die during Trump's first term, and 89% chance of the same if Trump is elected to a second term.
  • If Trump is elected to two terms, it is likely (57%) that at least two liberal justices will die during his presidency.
  • These estimates may slightly overstate the risk of death for liberal justices, due to economic affluence, and as such, the risk of death of at least one liberal justice may be reduced to 46% (four year) and 81% (eight year).
  • These are simply the death risks.  To assume that any of these deaths would shift the balance of power also assumes that Trump would appoint conservatives, that they would be approved by Congress, and that the Merrick Garland precedent is not used by Democrats.

This analysis may seem somewhat cold, robotic and mathematical to some readers.  And it is.  Mortality is difficult to discuss, and to place hard mathematical numbers on the odds of surviving past a certain date is a bit frightening.  But it is also necessary at times to look at the world in these terms, such that smooth and open succession planning can occur.  There are a couple of obvious implications to this analysis in my mind:
  • I (or someone) should have conducted this analysis prior to the election, so everyone could have better understood the full implications of the Trump presidency on a few issues important to both conservatives and liberals.
  • There's a greater question here on how we look at our own mortality, and how we manage risk around it.  I'll close with a question.  If the liberals on the court had known these odds of survival (and conversely, of having their replacement chosen by Trump), would any have resigned in 2014?

Monday, November 14, 2016

Quick and Easy Geographic Maps in R

Over 10 years working in analytics and data science, I have found that policy makers and business executives gravitate towards analytical maps in order to understand business, social, and demographic relationships.  Though I have created these types of maps in one form or another for over a decade, I've found many young analysts have trouble understanding the data and code that are at the base of geographical analysis.  The purpose of this post is to demonstrate the mapping capabilities now available in R, and describe how to quickly create attractive graphical representations of spatial data.


A few months ago on Twitter, I was criticizing R's ability to create quick, functional, and attractive maps. My essential criticism was this:

In order to create good map visualizations, I often have to pull my data out of the R statistical engine and merge it with a shapefile inside of a GIS system like QGIS.  QGIS is great, and I can create awesome visualizations in that system.

Other R users jumped in and encouraged me to check out some newer functionality, specifically that found in the ggmap package.  This package is related to the popular ggplot2 package that I often use for creating graphical representations of data and models at work and on this blog.  Using this new(ish) functionality, I was able to code the following map in just a few minutes, and with just a few lines of code.

African American % by Precinct, Sedgwick County Kansas


The code to create this map was straightforward.  Here are my comments on the capabilities of dealing with GIS data in R:

  1. readShapeSpatial is a function that allows us to ingest shapefiles into R.  Shapefiles are a standard data type for geographical data, for more information see here.  
  2. fortify is a function that we can run against a shapefile to transform the geospatial data into an R data.frame.  I would recommend analyzing the output of this process, it is informative about your dataset, as well as how geospatial data "works."
  3. @data  is the classic data element of the shapefile (holds demographics, generally), which we can reference as (shapefile@data) and treat like a data.frame in R (see code below).
  4. qmap  is the analog to ggplot2's qplot.  It is a way to quickly create maps, without requiring much syntax or handling.  A few things of note:
    1. The function allows us to underlay Google Maps against our shapefile.  The first parameter here is the text "search" of Google Maps on which to center our map.
    2. zoom is the zoom level, which sets how far zoomed in on our search area the map should be.  I recommend just playing with this until it looks good.
    3. geom_polygon is a function that tells qmap what to do with the shapefile.  You'll notice that the rest of the syntax looks much like that in ggplot2.  If you need help with that type of syntax I recommend this cheat sheet.


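 #load packages: maptools (readShapeSpatial), plyr (join), ggplot2 (fortify, geom_polygon), ggmap (qmap)  
 library(maptools)  
 library(plyr)  
 library(ggplot2)  
 library(ggmap)  
 #grab my shapefile  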
 shapefile <- readShapeSpatial("KLRD_2012VotingDistricts.shp")  
 #create id from rownames   
 shapefile$id <- rownames(shapefile@data)  
 #fortify shapefile, creates a dataframe of shapefile data  
 data <- fortify(shapefile)  
 #join data file to the @data which is the attribute table dbf element of the shapefile  
 data <- join(data, shapefile@data, by="id")  
 #subset by FIPS for county level data  
 data <- subset(data,substr(data$VTD_2012,3,5) =="173")  
 #calculate % African American  
 data$AA_PERC <- data$BLACK/data$POPULATION  
 #run qmap for Sedgwick County Kansas  
 qmap('Sedgwick County Kansas', zoom = 10) +  
      geom_polygon(aes(x = long, y = lat, group = group, fill = AA_PERC), data = data,  
      colour = 'white', alpha = .6, size = .3)+   
      scale_fill_gradient(low = "green",high = "red")