## Friday, May 22, 2015

### Reduction of Multinomial Data: Measuring Diversity with a Single Number

A decade ago, I was confronted by three coworkers with a major data problem. They were trying to determine whether certain entities had become more racially diverse over time. They were looking at numbers like this (these are 2010 total US census numbers):

The problem they faced was this: How do we show a change in racial statistics over time (which is generally expressed as a series of six or seven numbers) in a way that's accurate and easy to measure and communicate?

The analysts had a few good starts, but none of them really measured diversity:
• Percentage of non-white people?
• Ratio of minority to majority groups?

So I took a couple of days to think...

## NERDINESS

(skip if you're just interested in actual results)

After a few days of thinking, I came up with a metric that I called the Effective Number of Races, or my diversity index, calculated as:

Where p is the proportion of the total population in each racial group. Admittedly, this metric isn't perfect, but it reacts appropriately to diversity: homogeneous populations, where minority groups are few and small, get low scores (close to one); heterogeneous populations with more and larger minority groups get higher scores.
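Written out, and assuming the index takes the same inverse-sum-of-squares form as the effective-number measures used in economics and political science, the calculation would be:

$$
N_{\text{eff}} = \frac{1}{\sum_{i=1}^{k} p_i^{2}}, \qquad \sum_{i=1}^{k} p_i = 1
$$

Here $k$ (the number of racial groups) and the label $N_{\text{eff}}$ are notation chosen for this sketch. Under this form a single-group population scores exactly 1, and a population split evenly across $k$ groups scores exactly $k$.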

This metric isn't a new idea; it's derived from other fields where it's necessary to measure heterogeneity in multinomial variables. Two examples: economics (effective number of firms: How many firms are REALLY active in this sector?) and political science (effective number of parties: How many parties are REALLY active in this parliamentary system?).
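Here is a minimal Python sketch of the calculation, assuming the index has the same form as the effective number of parties or firms (one divided by the sum of squared shares). The function name `effective_number` and the two example populations are hypothetical, made up purely for illustration.

```python
def effective_number(shares):
    """Return 1 / sum(p_i^2) for a list of group shares or counts.

    Assumption: this is the standard "effective number" form; inputs are
    normalised first, so raw counts work as well as proportions.
    """
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive value")
    proportions = [s / total for s in shares]
    return 1.0 / sum(p * p for p in proportions)


# Hypothetical populations (not census data).
homogeneous = [0.97, 0.02, 0.01]           # one dominant group
heterogeneous = [0.40, 0.30, 0.20, 0.10]   # several sizeable groups

print(round(effective_number(homogeneous), 2))
print(round(effective_number(heterogeneous), 2))
```

The first population scores about 1.06 and the second about 3.33, which is the behavior described above: the score stays near one when a single group dominates and rises as more groups hold meaningful shares.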

## OBSERVATION

I can use this score to compare diversity among different populations and sub-populations. For instance:

• US population in general: 2.21
• US House of Representatives: 1.53
• US Senate: 1.13
• US President: 1.0 (This is a joke, FYI)

This analysis demonstrates, through a single number, something that we already know: the US Government is significantly less diverse than the nation as a whole.

## PREDICTIVE

This is good for demonstrative purposes, but what about predictive purposes?  Can diversity be predictive of political, social, or economic outcomes?

So let's start with politics. Will a more diverse population lead to different political outcomes? A priori, theory would tell us that more diverse populations in the US would favor the Democratic Party (I won't go into the reasons why).

To test this, I regressed the percentage of Democrats in each state legislature on the state's diversity index. I found that a 1 point increase in the diversity index was associated with an 11 percentage point gain in the Democratic share of the legislature. This is a significant correlation.
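For readers who want to see what a regression like this involves, here is a bare-bones sketch of a one-variable least-squares fit. The helper `ols_slope_intercept` is hypothetical, and the four data points are placeholders constructed to lie exactly on a line with slope 11; they are not the state data, so the fit recovers that slope by design.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder values for illustration only (NOT the actual state data).
diversity_index = [1.0, 1.5, 2.0, 2.5]
dem_share_pct = [30.0, 35.5, 41.0, 46.5]

slope, intercept = ols_slope_intercept(diversity_index, dem_share_pct)
print(slope, intercept)
```

The slope is read as "percentage points of Democratic seat share per 1 point of diversity index". A real analysis would also need a significance test, which this sketch leaves out.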

(BTW, to do this I had to calculate diversity values for each state. If you're interested in that data, let me know.)

## OVER TIME ANALYSIS

One of the best uses for the diversity index involves its ability to measure diversity over time and show, in a single number, how a population has changed. I've plotted three time periods for the diversity index below (1960, 2010, 2050) against a simple % White metric. Two thoughts:
• I'm personally shocked at how large this change is.
• The absolute slope of the diversity index is much steeper (and, here, more accurate) because it measures not just a reduction in the white proportion, but also the growth in multiple minority populations.
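The second point can be sketched with two made-up populations that have the same white share but different minority compositions. This assumes the inverse-sum-of-squares form of the index; the function name and the shares are hypothetical, not census or projection figures.

```python
def effective_number(shares):
    """1 / sum(p_i^2), the assumed form of the diversity index."""
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive value")
    return 1.0 / sum((s / total) ** 2 for s in shares)


# Hypothetical populations: white share listed first, 60% in both.
one_large_minority = [0.60, 0.40]
several_minorities = [0.60, 0.15, 0.15, 0.10]

for name, shares in [("one large minority", one_large_minority),
                     ("several minorities", several_minorities)]:
    pct_white = 100 * shares[0]
    print(name, pct_white, round(effective_number(shares), 2))
```

A % White metric reads 60 for both populations, while the index comes out at about 1.92 for the first and 2.41 for the second, because it also responds to how the remaining 40% is distributed.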

## WHEN IT FAILS

One of the cases I wanted to analyze was one that has been covered in the media: comparing the diversity of the population of Ferguson, MO to that of its police department. Here are the values I got:

• Ferguson MO: 1.85
• Ferguson PD: 1.12

As we might expect, there is a large differential in the diversity values. Unfortunately, because Ferguson is overwhelmingly African American and the police department is majority white, the diversity index actually underestimates the effective differential here.

This doesn't occur often, but a good analogy would be apartheid-era South Africa. The diversity index would not be a great metric in this case: both values would read close to one, because both the population and the leadership were homogeneous groups. Only, in this case, one was homogeneously black and the other homogeneously white. In essence, the diversity index is not a good proxy for measures of inequality.
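This blind spot is easy to reproduce with a toy example. The sketch below assumes the inverse-sum-of-squares form of the index and uses two hypothetical, mirrored compositions; the group labels, the shares, and the simple share-gap check are all made up for illustration and are not part of the original analysis.

```python
def effective_number(shares):
    """1 / sum(p_i^2), the assumed form of the diversity index."""
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive value")
    return 1.0 / sum((s / total) ** 2 for s in shares)


# Hypothetical mirrored compositions (illustrative only).
population = {"group_a": 0.90, "group_b": 0.10}
leadership = {"group_a": 0.10, "group_b": 0.90}

pop_index = effective_number(list(population.values()))
lead_index = effective_number(list(leadership.values()))

# Largest gap in any single group's share between the two bodies.
max_share_gap = max(abs(population[g] - leadership[g]) for g in population)

print(round(pop_index, 2), round(lead_index, 2), round(max_share_gap, 2))
```

Both bodies get the same index value because the formula only sees the sizes of the shares, not which group holds them, even though the two compositions could hardly be more different.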

## CONCLUSION

The diversity index can serve as a powerful metric for measuring differences between populations and sub-populations, tracking change over time, and predicting the impact of diversity on various outcomes. However, in instances of flipped majority groups (apartheid South Africa, for instance) it is not a good proxy measure for inequality.