Monday, July 13, 2015

Local Gender Ratios Part 2

Last week, my research for a friend led me down a rabbit hole of local gender ratios. That same research quickly helped me find some racial anomalies in northwest Kansas, which turned out to be influenced by prisons.

I had promised a second post on gender ratios, but the other post was a bit more interesting, so it took precedence. After looking at the county level last Thursday, I drilled down to the census block level. The census block is a finer unit of geography, of varying size, for which detailed census demographic data is available. What I found was kind of interesting.

FUN WITH MAPS

First a map of gender ratio by census block.  The map is shown below.

My main observation was this: larger, more rural census blocks tend to have more men, while smaller (city/town) blocks have more women. But there is a lot of variance in a map like this. What if we focus on a smaller area? Here's a map of Johnson County, Kansas only:


The Johnson County map demonstrates the same pattern, with more urban, dense census blocks being more female (redder) while outlying areas tend to be more male (bluer).

What causes this?  I have a few a priori guesses, but nothing strong:

  • Males tending to be more comfortable living alone in the woods.
  • "Feminized" jobs (read: secretaries, nurses, teachers) tend to be concentrated in cities and denser areas.
  • Older (urban core) communities tend to have older populations, and older populations skew female, so the underlying correlation between age and gender ratio takes effect.
  • Women tend to move to town after their farmer-husbands die (I've seen this in my own family).

That's an ocular analysis of gender skews. Is there any statistical validity to it, though?

STATISTICS

Can we model gender ratios by other data? If you want to skip the nerd stuff, here's your short answer:

It can be modeled, and significant predictors found, but the model isn't hugely "predictive."

What factors appeared to matter in predicting gender ratio?  Here are our variables:


  • Percent Female: Dependent variable.  What we're predicting.
  • Dense: Population Density. Should be positive, as denser populations seem to lean female.
  • Med_age: Median age of population.  We know that older populations are more female, so this should be positive.
  • Vac_Perc:  We assumed that housing availability likely mattered, and assumed that the amount of vacant housing would be negatively correlated to number of females.
  • Renter_Perc: Another housing variable, this time percent of housing units that are rented rather than owned.  Likely positive to female rate.
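
For anyone who wants to build the same variables, here is a minimal sketch of how they might be derived from raw block counts. The field names (`male`, `female`, `land_sq_mi`, and so on) are hypothetical and not the actual census column names, and using occupied units as the denominator for Renter_Perc is my assumption about how "rented rather than owned" would be computed.

```python
def block_variables(block):
    """Derive the model variables for one census block.

    `block` is a dict with hypothetical keys (not real census field names).
    Returns None when a ratio cannot be computed, e.g. an unpopulated block.
    """
    population = block["male"] + block["female"]
    occupied = block["occupied_owner"] + block["occupied_renter"]
    units = occupied + block["vacant"]
    if population == 0 or occupied == 0 or block["land_sq_mi"] <= 0:
        return None
    return {
        "percent_female": 100.0 * block["female"] / population,  # dependent variable
        "dense": population / block["land_sq_mi"],               # people per square mile
        "med_age": block["median_age"],                          # taken as given
        "vac_perc": 100.0 * block["vacant"] / units,             # vacant share of all units
        "renter_perc": 100.0 * block["occupied_renter"] / occupied,  # assumed denominator
    }


# A made-up block, purely for illustration.
example = {
    "male": 40, "female": 60, "land_sq_mi": 0.25, "median_age": 41.5,
    "occupied_owner": 30, "occupied_renter": 10, "vacant": 10,
}
print(block_variables(example))
```

Skipping blocks with no residents matters in practice, since a percentage has no meaning for them.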
So do the models work?  Yes and no.  Here's the first model, on statewide data:



Notice that each variable is significant and in the expected direction. A couple of comments, though. First, the R-squared is low, so while we have significant predictors, we don't have the most "predictive" model. This tends to happen quite a bit when predicting percentages like we are here. In reality, there are likely a lot of factors that impact gender ratios, including many local effects that are difficult to measure. Second, we have over 120K observations here, meaning a lot of statistical power. So while our p-values are *very low*, that doesn't necessarily mean the variables are very predictive.
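
To illustrate the statistical power point, here is a small simulation on made-up data (not the census data, and not the model above). It fits a one-predictor regression where the true effect is deliberately weak, once with 200 observations and once with 120,000. The effect size of 0.1 and the helper names `simple_ols` and `simulate` are my own choices for the sketch.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = a + b*x by least squares.

    Returns the slope, the R-squared, and the slope's t-statistic.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - sse / syy
    std_err = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return slope, r_squared, slope / std_err


def simulate(n, seed=0):
    """Simulate n points with a weak true effect (slope 0.1, unit noise) and fit them."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.1 * xi + rng.gauss(0, 1) for xi in x]
    return simple_ols(x, y)


for n in (200, 120_000):
    slope, r2, t = simulate(n)
    print(f"n={n:>7}  slope={slope:.3f}  R^2={r2:.4f}  t={t:.1f}")
```

With the larger sample the t-statistic becomes very large (so the p-value becomes tiny) even though the R-squared stays near 0.01. Significance here reflects the sample size, not how much of the variation the predictor explains.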

Because of the statistical power issue, and concerns that different counties might behave differently, I also ran the analysis for a sub-sample of counties. Generally the relationships hold up, but are weaker (due to lower N in more rural counties). First, the county I live in, Johnson County:



Next the county where I grew up, Saline County:


Next a more rural county, Lincoln County:


And finally, a VERY rural county, Gove County (only 2,600 people live here):



CONCLUSION

A few easy bullet points for a conclusion:
  • Gender ratios vary geographically, sometimes in very significant ways.
  • At least part of these variations appear to be systematic, and correlated with other variables.
  • We know at least some of the factors associated with local gender ratios. However, the global model isn't extremely predictive, so there are likely many local factors at play.
  • My friend (from the initial analysis) should spend her time in rural areas, with young populations, vacant housing, and few renters.












