Monday, November 14, 2016

Quick and Easy Geographic Maps in R

Over 10 years working in analytics and data science, I have found that policy makers and business executives gravitate towards analytical maps in order to understand business, social, and demographic relationships.  Though I have created these types of maps in one form or another for over a decade, I've found many young analysts have trouble understanding the data and code that are at the base of geographical analysis.  The purpose of this post is to demonstrate the mapping capabilities now available in R, and describe how to quickly create attractive graphical representations of spatial data.


A few months ago on Twitter, I was criticizing R's ability to create quick, functional, and attractive maps. My essential criticism was this:

In order to create good map visualizations, I often have to pull my data out of the R statistical engine and merge it with a shapefile inside of a GIS system like QGIS.  QGIS is great, and I can create awesome visualizations in that system.

Other R users jumped in and encouraged me to check out some newer functionality, specifically that found in the ggmap package.  This package is related to the popular ggplot2 package that I often use for creating graphical representations of data and models at work and on this blog.  Using this new(ish) functionality, I was able to code the following map in just a few minutes, and with just a few lines of code.

African American % by Precinct, Sedgwick County Kansas


The code to create this map was straightforward.  Here are my comments on the capabilities for dealing with GIS data in R:

  1. readShapeSpatial (from the maptools package) is a function that allows us to ingest shapefiles into R.  Shapefiles are a standard data type for geographical data; for more information see here.
  2. fortify is a function that we can run against a shapefile to transform the geospatial data into an R data.frame.  I would recommend analyzing the output of this process; it is informative about your dataset, as well as about how geospatial data "works."
  3. @data  is the classic data element of the shapefile (holds demographics, generally), which we can reference as (shapefile@data) and treat like a data.frame in R (see code below).
  4. qmap  is the analog to ggplot2's qplot.  It is a way to quickly create maps, without requiring much syntax or handling.  A few things of note:
    1. The function allows us to underlay Google Maps beneath our shapefile.  The first parameter here is the text "search" of Google Maps on which to center our map.
    2. We also pick a zoom level, which sets how far zoomed in on our search area the map should be.  I recommend just playing with this until it looks good.
    3. geom_polygon is a function that tells qmap what to do with the shapefile.  You'll notice that the rest of the syntax looks much like that in ggplot2.  If you need help with that type of syntax I recommend this cheat sheet.
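
To make points 2 and 3 a little more concrete, here is a minimal sketch in base R of what the fortified data and the @data table look like when they are joined.  The two data frames are toy stand-ins with made-up values, not output from the real shapefile, and base merge() is used in place of the join function so that the example runs without any extra packages.

```r
# Toy stand-in for fortify() output: one row per polygon vertex.
# Column names mirror some of the columns fortify() returns; values are made up.
vertices <- data.frame(
  long  = c(-97.40, -97.30, -97.30, -97.40, -97.30, -97.20, -97.20, -97.30),
  lat   = c( 37.60,  37.60,  37.70,  37.70,  37.60,  37.60,  37.70,  37.70),
  order = 1:8,
  group = c(rep("0.1", 4), rep("1.1", 4)),
  id    = c(rep("0", 4), rep("1", 4)),
  stringsAsFactors = FALSE
)

# Toy stand-in for shapefile@data: one row per precinct (hypothetical counts).
precinct_attrs <- data.frame(
  id         = c("0", "1"),
  POPULATION = c(1200, 800),
  BLACK      = c(300, 40),
  stringsAsFactors = FALSE
)

# Base merge() stands in for join(): every vertex row picks up the
# attributes of the precinct it belongs to.
joined <- merge(vertices, precinct_attrs, by = "id")

# merge() may reorder rows, so restore the vertex drawing order.
joined <- joined[order(joined$order), ]

# Share of each precinct's population that is African American (0 to 1).
joined$AA_PERC <- joined$BLACK / joined$POPULATION
print(joined)
```

The thing to notice is that the attribute values repeat on every vertex row of a precinct, which is what lets geom_polygon fill each polygon with a single color.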


 #load packages: maptools for readShapeSpatial, plyr for join,
 #ggplot2 for fortify and geom_polygon, ggmap for qmap
 library(maptools)
 library(plyr)
 library(ggplot2)
 library(ggmap)
 #grab my shapefile
 shapefile <- readShapeSpatial("KLRD_2012VotingDistricts.shp")
 #create id from rownames
 shapefile$id <- rownames(shapefile@data)
 #fortify shapefile, creates a dataframe of shapefile data
 data <- fortify(shapefile)
 #join data file to the @data which is the attribute table dbf element of the shapefile
 data <- join(data, shapefile@data, by = "id")
 #subset by FIPS for county level data
 data <- subset(data, substr(data$VTD_2012, 3, 5) == "173")
 #calculate % African American (stored as a proportion between 0 and 1)
 data$AA_PERC <- data$BLACK / data$POPULATION
 #run qmap for Sedgwick County Kansas
 qmap('Sedgwick County Kansas', zoom = 10) +
      geom_polygon(aes(x = long, y = lat, group = group, fill = AA_PERC), data = data,
      colour = 'white', alpha = .6, size = .3) +
      scale_fill_gradient(low = "green", high = "red")
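
The subset step in the code above is the least obvious line, so here is a small sketch of what it does, using hypothetical precinct codes.  I am assuming the VTD_2012 code starts with the 2-digit state FIPS code (20 for Kansas) followed by the 3-digit county FIPS code (173 for Sedgwick County); the suffixes and the population counts below are invented for illustration.

```r
# Hypothetical precinct codes: state FIPS (2 digits), county FIPS (3 digits),
# then a made-up precinct suffix.
precincts <- data.frame(
  VTD_2012   = c("20173000010", "20173000020", "20091000010", "20015000010"),
  POPULATION = c(1000, 500, 2000, 750),
  BLACK      = c(100, 200, 100, 15),
  stringsAsFactors = FALSE
)

# Characters 3 to 5 of the code hold the county FIPS; "173" is Sedgwick County.
county_fips <- substr(precincts$VTD_2012, 3, 5)
sedgwick <- precincts[county_fips == "173", ]

# Same calculation as in the post: a proportion between 0 and 1.
sedgwick$AA_PERC <- sedgwick$BLACK / sedgwick$POPULATION
print(sedgwick)
```

Because AA_PERC is a proportion rather than a percentage, multiply it by 100 if you want the legend to read in percent.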
