Friday, May 6, 2016

OSHA Preliminary Death Data: How Do We Die at Work?

Earlier this week Twitter comedian, "devops thought leader," and Edward Scissorhands expert John Hendren (who tweets under the handle @fart) posted a tweet that caught my eye.  Specifically:

  • A normal person might react to this by saying "how horrible."
  • A nerdy or curious, but otherwise normal, person might open the file, look at some of the accidents in horror, and then close it.
  • A normal data analyst would probably open the file, summarize by day of the week, and then say something lame like "most workplace deaths occur on... Friday or something."
  • I reacted in none of these ways.  I decided to make the world's most horrible word clouds (and use some cool text mining implementations). 


The file itself is just a web-available CSV.  The really nice thing about the R stats language is that it's amazingly simple to read these types of files into a data frame:

j <- read.csv("")

On inspection there are five attributes available in the table:
  •  $ Fiscal.Year
  •  $ Summary.Report.Date
  •  $ Date.of.Incident
  •  $ Company  
  •  $ Preliminary.Description.of.Incident
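
If you want to reproduce that column listing yourself, here's a minimal sketch. The two rows below are made up (the companies and descriptions are hypothetical stand-ins, not records from the OSHA file), and I'm assuming the original headers contain spaces that read.csv() converts to dots.

```r
# Hypothetical two-row stand-in for the OSHA file; these rows are invented
# for illustration and are NOT real records.
csv_text <- 'Fiscal Year,Summary Report Date,Date of Incident,Company,Preliminary Description of Incident
2016,10/09/2015,10/01/2015,"Example Co., Anytown, ST",Worker fell from a ladder.
2016,10/09/2015,10/02/2015,"Sample LLC, Otherville, ST",Worker was struck by a truck.'

# read.csv() turns spaces in header names into dots (check.names = TRUE by default)
toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(toy)     # one line per column: name, type, first few values
names(toy)   # just the column names
```

On the real file you would point read.csv() at the CSV's URL instead of the `text` argument.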
The data isn't very rich in reality.  Sure, we know when people were killed at work and what company they worked for.  But any information on how they died or what killed them is locked up in irregular, uncoded free text that appears to be written somewhat haphazardly. For a flavor of that text data I created a word cloud:

For more flavor, here's the most awful description I found in the data:

Decedent was dumping a load of offal from a tractor trailer. He was in the process of dumping offal into a bin when the tailgate malfunctioned. Decedent was freeing the tailgate, it released, and the load swept the decedent into the offal bin. Decedent drowned in the bin.
Horrible. But what if we could use the text data to measure underlying ways that people die at work?


I used a method I've used quite a bit before on this blog, Correlated Topic Models, to measure the underlying topics, which is essentially a way of summarizing and differentiating the ways people die at work.  For each of the topics I created a word cloud (for effect) and pulled some examples of the original OSHA text descriptions.

Topic One: Falls (Many Times off Roofs)

Topic Two: Industrial Injuries (explosions, falling into tanks)

Topic Three: Found Unresponsive (Generally Natural Causes)

Topic Four: Electrocution

Topic Five: Tractor Trailers and Warehouses

Topic Six: Hit by Something (Cranes, Trucks, Booms)

Topic Seven: Crushed (by various things, low frequency)

And here are some of the individual descriptions for each of our categories above, four per topic, in the same order:

Falls:
Decedent fell 4-ft, 8-inches off a platform, striking his head.
Decedent was using a scaffold above 10 to 15 feet while painting. Instead of extending the scaffold, he used a step ladder on the scaffold, and feel off the scaffold.
Decedent was trimming a tree and fell 60 feet to the ground.
Worker was sandblasting under a bridge and fell 124-feet from a two-point suspension scaffold.

Industrial injuries:
Decedent was working inside a mobile home, mixing propane and butane to make a substitute refrigerant. A fire occurred. The cause of death was determined to be carbon monoxide asphyxiation.
Decedent was walking across a tank and fell through a hatch into a tank of boiling water. He either drowned or died of thermal burns.
Worker was washing flights on an auger of a concrete machine and was pulled into the flights of the auger.
The worker was trapped in a large auger attached to a grain silo.

Found unresponsive:
The decedent returned from his break to his work area and was sitting when he fell over. He was transported to a local hospital where he was pronounced dead. It was determined the decedent died from natural causes.
The decedent was not feeling well and left work early. While he was sitting waiting for the bus he collapsed and was non-responsive from an apparent heart attack. He was transported to the hospital where he was pronounced dead.
The worker was found behind the sales counter unconscious and unresponsive. The worker was pronounced dead at the scene by the coroner's office at 4:30pm.
Worker was found unresponsive in the employee restroom.

Electrocution:
Worker was performing welding duties aboard a marine vessel. An electrode from his welding equipment contacted the sweat on his neck, causing an electric shock.
Worker was electrocuted.
Worker was trimming trees and was electrocuted, after the aerial lift contacted an overhead powerline, causing the bucket truck to become energized.
Worker was under a home doing plumbing work in a tunnel, dug under the concrete foundation using an electrical shovel type drill, and was electrocuted.

Tractor trailers and warehouses:
Worker was struck by a pick-up truck that was backing up from a warehouse.
Worker was waiting in line to be loaded out at a mill. He had exited his truck to check on something on the trailer in front of him. While doing so the truck pull forward. The trailer tires struck and traveled over the worker.
Worker was unloading a vessel and was struck by a loose spinning cargo sling chain.
Worker was standing along side his semi-trailer, as it was being unloaded by a powered industrial truck. A bundle of steel weighing 2700-pounds fell, striking him.

Hit by something:
Decedent was observed standing behind the parts counter, conscious but bleeding from the head.
Worker was assigned to check out a semi-trailer's front right air bag. While positioned between the two trailer axles, he was struck on the head by the left front air bag's base cup.
Worker was performing work on a gas drilling rig and was struck in the chest by a drill pipe.
Worker was operating a line 2 extrusion machine and was struck by a plug at the end of the line.

Crushed:
Worker was crushed by steam roller.
Worker was working alone and found by other employees having either fallen into or been inadvertently crushed by a moving part on a piece of equipment.
Worker was crushed between a forklift and a parked flatbed truck.
Worker was crushed by a cable after being pulled into a motorized capstan.


This data is interesting, as it gives us some insight into how people die at work.  But measuring topics this way also allows us to summarize the data using our newly estimated topics.  To do this, we simply add up, topic by topic, the probability that each death belongs to that topic.  This gives us a sense of the relative frequency of each type of death in our underlying data set.
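
To make that summary step concrete, here's a rough sketch of the arithmetic. The probability matrix is made up for illustration (four deaths, three topics), and this isn't necessarily the exact code behind my chart. With a fitted topicmodels object, the equivalent matrix would come from posterior(ctm)$topics.

```r
# Hypothetical posterior topic probabilities: one row per death, one column
# per topic. With a fitted model this would be posterior(ctm)$topics.
gamma <- matrix(
  c(0.70, 0.20, 0.10,
    0.10, 0.80, 0.10,
    0.60, 0.30, 0.10,
    0.20, 0.20, 0.60),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("falls", "unresponsive", "electrocution"))
)

# Each row sums to 1, so the column sums act as "expected" death counts per topic
expected_counts <- colSums(gamma)

# Normalize so the shares across topics add up to 1
topic_share <- expected_counts / sum(expected_counts)
round(topic_share, 3)
```

Summing probabilities rather than counting each death's single most likely topic keeps the ambiguous descriptions from being forced entirely into one bucket.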

OK, falls seem like a bad thing in the workplace, and unresponsive is number two, though that seems to be mostly people dying of natural causes at work.  But what about my "day of the week" question from earlier...

Weekends are low and, well, it looks like Wednesday is the most dangerous day in the workplace, especially in warehouse environments.  But how about another look at daily skews:

Natural causes (unresponsive) tend to over-index on Sundays, but that may be due to low death volume or industrial jobs having the day off.  Other skews exist too: Tuesday is a big fall day, Thursday is a big day for electrocutions and industrial accidents, and Saturday is a relatively big day to get crushed.  Actually, these are easier to read as an index centered on 1.0:
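
For anyone wanting to build a similar index, here's a sketch under a few assumptions: the incidents are invented, I'm assuming dates are stored as month/day/year, and I'm defining the index as a topic's share of deaths on a given day divided by its share across all days (so 1.0 means "no skew"). The column name `topic` is hypothetical.

```r
# Hypothetical incidents: a date string and the most likely topic for each
toy <- data.frame(
  Date.of.Incident = c("05/02/2016", "05/03/2016", "05/03/2016",
                       "05/04/2016", "05/04/2016", "05/05/2016"),
  topic = c("falls", "falls", "falls", "crushed", "falls", "crushed"),
  stringsAsFactors = FALSE
)

# Assumption: dates are month/day/year; adjust `format` if the file differs
d <- as.Date(toy$Date.of.Incident, format = "%m/%d/%Y")

# %u is the ISO weekday number (1 = Monday ... 7 = Sunday), locale independent
toy$dow <- as.integer(format(d, "%u"))

counts <- table(toy$topic, toy$dow)
share_by_day <- prop.table(counts, margin = 2)   # topic mix within each day
overall_share <- prop.table(rowSums(counts))     # topic mix across all days

# Divide each topic's row by its overall share: values above 1.0 over-index
index <- sweep(share_by_day, 1, overall_share, "/")
round(index, 2)
```

Days with very few deaths will produce jumpy index values, which is the same caveat as the Sunday skew above.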

And some full code for this project:

library(tm)            #Corpus, tm_map, DocumentTermMatrix
library(SnowballC)     #stemming backend used by stemDocument
library(wordcloud)     #wordcloud
library(RColorBrewer)  #brewer.pal
library(topicmodels)   #CTM, topics

j <- read.csv("", stringsAsFactors = FALSE)
mydata <- j
mydata$text <- mydata$Preliminary.Description.of.Incident
#clean data frame
try(mydata$text <- tolower(mydata$text))
mydata$text <- gsub("@\\w+", "", mydata$text)
mydata$text <- gsub("[[:punct:]]", "", mydata$text)
mydata$text <- gsub("http\\w+", "", mydata$text)
#create corpus
corp <- Corpus(VectorSource(mydata$text))
#clean corpus
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removeNumbers)
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, removeWords, stopwords("english"))
corp <- tm_map(corp, removeWords, c("decedent","worker"))
#stem words into roots
corp <- tm_map(corp, stemDocument, "english")
#remove again, since stemming can turn e.g. "workers" into "worker"
corp <- tm_map(corp, removeWords, c("decedent","worker"))
corp <- tm_map(corp, stripWhitespace)
matx <- DocumentTermMatrix(corp)
#word cloud of frequent terms across all descriptions
par(bg = "black")
wordcloud(corp, scale=c(5,0.5), max.words=400, random.order=FALSE,
          rot.per=0.35, use.r.layout=FALSE, colors=brewer.pal(8, "Dark2"))
#fit a seven-topic correlated topic model and tag each death with its most likely topic
ctm <- CTM(matx, 7)
mydata$topics <- topics(ctm)
m <- mydata
#one word cloud per topic, saved as 1.png through 7.png
for(i in c(1:7)){
     z <- subset(m, topics == i)
     z <- Corpus(VectorSource(z$text))
     z <- tm_map(z, removeWords, stopwords("english"))
     z <- tm_map(z, removeWords, c("decedent","worker"))
     #stem words into roots
     z <- tm_map(z, stemDocument, "english")
     z <- tm_map(z, removeWords, c("decedent","worker"))
     pal2 <- brewer.pal(8,"Dark2")
     png(paste(i,".png",sep = ""), width=8, height=8, units="in", res=300)
     par(bg = "black")
     wordcloud(z, scale=c(5,0.5), max.words=400, random.order=FALSE,
               rot.per=0.35, colors=pal2)
     #close the png device so the file is written
     dev.off()
}

