Friday, March 27, 2015

Publicly Available Data and Helping Government

First off, I decided to make an honest blog of this and buy the domain, so we'll see how this goes.  A few of my posts were getting enough traffic volume that I thought it made sense to have my own domain.

That's not the purpose of this post, though.  Living in Kansas, and having done "data mining" work for the State in the past, I have some knowledge of, and a bit of interest in, State government issues.  Most recently, education funding has come to the forefront.

Go back 10 years (am I really that old?) and I'm just out of graduate school, working on a "Cost Study" of education costs in Kansas.  The questions asked were "Does spending improve outcomes?" and "How much should the State fund education?"

On this project I did my first serious "professional" statistical work: a regression predicting teacher salaries, which was used to create a suggested teacher pay matrix.  Pretty cool stuff.  Here's a picture of the regression results:





Of note (in red) are two variables that no one in the outside world ever really grabbed hold of: "Black" and "Female."  What do these variables mean?  Black and female teachers make 2% and 5% less than their counterparts, respectively.  But most teachers are paid on pay scales set by their districts, so wouldn't that suggest equal pay?
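The percentages above come from the regression coefficients. If, as is common for salary models, the regression used the natural log of salary as the outcome (an assumption; the results table isn't reproduced here), the coefficient b on a 0/1 indicator converts to a percent difference of 100 * (exp(b) - 1), which is roughly 100 * b when b is small. This sketch runs that conversion in both directions; the helper names are hypothetical, not part of the original study.

```python
import math


def dummy_effect_pct(coef: float) -> float:
    """Percent salary difference implied by a 0/1 indicator's coefficient
    in a log(salary) regression, holding the other variables fixed.
    Assumes (hypothetically) that the outcome was log salary."""
    return 100.0 * (math.exp(coef) - 1.0)


def coef_for_pct(pct: float) -> float:
    """Inverse: the log-salary coefficient that corresponds to a percent gap."""
    return math.log(1.0 + pct / 100.0)


# Back out the coefficients implied by the reported 2% and 5% gaps.
for label, pct in [("Black", -2.0), ("Female", -5.0)]:
    b = coef_for_pct(pct)
    print(f"{label}: coefficient {b:.4f} -> {dummy_effect_pct(b):.1f}%")
```

Under that assumption, the reported gaps correspond to log-salary coefficients of about -0.0202 and -0.0513, close enough to -0.02 and -0.05 that the "coefficient times 100" shortcut works at this size.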

I believe these variables are actually measuring something else: male and white teachers being more likely to get hired at higher-paying districts.  But I no longer have the data to prove that, and I can't remember testing it at the time.  I think whenever we look back at old work, we see how differently we would do things now.
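A small simulation can show how the district-sorting explanation would produce a negative coefficient even when every district pays everyone on the same scale. Everything here is hypothetical: 20 made-up districts, invented pay premiums and group shares, and a log-salary model. Salary depends only on district and experience, with no within-district group gap, but the group is concentrated in lower-paying districts. The sketch compares a regression without district controls to one with a dummy for each district (district fixed effects).

```python
import numpy as np


def ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares coefficients via a least-squares solve."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(42)
n_districts, per_district = 20, 200
district = np.repeat(np.arange(n_districts), per_district)

# Hypothetical district pay premiums (log points) and group shares:
# lower-paying districts employ a larger share of the group.
premium = np.linspace(-0.15, 0.15, n_districts)
share = np.linspace(0.5, 0.1, n_districts)

group = (rng.random(district.size) < share[district]).astype(float)
experience = rng.integers(0, 30, size=district.size).astype(float)

# Salary follows the district's pay scale only: no within-district group gap.
log_salary = (10.5 + premium[district] + 0.02 * experience
              + rng.normal(0.0, 0.05, size=district.size))

ones = np.ones(district.size)
# Naive model: intercept, group indicator, experience.
naive = ols(np.column_stack([ones, group, experience]), log_salary)

# Fixed-effects model: one dummy per district replaces the intercept.
district_dummies = (district[:, None] == np.arange(n_districts)).astype(float)
fixed = ols(np.column_stack([district_dummies, group, experience]), log_salary)

naive_gap = naive[1]               # group coefficient, no district controls
within_gap = fixed[n_districts]    # group coefficient, within districts
print(f"naive group coefficient:     {naive_gap:+.3f}")
print(f"within-district coefficient: {within_gap:+.3f}")
```

With these made-up parameters the naive coefficient should come out around -0.05 (about a 5% "gap"), while the within-district coefficient should be close to zero. This only shows the explanation is mechanically possible; whether real Kansas data behaves this way would take the district identifiers the original analysis may or may not have had.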

Fast forward to this week.  On Twitter I see a local news article pop up about the impact of school consolidation on small communities.  In response, people are saying that small towns that lose their schools lose out economically and go into rapid decline.  I asked if there were any good statistical studies on that, and got a lot of emotional responses but not a lot of logic.

The a priori theory that closing schools hurts a community makes perfect sense to me.  However, I also realize that communities already in decline are more likely to lose their schools, so it would be very difficult to establish causation, or to measure the impact, through simple observation.  We need a study that accurately accounts for the impact of closing schools so that government can make better decisions.
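To make that confounding concrete, here is a simulation with entirely made-up numbers: 5,000 hypothetical towns, each with a persistent growth trend, where declining towns are more likely to close their schools and closure itself (by assumption) lowers annual growth by 1 point. Comparing post-closure growth between towns that did and didn't close attributes the pre-existing decline to the closure. Comparing the *change* in growth rates (a difference-in-differences, one standard approach, not something from the original discussion) cancels each town's trend, but only because the sketch assumes that trend stays constant over time, which real data may not satisfy.

```python
import numpy as np

rng = np.random.default_rng(7)
n_towns = 5000
true_effect = -0.01  # hypothetical: closure lowers annual growth by 1 point

# Each town's underlying growth trend (e.g. annual population change).
trend = rng.normal(0.0, 0.02, n_towns)

# Declining towns are more likely to lose their school: closure depends on trend.
closed = (trend + rng.normal(0.0, 0.01, n_towns)) < -0.02

# Growth before and after the (possible) closure; the trend persists.
growth_before = trend + rng.normal(0.0, 0.005, n_towns)
growth_after = trend + true_effect * closed + rng.normal(0.0, 0.005, n_towns)

# Naive "simple observation": post-closure growth, closed vs. open towns.
naive = growth_after[closed].mean() - growth_after[~closed].mean()

# Difference-in-differences: compare the change in growth between groups,
# which cancels each town's persistent trend.
change = growth_after - growth_before
did = change[closed].mean() - change[~closed].mean()

print(f"true effect:  {true_effect:+.4f}")
print(f"naive:        {naive:+.4f}")
print(f"diff-in-diff: {did:+.4f}")
```

Here the naive comparison should come out around -0.04, roughly four times the built-in effect, while the difference-in-differences estimate should land near -0.01. A real study would need data on town growth before and after closures, plus a defensible argument about why the comparison towns are comparable.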

I started thinking that the wild speculation on these types of issues (school closings), as well as the speculation on all sides of other government issues (including school funding), likely does little more than muddle the issues, confuse voters, and create more problems.  What if we had the publicly available data to create our own statistically valid "cost study" of schools... or Social Security... or even healthcare?

I started to get irritated about the lack of publicly available data... until I turned to Google and found the KSDE (Kansas State Department of Education) site.  It's basically a large data warehouse of education data that could be analyzed, and likely used to complete a statistically valid cost study.

I'm not saying that I would do it, but I could.  The data is in a series of Excel spreadsheets, needs a lot of cleaning, and quite a bit of understanding.  In short, it's a lot of work.  But we could do it.

My final thought is this: a lot of public government data exists, and (as my regression above shows) modeling can be very beneficial to understanding government spending.  I know analysts, especially young analysts, who would love to get their names on a project.  Hey, we could even have a Kaggle for analytics.

Why doesn't the analytics community look at some of these public problems, possibly pro bono?  
