Wednesday, March 30, 2016

Economic Issues Part 1: Minimum Wage and Disability

Economic issues are often difficult to understand, especially in an election season where everyone seems to be... well... lying, or at least misleading us about what numbers mean.  There have been a few times recently when someone said something blatantly false, something I should have known was false, but I still had to Google it and run the numbers to verify.

This post is the first in the vein of my prior post on tax policy, and it debunks a couple of economic misconceptions with some real data.


I am usually the first person to say that the world is a complex place, drawing linear causal lines is difficult, and you can't assume that relationships are perfect and deterministic.  That's why I should have realized that Social Security disability applications are not a deterministic process.

The process/system seems simple enough:

  • an event occurs (let's say you're paralyzed)
  • you apply for benefits
  • you wait on the government to decide if you're eligible
  • you get/don't get benefits.

Simple as it seems, viewed in isolation one would assume that the driving factor is the prevalence of initial "events."  Because people become disabled in a fairly random (stochastic) way, one would also expect applications to arrive at a fairly steady rate over time.

Except that people respond rationally to external incentives.

For another project, I was looking at disability applications by year and noticed a spike around the time of the last recession.  Suspecting a link, I charted the data back to the 1960s and found that the annual change in disability applications correlates fairly well with the annual change in the unemployment rate (r = 0.47).  Here's what that looks like:

I looked at the literature on this subject (there isn't a ton), and there are two general theories (respond to this post if you have others):

  • People have greater incentives to apply for benefits if external job prospects are poor, and some borderline cases apply in these time periods.
  • More people fraudulently apply and are turned down during recession years out of economic desperation.

The second point is interesting, as the approval rate also tends to decline during recessions.  However, it does not account for all the variance, and on net more people are approved for disability during difficult economic times.  Here's the change in granted applications correlated with the change in the unemployment rate (r = 0.5).

These numbers shouldn't be surprising: people respond to rational incentives.  However, they do point to something somewhat more sinister: a good number of the people receiving Social Security disability benefits may be able to work, and would, under different economic conditions.
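Both r values above are correlations between year-over-year changes rather than raw levels.  A minimal Python sketch of that calculation follows.  It assumes each series is a list of annual values aligned by year and uses simple differences rather than percentage changes; the post doesn't say which kind of change was used, so that is an assumption, and the function names are hypothetical.

```python
import math


def year_over_year_changes(series):
    """Differences between consecutive annual values (one fewer than the input)."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def change_correlation(applications, unemployment_rate):
    """Correlate annual changes in applications with annual changes in unemployment."""
    return pearson_r(year_over_year_changes(applications),
                     year_over_year_changes(unemployment_rate))
```

Differencing first matters: two series that both drift upward over five decades would look correlated in levels just because of the shared trend, while their annual changes may not move together at all.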


A major misconception I have seen in recent days goes along these lines:

If the minimum wage kept pace with inflation from the beginning, it would already be over ___ (e.g. $15, $22). 

This has been disproven a few times, and I can verify it here again with a simple chart placing the historic minimum wage into 2015 dollars.

The high point came in the 1960s, when the minimum wage reached an inflation-adjusted $10.90 for a single year.  The original minimum wage was only $4.20 in today's dollars.  The average inflation-adjusted minimum wage from the program's inception until 2000 was only $7.65, actually fairly close to today's $7.25.  The argument doesn't seem to hold up.
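Putting a historic wage into 2015 dollars is a ratio of price indexes.  Here is a minimal sketch, assuming annual-average CPI values (for example, the BLS CPI-U series) are supplied by the caller; the function names are hypothetical and no real CPI figures are hard-coded.

```python
def to_constant_dollars(nominal, cpi_year, cpi_base):
    """Express a nominal dollar amount from one year in base-year dollars."""
    return nominal * cpi_base / cpi_year


def average_real_wage(nominal_by_year, cpi_by_year, base_year):
    """Unweighted mean of the real (base-year dollar) wage across the given years.

    nominal_by_year should hold one entry per year, carrying the wage forward
    in years when it did not change, so that every year counts exactly once.
    """
    cpi_base = cpi_by_year[base_year]
    real = [
        to_constant_dollars(wage, cpi_by_year[year], cpi_base)
        for year, wage in nominal_by_year.items()
    ]
    return sum(real) / len(real)
```

Weighting every year equally is one reasonable reading of "average since inception"; weighting by the number of workers covered would be another, and could give a different number.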

But why do so many people have this misconception?  It comes from a different talking point used by some economists and politicians: that the minimum wage hasn't kept up with worker productivity since about 1973.  If we pegged the minimum wage to a common measure of worker productivity, it would be something like $22 an hour.

But economists disagree on whether benchmarking to productivity is the correct thing to do.  In short, part of the argument is that the minimum wage only binds at the low end, whereas much of the productivity gain has come in, or as a result of, high-tech, developing firms that already pay high wages.  Thus manipulating the minimum wage is likely not the best way to reward increased worker productivity.


Many of our common economic understandings are false, and the problem grows in election years, when people openly argue from misleading and false notions.  The data above make a few things clear:

  • Disability claims are not a stochastic process, and seem to be incented by the external economy.
  • The minimum wage would not be $15 today if tied to inflation.  If tied to productivity, minimum wage would be $22, but that's a disputable benchmark.

Monday, March 28, 2016

The Transgender Bounty and Perverse Incentives

I'm generally not very interested in writing about gender studies, LGBT issues, or bathroom usage patterns on this blog (except for that one time).  But the Kansas Legislature (as sometimes happens) has forced me to form an opinion by introducing an element I am interested in: perverse incentives.  (Click here to learn more about this economic term.)


Last week, a couple of new bills were introduced into the Kansas Legislature with the aim of making students use the bathroom of their "birth gender."  This is somewhat euphemistic, but the aim is to keep transgender students from using the restroom of their choice/new identity.

The way it implements the policy seems to be two-fold:
  • To make all restrooms at public schools and colleges single (birth assigned) gender.
  • To allow students who encounter "opposite gender" students in the bathroom to sue their school for $2500.
I understand the aim of this legislation from a social conservative's point of view: new transgender issues seem to be an attack on the traditional understanding of gender.  But I also understand bad incentive structures when I see them.


Always looking to make a buck (and as someone who has spent 8 years of his career designing fraud detection algorithms), my first reaction to this legislation was a $$$ making idea:
I would collude with a female friend: I would go into the women's restroom, she would "encounter" (read: see) me in her restroom, and then she would sue the school.  We split the $2500.  Actually, the bill appears to give each person the right to sue, so if there were 10 female friends in the bathroom we could split $25,000 among us.
I mentioned this scenario offhand to my wife, who laughed and called me a weirdo for thinking that way, but we didn't think much of it.  Then I softly brought it up in an online conversation a few days later.

Then I saw a KU Law professor making the same argument a few days later (always good to see someone else thinking in the weird ways I do).

The point here is that the initial law, as presented to the public, creates a perverse incentive structure that allows students to make money by encountering a scenario they could fairly easily orchestrate themselves.  Some general points on this:
  • Actual Incentive is Penalty: The obvious intent of the $2500 fine is to incent schools to create rules and penalties for students that prevent the behavior.
  • Limited Local Penalty: The problem is that students are given a financial incentive, while schools would likely be limited in the types of penalties (detention, suspension, expulsion) they could levy against the "offender" to deter cross-bathroom use.
  • High Fraud Incentive: $2500 is quite a bit of money for a high school or college student, so this incentive is proportionally high (e.g. more than 8 weeks of 40-hour work weeks at the minimum wage).  Thus the penalty needed to stop this behavior would also have to be high (expulsion?).
  • Politically Impossible in Some Areas: Because the "offender" penalties would have to be set at a local level, the ability to set such penalties would also vary by locale.  For instance, setting a penalty for this at Lakeside High School in rural Downs, Kansas would be a much different task than doing it at the University of Kansas.  It may be entirely politically infeasible to set an expulsion or suspension penalty for transgender bathroom use at KU (or even in the Lawrence School District).
  • Exiter Risk: One of the biggest risks in any financial fraud is what I term "exiter risk": the risk created by people who are leaving a market, so that future penalties no longer matter to them.  A good example is in the consumer credit space: someone racking up additional credit card debt before default or bankruptcy because they aren't going to pay the bills anyway.  The same risk exists in this situation: soon-to-be dropouts or transfers have no reason to fear penalties from the school, and likely more incentive to commit fraud.
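The dollar figures in this post are simple multiplication and division; here they are as a small Python sketch.  It assumes the $7.25 federal minimum wage and a 40-hour week, and the helper names are hypothetical.

```python
FEDERAL_MIN_WAGE = 7.25  # dollars per hour (federal minimum in 2016)
AWARD_PER_CLAIM = 2500   # dollars per student who sues, per the bill as described


def weeks_of_full_time_work(amount, hourly_wage=FEDERAL_MIN_WAGE, hours_per_week=40):
    """Number of full-time weeks at the given wage needed to earn `amount`."""
    return amount / (hourly_wage * hours_per_week)


def total_payout(claimants, award=AWARD_PER_CLAIM):
    """Total paid out if every claimant sues and receives the full award."""
    return claimants * award


print(round(weeks_of_full_time_work(AWARD_PER_CLAIM), 1))  # 8.6
print(total_payout(10))  # 25000 (ten friends in one bathroom)
print(total_payout(40))  # 100000 (forty dorm encounters)
```

The point of writing it out: the award scales linearly with the number of claimants, while the school's available penalties do not.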


The incentives created by this bill have the potential to create fairly large problems, some of them financial, in relation to the current magnitude of the "problem" they are trying to solve.  This is a quickly changing societal issue, and this bill seems like a too-quick, financial penalty-based response to a social issue we are all trying to wrap our heads around.

Wait.  I may... like this bill.  Is it retroactive?  When I was at Kansas State University I lived in an all-male dorm, where visiting women would regularly use the men's restrooms alongside the residents.  I must have run into at least 40 women in there.  Hey K-State, you may owe me at least $100,000 if this thing passes and is retroactive!

Friday, March 25, 2016

Football School Versus Basketball School?

If you read some of my posts from late last year, you know that my grad-school alma mater football team is... well... horrible.  In fact so terrible that I spent a couple of posts calculating the probability of whether they would win a game at all.  Spoiler alert: they didn't.

Part of the problem with being a Kansas resident is that you have allegiances to either the University of Kansas (good at basketball) or Kansas State University (good at football).  So you're either a basketball fan or a football fan, depending on what school you went to.  I went to both, so I can like both sports, in theory.


Out of this dissonance an argument arises: KSU usually wins at football (45-14 this year) and KU usually wins at basketball (72-11 vs. KSU since 1984), but which team is really better?  We could look at conference records and finishes, but what if we also want to include teams from other, perhaps weaker, conferences (such as the other in-state contender, Wichita State basketball)?

Win percentages could be interesting, but they vary fairly significantly across sports.  For instance, in the NFL the best team in the league regularly wins 75-80% of its games, whereas in baseball the best teams are in the 60-65% win range.  We could normalize the win percentages, but strength of schedule would still be an issue.
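One way to put win percentages from different sports on a common scale is a z-score within each league: how many standard deviations a team sits above its league's average.  A minimal sketch, assuming you have every team's win percentage for the league and season; the function name is hypothetical, and it does nothing about strength of schedule.

```python
import statistics


def league_z_score(team_pct, league_pcts):
    """Standard deviations above (positive) or below (negative) the league mean."""
    mean = statistics.mean(league_pcts)
    spread = statistics.pstdev(league_pcts)  # population std dev over the whole league
    return (team_pct - mean) / spread
```

For example, a .600 team in a league spread from .400 to .600 gets the same z-score as a .800 team in a league spread from .200 to .800, which is exactly the kind of cross-sport comparison raw percentages can't make.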

What about end-of-year Top 25 polls?  I compiled data on how schools ranked in the "final" Top 25 polls in each of the last ten seasons.  I used the final polls because they best reflect how teams actually performed each season, unlike pre-season polls, which often reflect the perceived quality of a program.  The data was interesting and showed that KU basketball has been dominant over the past decade (poll-wise), ranked at the end of the season in each of the past ten years.

Lower on the chart things get more murky (see below).   A couple of issues:
  • Is being ranked 7 once (KU Football) better than being ranked an average of 15 three times?
  • For similarly ranked teams, how do we account for their quality of play in years they are not ranked in the top 25?
(SKIP IF UN-NERDY) I went about designing a quick and dirty metric... a KPI, if you will.  The metric estimates performance in off years using an assumption about the total number of competitive teams in each category (the number of teams ranked at least once over the 10 years: 75 for both football and basketball), and then estimates each team's ranking in its off years, using the number of off years to estimate the distribution.
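The post doesn't spell out the exact off-year rule, so here is one possible reading as a hypothetical Python sketch, not the formula behind the tables: a team's estimated off-year rank falls linearly from just outside the Top 25 toward the 75th competitive team as its number of unranked seasons grows.  Lower is better.

```python
def estimated_off_year_rank(off_years, seasons=10, competitive_teams=75, poll_size=25):
    """Hypothetical rule: more unranked seasons means a deeper assumed off-year rank."""
    return poll_size + (competitive_teams - poll_size) * off_years / seasons


def average_net_rank(final_ranks, seasons=10, competitive_teams=75, poll_size=25):
    """Average rank over all seasons, filling unranked seasons with an estimate.

    final_ranks: final-poll ranks for the seasons in which the team was ranked.
    """
    off_years = seasons - len(final_ranks)
    if off_years < 0:
        raise ValueError("more ranked seasons than seasons in the window")
    off_rank = estimated_off_year_rank(off_years, seasons, competitive_teams, poll_size)
    return (sum(final_ranks) + off_years * off_rank) / seasons


print(average_net_rank([7]))           # 63.7: ranked 7th once in ten years
print(average_net_rank([15, 15, 15]))  # 46.5: ranked 15th three times
```

Under this particular rule, three #15 finishes beat a single #7 finish; a different off-year assumption could change that answer, which is why the choice of rule matters.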

The new metric is Average Net Rank, and accounts for how a team might rank if ranks were given below the Top 25, shown in the data below:

And a graphical view because people seem to like that:

One last thing I remembered: when I was at KSU (I graduated in 2003), the football team seemed better than the polling I saw while doing this research suggests.  What if I calculate this for a ten-year period that includes my time at KSU?  Here's a comparison; note that KSU is the only team that was better from 1996-2005 than in the past decade:


A few takeaways from the polling data and how teams stack up:
  • KU Basketball is clearly the best team in the State over the past ten years.
  • KU Football is clearly worst.
  • KSU Football is actually worse than all three basketball teams in the State, at least in comparative polls, over the past decade.
  • KSU Football is the only major in-state team to perform worse in the last ten years than in the prior 10.

Thursday, March 24, 2016

Can Bernie Still Win? Post Idaho Utah and Arizona

Once again, I am not a Bernie Sanders supporter, but some friends talked me into looking at the data surrounding this primary, and I have found it fascinating.  I spent a bit of time trying to figure out how to describe my feelings on Bernie Sanders's performance Tuesday night, and "Held Serve" is the sports term that I think is most relevant.

Bernie won two states by huge margins, but lost the biggest state (Arizona) by a significant margin.  In net, it turns out not to be a huge win, and doesn't fundamentally change the numbers for the rest of the race.


Other websites have detailed accounting of the Tuesday elections, including some craziness in Arizona, but the summary is this:  Bernie lost Arizona by more than expected, and won Idaho and Utah by much more than expected. Because Arizona is bigger than Utah and Idaho combined, Bernie performed slightly though not materially better than this blog's "Bernie Sanders Performance Improvement Plan."

After Tuesday night, Bernie sits in a similar position as he did before, no worse, and no better, really.  He picked up a few more delegates than my initial posts targeted, but not enough to fundamentally change the race.  He still needs to win about 57% of remaining delegates to win.

On the positive side for his followers, he over-performed polling in two western states, so they can likely argue that he has a chance in the rest of the West.  Here's what the delegate counts look like now: first the Google view, and then our fair view, without super delegates.


If this is the first time you've looked at this analysis, you can read the full methodology here.  Essentially, this analysis looks at the pledged delegates required to win (assuming supers will follow), and then uses a logistic function to calculate how much Bernie needs to outperform polling by in each remaining State in order to win.  
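In symbols, the method described above looks roughly like this.  Here p_i is Bernie's current polling share in remaining state i, d_i is that state's remaining pledged delegates, Δ is the common logit shift, and s* is the share of remaining pledged delegates he needs (about 57% as of this post).  Treating delegate share as equal to vote share is a simplifying assumption of this sketch; actual allocation rules vary by state.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p}, \qquad
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\hat{p}_i(\Delta) = \sigma\!\left(\operatorname{logit}(p_i) + \Delta\right)

\text{choose } \Delta \text{ such that} \quad
\frac{\sum_i d_i \, \hat{p}_i(\Delta)}{\sum_i d_i} = s^{\ast}
```

Because σ is strictly increasing, the left-hand side rises steadily with Δ, so for any target between 0 and 1 there is exactly one shift that hits it.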

How is this helpful? In two ways really:
  • It allows us to quantify how much better than polls Bernie would have to perform in order to win (and the general plausibility of that performance; I generally think it's implausible at this point).
  • It allows us to set intermediary targets for Bernie's improved performance, that let us know if his current performance is putting him on pace to win.  For instance, Bernie's target for Tuesday was 80 delegates, and he performed slightly better at 85.  On pace, but not good enough to change fundamentals the rest of the way. 

Here's the data, with polling and what Bernie needs to do going forward to have a shot.


I had a few open questions after all of this analysis that I wanted to address.  The first was: can we project when Bernie will drop out?  To be honest, there have been quite a few opinion and think pieces on this in the last few days, ranging from "he should drop out now and get out of Clinton's way" to "he should wait and see what happens at the convention" (e.g. my very low probability scenario where super delegates try to take the party to the left when faced with a Trump opponent).

Right now the data shows that if Bernie doesn't think he has a reason to drop out now, then he won't drop out until after April 26th.  If he is still close after April 26th, then he probably won't drop out until the end of the primaries.  Some reasons for this:
  • Delegate Calendar: We're in a flat spot in the delegate calendar: there really aren't any significant races for the next month, and the fundamentals underlying the delegate count can't really change until late April (see chart below).
  • Polling: If Bernie is looking at polling to decide whether to drop out, the polling is pretty grim going forward.  That said, some in the Bernie camp would likely still claim that polls are substantially biased against him (given results in Michigan et al.), especially in caucuses.  As a side note, the biggest remaining contest (California) doesn't have recent, good polling data, which would help both my analysis above and Bernie's decision.  Also, in national polls, Bernie appears to continue to close the gap (second chart below).

There's another thing about Bernie's campaign that is bothering me right now, and for lack of a better term, I'm calling it the Sanders Moral Hazard.  I think additional research into this area would be interesting, but here's generally how it lays out:
  • Candidates served by concentrated large donors, Super PACs, or the establishment are more beholden to the rational whims of those donors/institutions.  Those donors, many from the business community, are used to pulling the plug on projects and are more dedicated to the party than to individual candidates: they may be more likely to pressure a candidate to drop out when a candidacy seems pointless.
  • Sanders (and candidates like him) have a lot of small donors and supporters, but no big donors to tell him to pull the plug on his candidacy.  Instead, they have a small, populist, and emotionally motivated group of followers who, even in the face of defeat, want their leader to stay in.  No one has both the individual motivation and the power to encourage Bernie to drop out.
  • The moral hazard here is this: Sanders has less incentive to drop out of the race because the actual risk (potentially spending money on a pointless campaign) is spread diffusely across many small supporters rather than concentrated among larger, rationally motivated ones.
  • The irony (and potential harm) here is this: Sanders can stay in the race longer (and theoretically past the point of no-return) with smaller donors.  These small donors are the ones that *pay* for Bernie's risk, and are more likely to be poor/lower middle class voters.  In essence: The structure of Bernie's candidacy has the potential to hurt poor people.  


A few takeaway thoughts:
  • Sports Analogy: Sanders "Held Serve" on Tuesday in Idaho, Utah and Arizona.  
  • Positioning: Sanders is essentially in the same position going forward as he was prior to Tuesday, no worse, but not materially better.
  • Drop Out?: If the Sanders campaign sees no reason to drop out now, it's unlikely they will drop out in the next month.
  • Moral Hazard: Area for future research?  There's a potential moral hazard in populist candidates funded by small (often poorer) donors having disincentives to drop out of races at appropriate times.

Wednesday, March 16, 2016

Can Bernie Still Win: The final post?

For an update on this analysis, please see our most recent post, found here.

Our piece from last week on whether or not Bernie Sanders can still win the Democratic nomination was massively popular, so we thought we should update the analysis after last night's primaries.  This post seeks to answer a simple question: does Bernie still have a chance?  Please reference our prior post for questions about methodology and meaning.  Here's a summary:
  • Bernie performed poorly last night, losing all five states.
  • More importantly, his performance was close to recent polling #'s, which he needs to beat significantly in order to win.
  • Going forward, Bernie will need to beat polling by an average of 15.2% to win the nomination (a logit shift of 0.62).


First, a look at Bernie's performance from last night.  It was fairly dismal, but also on target with recent polling.  That means he's not significantly outperforming polling numbers, which our prior post found he would need to do to beat Hillary.  Here are last night's stats:

Now on to our friends at Google and how they are reporting the race.  We like the table they've added below the graphic!

And now our view, which shows where candidates need to get to win the nomination.  Hillary's lead is much clearer at this point.  (Please note this uses a very specific model for super delegate agency, in which super delegates, in the end, follow the popular vote.)


We're going to use the same methodology that we used before, it's a bit technical but here's the gist of it:
We calculate the amount (using a mathematical logistic function) that Bernie needs to outperform polling by in order to win the nomination.
Once again for detailed method, look at our prior post. We have a "now unassigned" category this week, mainly for delegates from last night that haven't been assigned yet due to some complex party rules. Here's our output:

To win the nomination, Bernie now needs to capture 57.5% of outstanding delegates.  Our calculus shows that requires a logit improvement over current polling of 0.62.  So... what does that mean in not-crazy math terms:
Bernie will have to average (by-state, not weighted) beating current polling by 15.2% in order to win the nomination.
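The step from a single logit shift to an "average by state, not weighted" improvement can be sketched as below.  This assumes the 15.2% figure is measured in percentage points of vote share and that each remaining state is listed once by its current polling share; the function names are hypothetical, and no state polling data is included.

```python
import math


def logit(p):
    """Log-odds of a share strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit: maps any real number back to a share in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def average_point_gain(polling_shares, shift):
    """Unweighted mean, across states, of (target share - current polling share)."""
    gains = [sigmoid(logit(p) + shift) - p for p in polling_shares]
    return sum(gains) / len(gains)
```

For a state polling at exactly 50%, a 0.62 shift means about 15 points of improvement; a state polling at 20% gets a smaller point target from the same shift, because the logistic curve is flattest near its ends.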
Some people will inevitably say that is doable given the Michigan results, but Michigan isn't representative of polling error in other states.  In fact, a quick look at recent results shows that Hillary has outperformed her polling about as often as Bernie has outperformed his.  Side note: in Mississippi she outperformed polling by more than Bernie did in Michigan.

Michigan is a true outlier: a state with a lot of rigorous polling where the pollsters ended up being quite wrong.  One last view of this: here's a new chart of how much Bernie needs to outperform polling, by current polling percentage.


  • Bernie lost big last night, which put him even further behind in the delegate count.
  • It doesn't appear that he is continuing to significantly beat polling in each state.
  • He will need to beat current polling by 15.2% to win the nomination.

Monday, March 14, 2016

Kansas Education Policy: Building a Funding Formula, Pt3: Valuations

I've spent my last few posts focusing on whether Bernie Sanders still has a chance to win the Democratic nomination, but I haven't forgotten about my Kansas education project.  Time to stop neglecting it, with a new variable: assessed property valuations.


Last time we improved our regression model for a new State funding formula by adding a variable to control for poverty impacts on education.  Once again we're building slowly and hoping to help solve this issue, though it appears the legislature is moving forward with a couple of pieces of legislation on this topic.

There is quite a bit of argument around how to fix the Kansas education funding formula to comply with court orders, with the two main methods being either adding money or simply redistributing funds.  It seems possible, though, that all prior arguments will be somewhat moot after legislative researchers re-estimate revenues in April.  At that point, future revenue estimates may be revised down, forcing a significant reworking of the entire budget.

This whole situation has given rise to some interesting punditry, including a Lawrence Journal-World editorial that may rank as one of the worst risk analyses in history, saying schools PROBABLY won't open in August and may be closed all next school year.  Seriously.  The author thinks that legislators might risk school closings during an election year, based on a single timing-based data point, in a case where schools were not shut down.  Risk analysis was not this person's forte.


This week we're looking at how higher local property values impact local education spending.  There are at least three a priori theories on how this relationship functions:
  • High assessed property values = High costs: In this scenario, districts in high assessed value regions have to pay more for services (e.g. teacher salaries) due to a higher cost of living.  In essence: teachers rationally choose to simultaneously maximize salary and minimize costs, so, ceteris paribus, districts with higher property values have higher wage costs.  Other costs that vary regionally, such as construction and goods, could also contribute to higher costs.
  • High assessed property values = Higher spending: An additional hypothesis here is that "rich" districts can more easily raise capital by raising a few mills, and thus have an easier time spending more money.  Effectively they can spend more freely, and may do so in an attempt to get better results out of students.  Economists may also contend that, being less capital-constrained, they may be more likely to spend inefficiently.
  • High assessed property values = Endogeneity: This is a fairly complex statistical concept.  In this case, causation runs from the independent variables to the dependent variable, but it can also run in the opposite direction.  A good example is the jobs market (also relevant to some arguments occurring in the Kansas Legislature right now).  If we were to model job growth in a region, we would likely want to include population growth as a predictor, since population growth can lead to more jobs through more regional spending and cheaper labor.  But we also know that strong job growth can cause inbound population migration, meaning causation runs both ways and an individual coefficient can be inaccurate.  There are a few statistical techniques to deal with this (most commonly instrumental variables).  The point, though, is that it can muddle our statistical statement that x leads to y.  In the instance of property values, endogeneity would work this way: higher valuations can lead to higher spending generally, as well as more spending on teacher salaries.  But we also know that if more is spent on schools (in the right ways), we could create better schools, which in turn can increase property values.

We introduce this variable into our equation as assessed valuation per FTE, because our dependent variable is already on a per-FTE basis.  Here's what it looks like in a regression.

The regression shows a positive, statistically significant relationship: as valuations increase, so does spending.  The coefficient implies an elasticity of 0.08, meaning a 1% increase in valuations is associated with a 0.08% increase in spending.  That might not seem like a lot, but valuation per FTE varies widely across districts, which can lead to wide variations in spending (here's a view of the variation in valuation per FTE by district):
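An elasticity like 0.08 is what you read off the slope when both spending and valuation enter the regression in logs.  The post doesn't show its exact specification, so this is a minimal single-variable sketch that assumes a log-log form; the actual model also controls for enrollment and poverty, and with those controls the log-valuation coefficient is read the same way, holding the other variables fixed.  Function names are hypothetical.

```python
import math


def log_log_elasticity(valuation_per_fte, spending_per_fte):
    """OLS slope of ln(spending) on ln(valuation): the elasticity of spending."""
    xs = [math.log(v) for v in valuation_per_fte]
    ys = [math.log(s) for s in spending_per_fte]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

A slope of 0.08 means a 1% rise in valuation per FTE goes with roughly a 0.08% rise in spending per FTE, because for small changes a difference in logs is approximately a percentage change.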


We just throw this into our new funding formula and roll with it... right?  Wrong.  Remember the three theories from above about why higher assessed valuation leads to higher education spending.  These theories matter because in some cases they indicate districts that require higher funding, and in other cases they do not:
  • If a district has, ceteris paribus, higher costs due to cost of living, we absolutely want to give it higher funding, so that it can provide a comparable and equitable education.
  • If a district is simply choosing to spend more due to ease of raising capital, we probably don't want to give it higher funding, since that would undermine equitable funding.
  • If we're just measuring an endogenous relationship, we absolutely don't want to use this as a basis of funding, because statistical issues should never become a basis for varied funding.
What we need to do now is figure out a way to fund districts on costs and not efficiency.  Over the development of our function, we will do this in two ways:

  • To parse out what are truly higher costs, we will bring in cost of living/cost of comparable teacher variables to measure the local cost and teacher-labor market variation.
  • To deal with the endogeneity issue, we'll use an instrumental variable regression methodology and test whether the endogeneity is having a significant impact.
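The post doesn't name an instrument, so the sketch below uses simulated data only, to show what an instrumental variable regression buys.  It builds an unobserved factor that pushes up both the regressor and the outcome (the two-way problem described for property values above), then compares ordinary least squares with the instrumental variable estimate.  With one instrument and one endogenous regressor, two-stage least squares reduces to the ratio cov(z, y) / cov(z, x) used here.  All names and parameters are hypothetical.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Just-identified IV estimate (equal to 2SLS with a single instrument)."""
    return covariance(z, y) / covariance(z, x)


def simulate(n=20000, true_effect=0.5, seed=42):
    """Simulated data where x is endogenous and z is a valid instrument."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)  # instrument: moves x, unrelated to the hidden factor
        ui = rng.gauss(0, 1)  # unobserved factor that raises both x and y
        xi = zi + ui + rng.gauss(0, 1)
        yi = 1.0 + true_effect * xi + ui
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y
```

With these settings the OLS slope should land near 0.83 (biased upward by the shared hidden factor) while the IV slope should land near the true 0.5.  A large gap between the two estimates is the intuition behind a Hausman-style test of whether endogeneity is having a significant impact.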


In our work on the Kansas education funding we have covered these basic attributes, and believe that any functioning funding formula will need a way to control for these attributes:
  • Economies of scale: The formula will need to account for the lack of economies of scale in running rural, low enrollment school districts.  We have established a fairly good picture of what that cost curve looks like.
  • Poverty: Education literature demonstrates that poverty negatively impacts education outcomes, and we demonstrated that Kansas schools with kids in poverty perform worse.  Currently districts with higher poverty rates are also spending more, so any future funding formula likely needs to account for varying levels of poverty in schools.
  • Local Property Values: This variable correlates positively with prior spending, but we need to sort out what is higher cost, and what is potentially spending due to easier access to capital.

In addition to this, I started keeping a list of elements I need to address in the future (contact me if you would like anything added to the list):
  • Transportation Funding (geographical size modifier)
  • Performance Measures
  • Special Education
  • Teacher Salaries
  • Consolidation Equilibrium
  • Avoiding the Ipso Facto: making sure our regression equation doesn't mirror old funding formula

Wednesday, March 9, 2016

Can Bernie Still Win: The Bernie Sanders Performance Improvement Plan

For an update on this analysis, please see our most recent post, found here.

Another morning and another celebration from Bernie Sanders supporters on Facebook.  This time it seems fairly valid: Bernie won the Michigan primary, where he was trailing by 20% in the polls.  It was a big win for Bernie, suggesting that polls showing him down by 10% or more may be biased, and a big failure for public polling.  Also a win for snark against the media (whom many Sanders supporters consider biased toward Hillary).  Here's my favorite piece from Facebook:

There's just one problem, and it's the same problem as Saturday.  Bernie won Michigan by a slight margin but lost Mississippi big (underperforming polls by 20%; hey, at least pollsters got it right on average... though there was a lot more polling in Michigan).  In net, Bernie still took a double-digit loss to Hillary, in the range of about 20 delegates.  Let's dig into the numbers a bit, though.


I've shown my *fair view* (without super delegates) in my prior posts (found here and here), so I won't spend too much time on it today.  Here's a view of Bernie winning the big state, yet losing the daily delegate count:

And here's a view of the fair delegate count, once again showing Hillary expanding her lead, but with the majority of delegates needed for nomination still outstanding:

Let's take stock of what we know (referencing findings from two prior posts):

  • Bernie is behind by a fairly significant margin and now needs to win (math) 54% of remaining delegates to win the nomination.
  • Bernie is behind in polling in aggregate and in most individual remaining states, so if results follow polling, it seems unlikely that he will catch up.
  • Bernie just massively outperformed polling in Michigan.  This could be due to a variety of issues, most likely that Bernie supporters are young, and young people are notoriously hard to accurately poll.  It may be indicative of an underlying bias against Bernie in polling for future states.
  • Since mid-summer, Bernie has been gaining polling share and continued to do so in January and February.
I put all this information together and realized the question:  Is there still a path to victory for Bernie?  Then I went to developing


(If you aren't a real nerd, you may want to just skip this)

From my past analysis I knew that Bernie needs 54% of remaining delegates to win, which means he also needs to outperform his current polling in the majority of states. I put together a model to project the margin by which Bernie needs to beat polling in each state. This method will also allow us to set targets along the way, and to adjust future needed values as Bernie over- and under-performs against target.

I analyzed Bernie's current polling by State, using RCP polling averages, but more heavily weighting recent values.  In states where polling wasn't available, I used polling in demographically and geographically similar states.
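The post doesn't say exactly how recent polls were weighted more heavily. As one illustration, here is a minimal sketch assuming an exponential decay by poll age with a hypothetical 14-day half-life; the half-life, the poll numbers, and the function name are all made up for the example and are not the author's (or RCP's) method:

```python
def recency_weighted_average(polls, half_life_days=14.0):
    """Average poll shares, weighting each poll by 0.5 ** (age / half_life).

    polls: list of (share, age_in_days) tuples.
    The exponential half-life weighting is an assumption for illustration.
    """
    weights = [0.5 ** (age / half_life_days) for _, age in polls]
    total_weight = sum(weights)
    weighted_sum = sum(w * share for w, (share, _) in zip(weights, polls))
    return weighted_sum / total_weight

# Hypothetical polls for one state: (Bernie's share, days since the poll)
polls = [(0.40, 2), (0.38, 10), (0.33, 30)]
print(round(recency_weighted_average(polls), 3))
```

Because the newest polls carry the most weight, the result lands above the simple mean of these three polls (37%).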
We know that Bernie has to beat polling, and if it were as easy as figuring out the % he has to beat polling by in each state (e.g. +12% in each state), this would all be quite simple algebra. The problem here is that Bernie's potential to outperform varies by State. For instance, it's not reasonable to think that Bernie would pick up the same % in a state where he's currently only getting 20% of the vote as he would in a state where he's getting 45%.

A sigmoid-type function both fits the prior data and makes a priori sense (less room for variance at the ends of the distribution, more in the middle). I used a logistic function to calculate the percent increase, holding each State to the same logit change over initial polling results. Then I calculated the aggregate logit change required to put Bernie ahead in the delegate count nationwide (currently a shift of 0.51 in logit(p)).
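The same-logit-shift idea can be sketched in a few lines. This is a minimal sketch, not the author's actual model: the state list is hypothetical, and it assumes delegates split in simple proportion to vote share (real Democratic allocation applies a 15% viability threshold and district-level rounding). Bisection works here because the projected delegate total rises steadily as the shift grows:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def shifted_share(poll_share, shift):
    """Apply the same logit shift to a state's current polling share."""
    return inv_logit(logit(poll_share) + shift)

def required_shift(states, delegates_needed, lo=-5.0, hi=5.0, tol=1e-6):
    """Smallest uniform logit shift that yields `delegates_needed` delegates.

    states: list of (poll_share, pledged_delegates) tuples (hypothetical).
    Assumes proportional allocation with no viability threshold.
    """
    def projected(shift):
        return sum(shifted_share(p, shift) * d for p, d in states)

    # projected() increases with shift, so bisect on the crossing point
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if projected(mid) >= delegates_needed:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical remaining states: (current polling share, pledged delegates)
states = [(0.45, 100), (0.40, 150), (0.30, 200), (0.50, 50)]
needed = 0.54 * sum(d for _, d in states)
delta = required_shift(states, needed)
print(round(delta, 2))
print(round(shifted_share(0.45, 0.51), 3))  # 0.577
```

The last line reproduces the post's example: a state polling at 45% moves to about 58% under a 0.51 logit shift.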

Here's what that logit improvement corresponds to in actual numbers (e.g. if he's currently polling at 45%, he needs 58% of the vote in that State to be on track).
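As a worked check of the 45% example, using the logit shift of 0.51 (values rounded to three decimals):

$$\operatorname{logit}(0.45) = \ln\frac{0.45}{0.55} \approx -0.201$$

$$p' = \frac{1}{1 + e^{-(-0.201 + 0.51)}} = \frac{1}{1 + e^{-0.309}} \approx 0.577$$

which rounds to the 58% target.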


Back to non-nerd land, we calculated what Bernie needs to do mathematically to win the nomination.  If you think that polling is completely broken after Bernie's recent results you can call this his OBVIOUS PATH TO VICTORY.  If you think Bernie still has some work to do, you can call this the BERNIE SANDERS PERFORMANCE IMPROVEMENT PLAN.

A few notes on these numbers:  

  1. To "win" under the overall model, Bernie only needs to outperform polling at half the rate he did in Michigan. This may seem easy after Tuesday's experience, but keep in mind: Michigan polling may just have been freakishly bad.
  2. The column "Post Change %" is the proportion of the popular vote Bernie needs in each State.
  3. I will continually update these numbers until the race is "over."
  4. We can create intermediate targets from these numbers by summing earlier periods, such as "Bernie needs to get 343 total delegates on March 15th to remain on target."
  5. We can also set targets for individual races, such as "Bernie should win Ohio with 52.6% of the vote to remain on target."
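Summing earlier periods into running targets (note 4) is just a cumulative sum. A minimal sketch; the period labels and delegate counts below are hypothetical, not the model's actual numbers:

```python
from itertools import accumulate

def cumulative_targets(per_period_targets):
    """Convert per-period delegate targets into running (cumulative) totals."""
    labels = [label for label, _ in per_period_targets]
    totals = accumulate(count for _, count in per_period_targets)
    return list(zip(labels, totals))

# Hypothetical per-period targets, not the model's actual numbers
targets = [("Week 1", 120), ("Week 2", 60), ("Week 3", 80)]
for label, total in cumulative_targets(targets):
    print(label, total)
```

Comparing Bernie's actual running total against these cumulative targets after each date shows whether he is ahead of or behind pace.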


Some takeaway points:
  • Bernie's win in Michigan was huge, mostly because of how big the shift was against prior Michigan polling.
  • Polling may or may not be broken.  Obviously the polls were incorrect in terms of final voter behavior in Michigan, but we don't know how accurate they will be in other States.
  • Given that polling going forward may have issues, we created a path to victory for Bernie; we will refine the model as more information comes in on the nature of polling bias and Bernie's per-state results.

Monday, March 7, 2016

Can Bernie Sanders Still Win? Part 2: Post Super Saturday

For an update on this analysis, please see our most recent post, found here.

After the weekend and our post on Friday, a lot of people pointed out that Bernie Sanders won big on (what CNN was calling) Super Saturday, so it appears he's moving in the right direction towards my March 15th drop-dead date!

I could certainly see how Bernie supporters would be excited about beating Hillary 2-1 in states on Super Saturday, except for one fact: Bernie still lost Super Saturday. It was quickly clear that Bernie supporters weren't looking at the big picture: the final delegate count for the night.


Three contests were held on Saturday: Kansas, Nebraska, and Louisiana. Kansas and Nebraska are demographically similar Midwestern states with Bernie-favoring caucuses, whereas Louisiana was the outlier, a Southern primary State (with almost as many delegates as Kansas + Nebraska).

Here's a summary of what happened with the delegate count.  Notice that though Bernie saw small wins in the Midwestern States, he lost Louisiana by a huge margin, and thus lost the day.

Lucky for Bernie, there was another primary (this time in Maine, a Bernie-friendly New England State) where he won fairly easily.  Here's what the entire weekend looked like, with Bernie bringing home 51% of total delegates for the weekend (67-64).


That 51% victory sounds good for Bernie, but is it an adequate margin of victory? First, let's look at how the press is reporting current aggregate primary election delegates; Google is still including super-delegates:

Super delegates, as we discussed before, may or may not actually vote for who they are currently supporting.  I recreated our "fair" view into the current state of the race.  I made a slight change from last time, and backed the super delegates out of the "to win" number, making the basic assumption that super delegates will, as they did in 2008, follow pledged delegate counts.

From this view, we can easily determine what Bernie needs to do from here to win. Quick calculation: 2,899 pledged delegates are left on the table, and Bernie needs 1,550 of them, so he needs to win 53.5% of remaining delegates to win the pledged delegate count.
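The arithmetic is easy to check in a couple of lines (the function name is just for illustration); the same division gives the weekend share from the 67-64 delegate split:

```python
def required_share(needed, remaining):
    """Share of the remaining pledged delegates a candidate must win."""
    return needed / remaining

print(f"{required_share(1550, 2899):.1%}")  # 53.5%, needed going forward
print(f"{67 / (67 + 64):.1%}")              # 51.1%, Bernie's weekend share
```

The gap between those two numbers is the point of this post: a 51% weekend falls short of the 53.5% pace.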

In essence, Bernie supporters may be happy about his performance on Super Saturday, but he needs to do quite a bit better than that to close the gap on Clinton.

A couple of quick notes on the current tone:
  • If Clinton can rack up big victories in a couple of states (Illinois, Michigan) this thing could be much closer to over very quickly.
  • The sentiment around Bernie's "ghetto" comments from Sunday night's debate has been hugely negative (as has the reaction to his shushing Clinton, which some perceived as misogynistic). That could hurt him with African American and women voters in precisely the two states where he needs their support: Illinois and Michigan.


A few takeaways:
  • Bernie lost Super Saturday; despite winning two states, he is still behind in the delegate count.
  • For the weekend, Bernie won the delegate count, but only by 1%.
  • Looking forward, Bernie needs to win 53.5% of the remaining delegates, exceeding his performance over the weekend.

Friday, March 4, 2016

Can Bernie Still Win?

For an update on this analysis, please see our most recent post, found here.

Though not a Bernie Sanders supporter, I seem to have a lot of friends and acquaintances who are. As the primary moves on, I've noticed the Bernie fans becoming increasingly disgruntled at the primary process, the democratic party establishment, Debbie Wasserman-Schultz, and generally the mainstream media.  This led to a telling Facebook message from an old college friend, with this general question:
The mainstream media seems to be writing off Bernie, but he's still in the race, so he still has a chance?  It also seems like the media is over-stating Hillary's lead by counting super-delegates that could change their votes, is that true?


The root of my friend's question revolves around a weird quirk in the way the Democratic party nominates candidates: super-delegates. The definition of a super-delegate is effectively this:
An unelected delegate who is free to support any candidate for the presidential nomination at the party's national convention.

Super-delegates have been a huge source of fear in the last couple of election cycles, largely due to the uncertainty they create. There was a theory in the 2008 election cycle that super delegates would override the people's will to nominate Barack Obama and stick with party-favorite Hillary Clinton. That obviously didn't happen, and when states started to swing to Obama, a good number of super delegates realigned their votes as well. (Side note: 2,383 delegates are needed to earn the nomination, and there are 714 super delegates.)

The stats perspective is more interesting, though: we have a sizable block of outstanding delegates that will impact the nomination, and we don't have a good way to estimate their allocation because they don't follow state vote counts. We could ask them, but because they are humans, they tend to change their minds.

Here's where my college friend is right: the current vote counts on many websites are fairly misleading, because they are looking at current endorsements of super delegates, which (as we saw in 2008, somewhat) can change over time.  Also, they are excluding a number of "uncommitted" super delegates who may be more likely to be Bernie supporters waiting for him to show some progress (Clinton got a boost early by looking like the favorite all along).

Here's how Google is currently reporting things:

And here's my more honest view of things, with pledged delegates only, as well as a top-line for how many delegates needed to win the nomination.

The point here: the initial views put out by the media are misleading, and there are still a lot of outstanding delegates out there.


Since this is still a competitive race, how can we evaluate Bernie's chance to win remaining states? Let's start with some good news for Bernie fans: Clinton's polling lead (nationwide polls only) has been declining fairly steadily since the middle of last year, shown here:

That means that Sanders is picking up ground in the polls and velocity is with him, but there's still some bad news in the polls: he still trails Clinton by an average of 10% nationwide.

More bad news for Sanders is that the next two weeks of primaries don't look very promising for him. Though state polling is fairly irregular, he doesn't have a polling lead in any of the next eleven states. Some of the States show huge Clinton leads, so it's relatively unlikely he will turn the overall delegate count around soon. His performance will be interesting in Illinois and Michigan, as they may be telling in how he will perform in the rest of the Midwest and West (the former Confederacy is clearly Clinton territory, and Bernie performs better in his home area, the Northeast). Here's a view into the next two weeks; I'm not willing to go beyond this at this point because of recent movement in national polls.


We've established that Bernie is losing the election, though not by as much as the mainstream media have been reporting.  Also, we've seen that though he's gaining in polls, he probably won't make much delegate progress in the next round of primaries.  Is there still a path to victory for Bernie?  Maybe.  I can see two scenarios:

  • The Obama-Trending Scenario: In this scenario two things have to happen. First, Bernie has to continue eating into Clinton's polling lead and overtake her, probably needing to lead polls regularly by March 15th. This is possible, but not likely. Second, Bernie needs to get support from some super delegates who might be willing to change their vote. In this scenario, the rational super-delegate simply wants to go along with party preference. Thus by winning some of the later states AND putting together a coalition of super delegates (much like Obama did), he could possibly win. (Probability: Probably less than 10%)
  • The Progressive-Trump Scenario: This scenario is based on a different rationality model of the super delegate, in this case a progressive rationalist. Let's say that, as Democratic party insiders, super delegates are motivated to get the most liberal person elected. The common sense answer throughout the election has been that Clinton has a better shot at winning a general election than Bernie (yes, I've seen those other polls that say Bernie has a better chance, but head-to-head polls are junk for multiple reasons; message me if you want to discuss it). In this case, Bernie keeps it close until the end and doesn't drop out of the process. Then the Republicans (whose convention is first) nominate Donald Trump, who, consensus generally indicates, has no shot of beating either Hillary or Bernie. Those rational super-delegates seeking the most liberal candidate now have less reason to choose Hillary over Bernie. This scenario is a long shot (probability less than 3%) but does give you an idea of how rational Democrats may react to an increasingly likely Trump nomination.


What are the takeaways from all this?
  • My friend was correct that the mainstream media are currently over-estimating the Clinton lead, especially in light of what occurred in 2008.
  • Though Bernie has been gaining in the polls, he still has a lot of ground to make up, and likely won't show any meaningful electoral progress over the next two weeks.
  • There are two potential paths to Bernie winning, but he will need to quickly take a polling lead over Clinton and potentially get some help from Donald Trump to win.  He still is somewhat unlikely to win (15% at the high end).

Wednesday, March 2, 2016

Career Upsides, and Salary Growth by Entry Level Salary

Earlier today, while observing yet another fight on Twitter about Kansas employment numbers, I came in contact with an interesting data set.  The data, found here, contains employment numbers and salaries for various careers.  

One facet of the Twitter argument was the growth rate throughout careers, and whether entry level or median salaries were more relevant for low-wage employees.  I was initially very interested in the data, not because of the Twitter fight, but because of something else I've observed.

The observation: there is quite a bit of variation in career path and salary once someone starts a profession. Some professions (bank tellers, for instance) seem to top out their income only a few years into their career, whereas others (analysts, back-end finance managers) tend to see steady income growth throughout. 

Curious, I looked closer and noticed the data had entry level salary, experienced salary, mean, median, etc. I messed around with the data and calculated the jobs with the best and worst career "upsides," defined as the ratio of experienced to entry-level salary (with a minimum $50,000 differential to screen out some junk). Here's a beautifully colored list of the best career upsides:
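The upside ranking can be sketched as follows. This is a minimal sketch assuming each career has one entry-level and one experienced salary figure; the career names and salaries are hypothetical, not values from the data set, and the $50,000 screen is applied here as a simple filter on the dollar gap for the best-upside ranking:

```python
def career_upsides(careers, min_differential=50_000):
    """Rank careers by experienced:entry-level salary ratio, highest first.

    careers: dict mapping career name -> (entry_salary, experienced_salary).
    Careers whose salary gap is below min_differential are screened out.
    """
    ranked = [
        (name, experienced / entry)
        for name, (entry, experienced) in careers.items()
        if experienced - entry >= min_differential
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)

# Hypothetical salaries, not values from the data set
careers = {
    "Sales Manager": (45_000, 150_000),
    "Bank Teller": (22_000, 33_000),
    "Financial Analyst": (55_000, 125_000),
}
for name, ratio in career_upsides(careers):
    print(f"{name}: {ratio:.2f}")
```

The hypothetical bank teller drops out because its salary gap is well under $50,000, which is the kind of low-upside career the screen keeps off the best list.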

Why did I color it like this? Because I noticed three categories of careers in these groupings:
  • GREEN: Highly educated professionals who gain skills/abilities (also, tenure for post-secondary teachers) as they move in their career.
  • PURPLE: Management type employees that can improve their lot as they move from low-level manager to middle management to director levels.
  • RED: Sales and other client facing roles that can build income by growing a book of business.

BTW, anyone else think it's absurd that they actually state a value for "Entry Level CEO?"

The list of low-income growth careers is much less interesting.  It's all service industry, many careers that are populated by high-school students.  These careers are also far different than our three groups above, in that expertise grows little over the career, there's no management (unless you transition), and there's little opportunity to grow a book of business. The best opportunity in these fields is to move to management, or a more skilled version of their current job.

One last thing I did, in relation to the initial Twitter fight, was to answer the question, how does wage growth throughout career vary by initial entry level wages.  Here's a chart (ratio of experienced:entry level salary, by entry level salary):

The answer isn't as clear as one might think, but two points can be made:
  • In some careers, people start with very low wages, but can increase those wages significantly with more experience.
  • There is a trend though, where the highest entry-wage careers also have the most potential salary growth.
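One way to put a number on the trend in that chart is a Pearson correlation between entry-level salary and the experienced:entry ratio. The post doesn't report a coefficient, so this is only a minimal sketch of the calculation, using hypothetical salaries rather than values from the data set:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical (entry_salary, experienced_salary) pairs, not the real data
careers = [
    (20_000, 30_000),
    (25_000, 60_000),
    (40_000, 70_000),
    (60_000, 150_000),
    (80_000, 220_000),
]
entry = [e for e, _ in careers]
ratio = [x / e for e, x in careers]
print(round(pearson(entry, ratio), 2))
```

A positive coefficient matches the second point above; the 25,000 entry-wage career in this toy data shows how individual low-wage careers can still have large upsides.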
Quick methodology note: These numbers represent point-in-time estimates of experienced versus entry level employees and don't represent longitudinal data. Also, people tend to transition between job types, which can bias the numbers, for instance when someone moves from a high-level manager role to chief executive (this analysis assumes you stay in a similar role).