Tuesday, March 31, 2015

Weird Incentives: Real Estate

An important part of my job is understanding customer incentives: how product and pricing schemes change agent behavior.  I get a lot of questions like "What happens if we change a marketing incentive?", "Will our customer mix change if we reduce pricing?", or "If we change the sales rep commission plan in a certain way, will that actually grow the business or its profitability?"

These types of questions are pretty important, and they impact our everyday lives.  One of the better examples comes from the authors of Freakonomics, on asymmetric incentives in real estate. Here's a great video that explains the asymmetry:

I experienced the asymmetric incentive issue with real estate commissions in my personal life last year.  We sold our house and bought a new house closer to our workplaces. Here's what I learned: the example in the above video is only part of the story. The notion of asymmetric incentives impacted nearly every facet of the transaction.

Here's a summary of some interesting aspects:

Things that happened when selling our old house

We were encouraged by our Realtor to spend over $10,000 on upgrades to our house in order to help the house sell quickly, before the Realtor even put it on the market.  Keep in mind that real estate commissions are calculated as

Commission = Sale Price X 6%

rather than

Commission = (Sale Price - Owner incurred sales cost) X 6%

That means the Realtor made money directly from our incurred cost (for which we were not reimbursed) whenever it increased the sale price.  We only profited if the sale price increased by more than our incurred cost, but the Realtor benefited from our expenditure if the price increased at all.
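To make the asymmetry concrete, here is a minimal sketch of the arithmetic in Python. The 6% rate and the $10,000 upgrade cost come from the story above; the $5,000 price bump and the function name `upgrade_outcome` are assumptions made up for illustration.

```python
COMMISSION_RATE = 0.06  # 6% of the sale price, as described above


def upgrade_outcome(price_increase, upgrade_cost, rate=COMMISSION_RATE):
    """Return (realtor_gain, seller_gain) from a pre-sale upgrade.

    The commission is charged on the full sale price, so the Realtor gains
    on any price increase, while the seller must first cover the upgrade.
    """
    realtor_gain = price_increase * rate
    seller_gain = price_increase * (1 - rate) - upgrade_cost
    return realtor_gain, seller_gain


def seller_break_even(upgrade_cost, rate=COMMISSION_RATE):
    """Price increase at which the seller merely recovers the upgrade cost."""
    return upgrade_cost / (1 - rate)


# Hypothetical scenario: $10,000 of upgrades lifts the sale price by $5,000.
realtor_gain, seller_gain = upgrade_outcome(5_000, 10_000)
print(f"Realtor: {realtor_gain:+,.0f}  Seller: {seller_gain:+,.0f}")
print(f"Seller break-even price increase: {seller_break_even(10_000):,.0f}")
```

In that made-up scenario the Realtor comes out $300 ahead while the seller is $5,300 behind. Because the commission also applies to the added price, the seller needs a price increase somewhat larger than the upgrade cost just to break even.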

Negotiations: As soon as we had an offer on the house, we got the line (very similar to the lines from the video), essentially this:

Don't do anything to "risk the deal"
This is probably as good as it's going to get

I designed a counter offer, which our real estate agent didn't like, using strategy I learned from dealing with customer behavior.

Essentially: the buyer's offer involved a huge amount of seller-paid closing costs (read: they couldn't afford to close the loan on their own) and only a 5% down payment (read: we don't have any cash, but LOVE TO FINANCE THINGS).  Realizing that I was dealing with cash-sensitive, credit-insensitive people, I designed a counter offer that involved increasing the price of the house a lot, but hey, they could finance that.

Our real estate agent tried to talk me out of it, saying I was "putting the deal at risk."  I refused his advice, and the buyers accepted my counter-offer as is.  The Realtor's incentive put my negotiating position at risk.

Next, in the inspection stage of things, the buyers asked for a long list of "fixes" they wanted done to the house.  Most of them were cosmetic, or not warranted, but I felt a little bullied by our agent to "just do them", once again, not to "risk" the sale.  A lot of these items could be fixed with a tube of caulking, which is what I did, but it wasn't the kind of stress I needed during that time period.  And would they really have cancelled the sale over caulking?  Yet another example of inconvenience to protect the sale.

Things that happened when buying our new house

Just one thing here, really.  Our Realtors scheduled closings so that we would close early on Monday for our old house, and close in the afternoon on Monday for our new house.  We were moving the first weekend in June, so we couldn't get a big truck, and would need to move part of our stuff in shifts.

We were moving to an empty house, so I asked the Realtor if we could get permission to move some of our stuff to the new garage the day before, to save us the rental space.

I was categorically denied.  Basically, I was told that we could scare the sellers at this point, and they would call the whole thing off.  Really?  They would walk away from the money, just because I made a reasonable request?

All of these things added up to an experience that backed up the Freakonomics claim, but also added to it: the asymmetric incentives can permeate the entire transaction.  Once a deal starts, the Realtor's incentive is to close the deal, no matter what, without any risk to that deal.  What suffers is the buyers' and sellers' convenience and negotiating position.

Thursday, March 19, 2015

Political Predictions, Context Matters

Starting about a year ago (March 2014) I started hearing predictions about politics in my home state that ran counter to everything I knew about Kansas politics.  Pundits, journalists, and pop statisticians (read: Nate Silver) had started predicting that Sam Brownback would lose the Kansas governor's race.
Brownback is very conservative and so is Kansas, generally.  He won the 2010 election in a 63% to 32% landslide that was never really any kind of race.  So why would he be so vulnerable only four years later?

The answers lie deep in Kansas politics of the last four years, and you can read more about them in many different sources.

The short answer is this: Brownback cut taxes which caused a budget crisis, then tried to cut education, didn't give employees a raise, cut art spending, etc.  There were additional criticisms largely because Brownback isn't progressive on women's issues (abortion), was against gay marriage beyond the point that it was socially palatable, and didn't support unions.  Here's the Daily Show take on it: http://thedailyshow.cc.com/videos/6sn82w/sam-brownback-s-conservative-kansas-experiment

Back to statistics.  The idea that Brownback would lose this election was still ludicrous to me, even with the craziness of the past four years. Here's how I saw each side:

Brownback Loss Factors:

  • Polling data was mixed, but generally showed a small Paul Davis lead.
  • The narrative in the mainstream media was generally that Brownback had really screwed over Kansas, and that people would be stupid to vote for him.

Brownback Win Factors:

  • Prior election results (30+% victory).
  • More motivated conservative base, and not a strong progressive candidate running against.
  • Mid-term election with an (African American) Democrat in the White House.

As an analyst it's tempting to rely only on polling data (or on modeling polling data) and projecting this, because that seems the most quantitative option.  But I didn't have a good feeling about those polling results.  And that gets us to the point of this blog entry: Context Matters.  

The day before the election Nate Silver was still predicting a Davis win, but I was not.  Here are our projections:  Silver: Davis has an 82% chance of winning.  Me: Sam Brownback wins by 2-5%.

The result of the election was a 3.5% Brownback victory.

So why were outside analysts so wrong about the Kansas election?  I see a few reasons, generally related to the models' lack of context:

  • Models didn't account for level of voter motivation. (all below reasons feed this)
  • In mid-terms, the out-of-power party (the one whose president is not in the White House) has higher motivation (more pissed off).
  • Kansas has a strong conservative skew, consisting of many people with a visceral hatred of Obama, making them especially motivated.
  • Kansas conservatives vote for what they believe to be meta-right/wrong issues such as abortion and gay marriage.  They generally believe that if they don't vote correctly on these issues, they burn in hell (quite motivating, if you believe in that sort of thing).
  • Going into election day, there was a "shame" in saying you supported Brownback.  I assumed many Brownback voters would deny their allegiance, thus making polling inaccurate.

Prior to the election I developed a model that looked at broader context and history, and accounted for this being a midterm (it also looked at presidential polling data).  My model was similar to Silver's in being a logistic regression, but it added the context of Kansas politics in a mid-term election. Those added context variables were important in allowing me to make a more accurate prediction of the outcome of the election.
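As a rough illustration of what "adding context variables" to a logistic model means, here is a minimal sketch in Python. It is not my fitted model: the feature names, the weights, and the 2-point polling deficit are all hypothetical values chosen only to show the mechanics. The one real input is the 31-point margin from the 2010 result (63% to 32%).

```python
import math


def logistic(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def win_probability(features, weights, intercept=0.0):
    """Logistic model: any feature without a weight is simply ignored."""
    z = intercept + sum(weights.get(name, 0.0) * value
                        for name, value in features.items())
    return logistic(z)


# Inputs for the incumbent. The poll margin is a hypothetical small deficit.
features = {
    "poll_margin": -2.0,          # hypothetical: trailing by 2 points in polls
    "prior_margin": 31.0,         # 2010 result: 63% to 32%
    "midterm_out_of_power": 1.0,  # incumbent's party does not hold the White House
}

# Hypothetical weights, NOT estimated from any data.
polls_only = {"poll_margin": 0.5}
with_context = {"poll_margin": 0.5, "prior_margin": 0.03,
                "midterm_out_of_power": 0.5}

p_polls = win_probability(features, polls_only)
p_context = win_probability(features, with_context)
print(f"polls only: {p_polls:.2f}, with context: {p_context:.2f}")
```

With these made-up weights, the polls-only version puts the incumbent below 50%, while the version that also sees the prior margin and the midterm flag puts him above it. In a real model the weights would be estimated from historical races rather than set by hand.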

Point of this all:  if you want to make accurate predictions, make sure you properly account for voter motivation, and more broadly, contextual factors.

Wednesday, March 11, 2015

Fitness Tracker Data

I average 20,000 steps a day.

I'm sitting here in a meeting at 3:20 pm and I only have 3,500 steps for the day.  Today has been meeting hell, and this is my fourth meeting of the day.  No walking around.  This is driving me nuts.

Yes, I've joined the fitness tracker movement. I wouldn't normally post this on here, as it's not related to data science.  Except, because of who I am, I'm making it related to data science.

I've noticed a few trends and relationships in my activity data:

  1. I am least active on the days following a long run or other hard workout, and I had really never realized how significant that is before.
  2. I am sometimes EXTREMELY inactive in the office.  Especially on days when I have a lot of meetings.

These observations made me think, could I use data to predict my activity level on a daily or even hourly basis?

So, I did what I do with these things.  I've set up a MySQL database to track my fitness tracker data over time, along with other factors (meeting data, weight, workout plans, etc.) to try to do time-series analysis on my activity level.
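Here is a minimal sketch of what such a table and a first feature query might look like. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table name, column names, and sample rows are all placeholders invented for illustration, not my real schema or data.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE daily_activity (
        day          TEXT PRIMARY KEY,   -- ISO date, e.g. 2015-03-08
        steps        INTEGER NOT NULL,
        meetings     INTEGER NOT NULL,
        hard_workout INTEGER NOT NULL    -- 1 if long run / hard workout
    )
""")

# Made-up placeholder rows, not real tracker data.
rows = [
    ("2015-03-07", 26000, 0, 1),
    ("2015-03-08", 9000, 0, 0),
    ("2015-03-09", 21000, 1, 0),
    ("2015-03-10", 19000, 2, 0),
    ("2015-03-11", 12000, 4, 0),
]
conn.executemany("INSERT INTO daily_activity VALUES (?, ?, ?, ?)", rows)

# Join each day to the previous day to flag "day after a hard workout".
query = """
    SELECT today.day, today.steps, today.meetings,
           COALESCE(prev.hard_workout, 0) AS after_hard_workout
    FROM daily_activity AS today
    LEFT JOIN daily_activity AS prev
           ON prev.day = date(today.day, '-1 day')
    ORDER BY today.day
"""
features = conn.execute(query).fetchall()
for row in features:
    print(row)
```

Each output row is one day with two candidate predictors attached (meeting count and whether the previous day was a hard workout), which is the shape a time-series or regression model of daily steps would need.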

I'm hoping to come up with some decent, informative models that I can use to think about my activity level. Additionally, leave a comment on this blog if there are any additional factors or interesting components I should look at in my data.  I'll post my models on here from time to time.

This brings me to a product idea.  If I can successfully create models to predict periods of inactivity, can I create an alert based on that prediction, to nag myself into activity during that time?  I don't know if it could work for the masses, but at least I could try to guilt myself into action.

And, for all of you device nerds out there, I'm currently using Google Fit on my Nexus phone, but I'm looking at getting a Garmin tracker.

Stay tuned for more on this project...

Friday, March 6, 2015

On Weird Metrics

In my job, I spend a lot of time telling people they're looking at the wrong metrics.

Yesterday I came upon an article that many people have read this week, reacting with varying levels of shock, outrage and apathy.  The article was Justin Wolfers's New York Times story on the "glass ceiling", which pointed out mainly that more men named John run large companies than women in total. (Found here)  There's a fun death penalty analysis coming, but first I'll be nerdy about metrics...

My first reaction to this story was... what a bizarre freaking metric.  I understand what they are trying to do here, and it's basically a rhetorical ploy like this:

X is so much bigger than Y, that even G (a subset of X) is four times the size of Y.

This device is used quite a bit. The most common use I can think of is when we talk about California having a bigger economy than most countries.  Basically, the US economy is so large that even California has more going on than most other countries.  But why is this more powerful than just saying the US is the world's largest economy and is X% larger than its nearest competitor?  It's not.  But people feel like it is a useful rhetorical device to invoke emotion and scale.

The root problem with this strategy is that it creates a compound metric where you're measuring both the size of the comparison group (in this case women running large companies) and the relative frequency of the subgroup.  The relative frequency of the subgroup will vary over time, space, and other dimensions that correlate, further perverting the metric.  The direct and meaningful metric here is simply the ratio of men to women running large companies.  

I don't miss the point that Wolfers is trying to make with his article, which is that fewer women run large companies.  But I wondered if people on the other side of the argument (Men's rights activists versus Women's rights activists) applied the same strategy, would they see similar results.

Men's rights activists often argue things like men live shorter lives and are more likely to be imprisoned or executed.  So let's look at execution numbers, because they're easily available.  I downloaded the Death Penalty Database from deathpenaltyinfo.org, so my analysis is as accurate as the data they compiled.  Then I just categorized names, nothing fancy, no name rooting or stemming.

From Wolfers's article, the ratio of men named John to women running large companies is 1.29 (5.3% versus 4.1%).
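To show why I call this a compound metric, here is a small sketch in Python that splits the 1.29 figure into its two pieces. The 5.3% and 4.1% shares are the ones quoted from the article; the sketch assumes, for illustration, that every CEO in the data is counted as either a man or a woman, and the "half as many Johns" scenario is purely hypothetical.

```python
# Shares of large-company CEOs, as quoted from the article.
JOHN_SHARE = 0.053
WOMEN_SHARE = 0.041


def decompose(john_share, women_share):
    """Split the John-to-women ratio into its two underlying pieces.

    Assumes (for illustration) every CEO is counted as a man or a woman.
    """
    men_share = 1.0 - women_share
    men_to_women = men_share / women_share     # the direct gender metric
    johns_among_men = john_share / men_share   # name popularity among male CEOs
    compound = men_to_women * johns_among_men  # equals john_share / women_share
    return men_to_women, johns_among_men, compound


men_to_women, johns_among_men, compound = decompose(JOHN_SHARE, WOMEN_SHARE)

# Hypothetical: John becomes half as common among male CEOs,
# while the gender split stays exactly the same.
_, _, compound_fewer_johns = decompose(JOHN_SHARE / 2, WOMEN_SHARE)

print(f"men per woman:                   {men_to_women:.1f}")
print(f"Johns per woman:                 {compound:.2f}")
print(f"Johns per woman, half the Johns: {compound_fewer_johns:.2f}")
```

Under that assumption the direct metric is roughly 23 men per woman. The headline ratio of 1.29 is that number multiplied by how popular the name John happens to be, so halving the name's popularity halves the headline figure even though nothing about gender has changed.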

Here are some ratios from my analysis:

As you can see from the analysis, the ratios are much higher than in Wolfers's article, so do I conclude that recent execution gender disparity is more severe than large company CEO gender disparity?  Maybe.

But in statistical terms, it's more valid just to say that women make up 1.1% of all executions since 1976 in the United States versus 4.1% of all CEOs.

Tuesday, March 3, 2015

When a team member is being a resource hog...

(no real content in this post, just me having fun with a team)

Teach them a lesson by killing all of their sessions...

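-- Uncomment if #temp is left over from a previous run
--drop table #temp
create table #temp (spid int, ecid int, status varchar(255),loginname varchar(255), hostname varchar(255), blk int, dbname varchar(255), cmd varchar(255), request_id int)

-- Capture the current sessions reported by sp_who
insert into #temp
execute sp_who

declare @spid int
declare @sql varchar(max)

-- Cursor over every session owned by the offending login
DECLARE state_scroll CURSOR FOR
select distinct spid
from #temp
where loginname like '%levi%'

OPEN state_scroll

FETCH NEXT FROM state_scroll
INTO @spid

-- Build and run one KILL statement per session
WHILE @@FETCH_STATUS = 0
BEGIN
    set @sql = 'kill ' + cast(@spid as varchar(255))
    exec (@sql)

    FETCH NEXT FROM state_scroll
    INTO @spid
END

CLOSE state_scroll
DEALLOCATE state_scroll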

Monday, March 2, 2015

Board Deck Week

I sit here Monday morning with about ten data requests littering my inbox, all due by Wednesday.  This is not an unmanageable task (in fact, if I wanted to, and skipped a meeting, I could probably have these all done by noon without any team member help), but it is quite a task to undertake while also verifying the accuracy of the results.

The requests are data for various decks and charts to be presented at this week's Board of Directors meeting.  I'm seeing three general categories of requests.

  1. Ad hoc query requests. Example: what do customer growth trends look like over time?
  2. Ad hoc modeling requests. Generally phrased as "what if" questions; for instance, what if the regulatory environment in all states starts looking like Washington State's?  I actually created something I termed a stochastic customer entropy model to solve this problem.
  3. Model performance requests. Example: what kind of lift have the new underwriting models created?

I actually have quite a bit of experience with these kinds of requests.  At my old job the team had "Board Deck Week" where we would spend almost all of our time pulling ad hoc queries, cutting data new ways, and creating new models for business analysis.  Though this kind of work isn't necessarily the type of thing that Data Scientists like spending their time on, it speaks to the value that data science teams bring to businesses.

For the simpler requests, it's a recognition that the data science team can quickly pull data from databases, and analyze them in ways that can be Board-ready within a matter of hours.  It's just a verification of the known skills of data scientists.

For some of the more complex requests, it's a vote of confidence in the value-added nature of data science teams.  In some cases the executive team has a question it has never been able to address, and sees models as a legitimate way to get there.

In other cases, where the executives are interested in actual model results, the importance of data science speaks for itself.  What this means is that our models are deployed in areas of the business so important that their results need to be reported to the Board.  While this can create a lot of anxiety on the team, a payoff exists as well: the team is obviously viewed as important to the business, and if models perform poorly, the business is likely to give us a chance to improve them rather than throw them away.