Wednesday, May 6, 2015

Modeling Fitness Tracking and Messing Up Models

I've never met a metric that I couldn't screw up in some way after measuring, modeling, and focusing on it. 

This happens to me quite a bit at work, where someone will say "hey, we really need to work on our 'X' ratio."  Then everyone works on the X ratio for a few weeks which makes X go up, predictably.  This creates a couple of problems, though:
  • X gets better for not-normal, and often not-model-able reasons (maybe not a concern for the business, but certainly irritating to me).
  • X sometimes gets better at the expense of the business.  I should probably blog on this later, but many times I've seen businesses sacrifice long-term revenue to improve a single KPI.

But this is a fitness blog entry, so let's start with that.

So, good news on the fitness data front:
  • I now have over a month of data to model.
  • My activity level has increased over 30% since my first week of tracking.
Except that taken together, these two things don't necessarily lead to predictive data models.  But more on that later.  Here is a summary chart of progress over the weeks.

Data Modeling:

I'm trying to build a model that predicts a future day's activity level from the various data I'm collecting.  So far, here are the attributes I have for modeling:

  • steps: How many steps I took today (the dependent variable).
  • weekend: Is today a weekend day?
  • day_of_week: What day of the week is it?
  • week: What week is it?
  • steps_prior: How many steps I took yesterday.
  • hours_three_prior: How many hours of sleep I got over the last three nights.
  • sleep_hours: Hours of sleep last night.
  • travel: A binary for whether I was traveling that day.
  • sick: Was I sick today?
Obviously with only 35 observations, I can't use all of these attributes, but I can start a small model. 
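
For anyone who wants to try this on their own tracker data, here is a minimal sketch of how the derived attributes above could be built from a raw daily log. The dates and numbers are made up for illustration (they are not my data), the function name build_features is hypothetical, and I'm assuming that "week" counts seven-day blocks from the first tracked day and that hours_three_prior includes last night.

```python
from datetime import date

def build_features(log):
    """Turn a chronological daily log into modeling rows.

    Each log entry is a dict with 'date', 'steps' and 'sleep_hours'
    (sleep_hours = hours slept the night before that day).
    """
    first_day = log[0]["date"]
    rows = []
    for i, day in enumerate(log):
        weekday = day["date"].weekday()  # Monday = 0 ... Sunday = 6
        row = {
            "steps": day["steps"],
            "weekend": weekday >= 5,
            "day_of_week": weekday,
            # Assumption: week = number of full 7-day blocks since the first day.
            "week": (day["date"] - first_day).days // 7,
            # No prior day exists for the first observation.
            "steps_prior": log[i - 1]["steps"] if i >= 1 else None,
            "sleep_hours": day["sleep_hours"],
            # Assumption: the last three nights include last night.
            "hours_three_prior": (
                sum(d["sleep_hours"] for d in log[i - 2 : i + 1]) if i >= 2 else None
            ),
        }
        rows.append(row)
    return rows

# Made-up sample log, for illustration only.
sample_log = [
    {"date": date(2015, 5, 1), "steps": 9000, "sleep_hours": 7.0},   # Friday
    {"date": date(2015, 5, 2), "steps": 15000, "sleep_hours": 8.0},  # Saturday
    {"date": date(2015, 5, 3), "steps": 14000, "sleep_hours": 6.5},  # Sunday
    {"date": date(2015, 5, 4), "steps": 8000, "sleep_hours": 7.5},   # Monday
]
rows = build_features(sample_log)
```

The travel and sick flags are left out because they have to be entered by hand; nothing in the step or sleep data identifies them.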

First I plotted some data just to get some ideas about trends.  A few interesting correlations are described below.

I found that a simple correlation between steps yesterday and steps today was positive.  Does this imply that by simply moving more I can get an infinite feedback loop where I continuously up my activity level?  Probably not, but interesting.
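
The yesterday-versus-today comparison is just a lag-1 correlation: pair each day's steps with the previous day's and compute a Pearson correlation. A rough sketch is below; the step counts are invented for illustration, and the helper names pearson and lag1_correlation are my own.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def lag1_correlation(steps):
    """Correlate each day's steps with the previous day's steps."""
    return pearson(steps[:-1], steps[1:])

# Invented daily step counts with a general upward drift.
steps = [6000, 6500, 7000, 8000, 7500, 9000, 9500, 10000]
r = lag1_correlation(steps)
```

Note that any series that drifts upward over time will show a positive lag-1 correlation, whether or not one day's activity actually influences the next.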

Next I looked at my weekend variable (plot below).  I knew that most weekend days I move more than weekdays, but now I can quantify.  Variation is also a lot larger on weekends.  I looked through some past data, and realized that there are some relatively low activity weekend days, generally when I'm traveling.  This is why I added the "travel" variable to my models.

So can I model this? Absolutely.  Here's a summary of the attributes in the model I created.

  • Weekend: I get (on average) 9,000 more steps on weekend days than weekdays.
  • Steps Prior: Ceteris paribus, for every 1 step I take today, I will take .25 fewer steps tomorrow.  This doesn't match the correlation above; however, it makes more a priori sense (if I move a lot today, I'll be sore or tired and move less tomorrow).  Why did the sign change? Likely because the initial simple correlation was just picking up the weekend effect, or inter-week correlations (my general increase in activity over time).
  • Travel: On travel days I get 6,000 fewer steps.
  • Factor(week): Week-level fixed effects, generally increasing over time.  I probably need a better methodology here.  But I had to add this factor to make any of the models really work.  Why?  Because, as my first graph on this blog shows, I've greatly changed my behavior week-to-week in a way that makes models far less predictive if I don't control for it.
In common terms, weekends have always been better than weekdays, but my behavior has changed so much over time that current weekdays are almost as active as weekends were when I first started tracking.  This is similar to the effect we see for the "steps prior" correlation mentioned above.
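
To show how week-level fixed effects can flip the sign on steps prior, here is a small sketch on synthetic data (not my data, and not my actual model, which also includes the weekend and travel terms). The series is built so that each week's baseline is higher than the last, while within a week a high day is followed by a low day. With a single regressor, subtracting each week's mean from both variables gives the same slope as a regression with one dummy per week, so no regression library is needed.

```python
from collections import defaultdict

def slope(xs, ys):
    """OLS slope of y on x, with an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def demean_by_group(values, groups):
    """Subtract each group's mean from the values in that group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

# Synthetic 35-day series: the weekly baseline rises by 1,500 steps,
# and days alternate 1,000 above and 1,000 below that baseline.
steps = [
    8000 + 1500 * (t // 7) + (1000 if t % 2 == 0 else -1000)
    for t in range(35)
]

# Day 0 has no prior day, so the observations are days 1..34.
steps_prior = steps[:-1]
steps_today = steps[1:]
weeks = [t // 7 for t in range(1, 35)]

# Ignoring weeks: the rising baseline dominates.
pooled_slope = slope(steps_prior, steps_today)

# Controlling for week: only within-week variation is left.
within_slope = slope(
    demean_by_group(steps_prior, weeks),
    demean_by_group(steps_today, weeks),
)

print("pooled slope:", round(pooled_slope, 2))
print("within-week slope:", round(within_slope, 2))
```

On this made-up series the pooled slope comes out positive and the within-week slope negative, which is the same pattern as my simple correlation versus my model coefficient.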

I think I have a reasonable start on a model, and some interesting insight into my activity level.  Tracking data this way is fun (to me) and also interesting.  Would love to get my hands on someone else's data to see if general relationships hold up.

On the other hand, much like a business KPI that gets distorted as a result of measurement and focus, my fitness data has changed so significantly since the beginning of this experiment that it has become more difficult to model what my "natural state" activity level might be.
