Wednesday, September 21, 2016

Distribution Analytics and My Fitness Tracker

The last few weeks have been quite crazy, so I haven't had much time to write on the blog.  I'm also working under so many NDAs (non-disclosure agreements) that I don't have much work or contracting material that I can blog about.  Actually, I've been somewhat hesitant to talk to anyone about anything lately, for fear of breaking an agreement.  Luckily, today I have some everyday data with data science applications that we can start to explore.

Today I can break my blog drought, thanks to some very cool behavior from my activity tracker.  About a year ago, I did some work looking at the data I capture from my activity tracker (Garmin Vivo Fit 2).  I created some regression models to predict and understand my daily step patterns, which led to some interesting observations:
  • My activity patterns tend to be auto-regressive and counter-cyclical.  In essence, my daily steps are negatively correlated with my steps the prior day: high-step days follow low-step days, and vice versa.
  • My weekends are much more active than my weekdays.  The gap used to be about 5K-10K steps, but now seems to be much smaller.
  • Sleep had a weird multi-day effect.  My fitness tracker also tracks when I sleep, which led to some interesting step-sleep interactions.  Generally, more steps led to a bit more sleep that night (marginally measurable), but sleep had a multi-day effect on steps.  In essence, the best sleep-based predictor of steps was the average sleep time over the last five days.
  • Here are a couple of recent weeks of the available data.  In essence, I average a little over 20K steps per day (80+ miles per week), and most days (now) are similar in steps.
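
For anyone who wants to poke at their own tracker export, here is a minimal sketch of two of the quantities mentioned above: the lag-1 autocorrelation of daily steps (negative when high days follow low days) and a trailing average of sleep time. This is not the original model code. The function names are hypothetical, the five-day window is an assumption, and the numbers are made up for illustration rather than taken from my tracker.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def lag1_autocorrelation(series):
    """Correlation between each day's value and the prior day's value."""
    return pearson(series[1:], series[:-1])


def trailing_mean(series, window=5):
    """Mean of the `window` values before each day (None until enough history)."""
    out = []
    for i in range(len(series)):
        out.append(mean(series[i - window:i]) if i >= window else None)
    return out


# Made-up values for illustration only (NOT real tracker data).
steps = [24000, 16000, 23000, 17000, 25000, 15000, 22000, 18000]
sleep_hours = [6.5, 7.0, 6.0, 6.8, 7.2, 6.4, 6.9, 6.1]

print(lag1_autocorrelation(steps))  # negative for this alternating series
print(trailing_mean(sleep_hours))   # first five entries are None
```

In a regression, the prior day's steps, a weekend flag, and the trailing sleep average would each become one predictor column for that day's step count.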

Analyzing my own fitness tracker distributions last year was interesting, but I wondered how my activity level compared to the rest of the population.  I did some general research and found several studies that say Americans take about 5,000 steps per day on average.  This is interesting because it is only about a quarter of my daily step total.  I wondered if demographics were affecting this number.  For instance, is it possible that older Americans were significantly dragging down the average?

Then, my Garmin app did something cool.  Specifically, it sent me this chart.

A summary of points from the chart:
  • I'm in the top 1% of activity level for people in my age and gender group (mid-30s dudes).
  • The average for my demo group appears to be about 6K-8K steps per day.
  • The data is significantly skewed, with a long right-hand tail (which I'm in).  This indicates a group of people who also take a significantly higher number of steps than the average.
This is great data, and it gives my activity level a bit of good context.  But it really makes me want to start a database so I can create auto-regressive models for all users.  More directly, knowing that pooled data exists, I want access to the Garmin data warehouse so I can apply data science techniques to it.  I doubt they offer this, though.
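
With pooled data in hand, the two things the chart shows (where one person ranks, and whether the distribution has a long right tail) are easy to check. Below is a rough sketch, assuming a simple list of daily step averages for a peer group. The `percentile_rank` helper is hypothetical and the group values are invented for illustration; they are not Garmin's data.

```python
from statistics import mean, median


def percentile_rank(population, value):
    """Percent of the population strictly below `value`."""
    below = sum(1 for x in population if x < value)
    return 100.0 * below / len(population)


# Invented daily step averages for a hypothetical peer group (NOT Garmin data).
group = [3000, 4500, 5000, 6000, 6500, 7000, 7500, 8000, 12000, 21000]

# A mean above the median is a quick sign of a long right-hand tail.
print(mean(group), median(group))
print(percentile_rank(group, 20000))
```

The mean-versus-median comparison is only a quick check; a proper skewness statistic would be the next step on a real sample.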

I was really sold on this data, but then my activity tracker brought me back down to earth.  Specifically, it sent me this chart:

In sum: 
  • I get less sleep than almost 90% of people my age (under 7 hours). 
  • The average amount of sleep for people my age is just below 8 hours. 
I don't know exactly what to do with this information... maybe it's time to create full-blown auto-regressive sleep models and see if I can diagnose why I sleep less?  Maybe.  Though I don't feel tired.  Here's how Garmin describes a week of my sleep.

Nothing obvious there, except some nights when I don't get to sleep until well after 11 pm (likely due to late-night contracting).  There are also a few *awake* bars in the middle of the night, likely due to a certain three-year-old yelling for me.  Nothing obvious, but maybe some models will help me figure out why I'm killing it with my activity level and sucking it up with sleeping.  More coming soon...