Friday, January 23, 2015


The company I work for is trying to start a new online brand, one that's heavily dependent on analytics and data science work.  I see this as a huge and exciting opportunity, but one that needs scale.

The biggest issue I run into is that online analytics do not "unscale" well: you almost have to go big or go home.

We're starting small: we aren't even putting a full analyst on the problem, and we can't put a ton of marketing resources behind it either, which means low traffic and low volume at this point.

The low volume is a HUGE issue in analytics, especially in the online space.  Here are the main issues I've identified:

1. Low Volume = Low N.  This is a simple statistical concept: I can't develop highly accurate predictive algorithms and models without a large population and enough accumulated experience to learn from.

2. Low Volume means it takes a while to figure out issues.  We have seen several issues, both in web-side technical matters and in the first set of derived analytics rules we set for the website.  With so few transactions coming through, each problem simply takes longer to surface and diagnose.

3. Low Volume means we're resource constrained.  In essence, we aren't spending a lot of money here, and we won't be making a lot of money in the near future, so business decision makers don't want us to spend a lot of time here.  This makes the business ramp-up curve even longer.

4. Low Volume creates bad models.  This is specific to our business.  Fraud volume (which grows quickly out of the box and isn't dependent on marketing) combined with low legitimate traffic pushes the total fraud rate toward 100%.  This creates an issue where predictive models (based on experience) become overly tight, even in spaces where legitimate business exists.  Think in terms of a K-NN or an SVM: in a world where 99% of what an algorithm sees is fraud, it's difficult to find regions of the feature space where the nearest points would vote "not-fraud".
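
To make point 4 concrete, here is a toy K-NN vote in plain Python. Everything in it is made up for illustration: the two-dimensional feature space, the evenly spread fraud grid, the small cluster of legitimate customers, and the names `fraud_grid`, `legit_cluster`, and `knn_vote` are all hypothetical, not our real data or rules. It is only a sketch of the effect, assuming the fraud volume stays fixed while the legitimate volume grows.

```python
from math import dist

K = 9
QUERY = (2.25, 2.25)  # hypothetical feature vector of a genuine customer


def fraud_grid():
    """Synthetic fraud spread evenly over the space: 21 x 21 = 441 points."""
    return [((0.5 * i, 0.5 * j), "fraud") for i in range(21) for j in range(21)]


def legit_cluster(n_side):
    """n_side x n_side synthetic legitimate points, 0.1 apart, centred on QUERY."""
    offset = (n_side - 1) / 2
    return [
        ((QUERY[0] + 0.1 * (i - offset), QUERY[1] + 0.1 * (j - offset)), "legit")
        for i in range(n_side)
        for j in range(n_side)
    ]


def knn_vote(points, query, k):
    """Majority vote among the k nearest labelled points (Euclidean distance)."""
    nearest = sorted(points, key=lambda p: dist(p[0], query))[:k]
    legit_votes = sum(1 for _, label in nearest if label == "legit")
    label = "legit" if legit_votes > k - legit_votes else "fraud"
    return label, legit_votes


# Same fraud in both scenarios; only the legitimate volume changes.
low_volume = fraud_grid() + legit_cluster(2)   # 4 legit vs 441 fraud (~99% fraud)
high_volume = fraud_grid() + legit_cluster(5)  # 25 legit vs 441 fraud

low_result = knn_vote(low_volume, QUERY, K)
high_result = knn_vote(high_volume, QUERY, K)

print("low volume: ", low_result)
print("high volume:", high_result)
```

In this toy setup the query point sits right in the middle of the legitimate cluster both times. At low volume only 4 of its 9 nearest neighbors are legitimate, so the vote comes back "fraud"; once there are 25 legitimate points nearby, all 9 neighbors are legitimate and the vote flips. The fraud never changed, only the amount of good traffic the model got to see.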
