On audit-ability in machine learning

Again, a moment of honesty: I started my career at the Kansas auditor's office (formally the Kansas Legislative Division of Post Audit), working on school funding, government efficiency, and fraud. I held the absurd title of "Principal Data Mining Auditor", but that was a long time ago (five years). I don't regret the experience, though after leaving I swore I'd never work with auditors again; I just like being more creative than that.

Fast forward five years: I'm working in financial services, and suddenly auditing is key again. This is partly because financial managers want to understand the "innards" of models, and partly because outside auditors and the government (the CFPB) want to audit our decision-making algorithms.
As a result, I sometimes have to choose an algorithm based not on performance, but on whether government auditors can understand what I do.

For instance, I use an ensemble learning process (multiple algorithms) for part of our decision making, and one component of it is an SVM. I can only train the SVM on a truncated data set, missing several variables including some employment and age information, because, in simple terms, I have to be able to prove that the output function has a continuous, one-directional first derivative (that is, the output only ever moves in one direction as an input changes).
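One way to read the derivative requirement above is that the score must be monotonic in each input. The sketch below is a numerical sanity check of that property, not a proof and not what I actually hand to auditors. The function names and both scoring functions are made up for illustration: it sweeps one feature across a range, holds the others fixed, and checks that the score never moves both up and down.

```python
import math

def finite_differences(score, baseline, index, lo, hi, steps=200):
    """Step-to-step changes in score as one feature sweeps from lo to hi."""
    values = []
    for k in range(steps + 1):
        x = list(baseline)                      # other features stay fixed
        x[index] = lo + (hi - lo) * k / steps
        values.append(score(x))
    return [b - a for a, b in zip(values, values[1:])]

def is_one_directional(score, baseline, index, lo, hi, steps=200, tol=1e-12):
    """True if the score never both rises and falls along the sweep."""
    diffs = finite_differences(score, baseline, index, lo, hi, steps)
    rising = any(d > tol for d in diffs)
    falling = any(d < -tol for d in diffs)
    return not (rising and falling)

# Hypothetical scoring functions, for illustration only.
def logistic_score(x):
    # Linear combination through a sigmoid: monotonic in every input.
    z = 0.8 * x[0] - 0.5 * x[1] + 0.1
    return 1.0 / (1.0 + math.exp(-z))

def rbf_score(x):
    # A single Gaussian bump, like one term of an RBF-kernel decision function.
    return math.exp(-((x[0] - 1.0) ** 2 + (x[1] + 0.5) ** 2))

baseline = [0.0, 0.0]
print(is_one_directional(logistic_score, baseline, 0, -3.0, 3.0))  # True
print(is_one_directional(rbf_score, baseline, 0, -3.0, 3.0))       # False
```

A check like this only covers the grid points and the baseline you pick, so it can flag a violation but cannot establish the property in general.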

So, this is my short list (or generalization) of auditable versus non-auditable methods:

Can be audited:
Multivariate Regression
GLM (logistic)
Spline Regression
Simple Decision Trees
Naive Bayes

Difficult to audit:
Artificial Neural Networks
Support Vector Machines
Relevance Vector Machines
Random Forest
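
To give a feel for why something like a logistic GLM lands in the first group, here is a minimal sketch. The feature names and coefficients are invented for illustration and come from no real model. The point is only that the log-odds are a plain sum, so each input's share of a decision can be printed and read line by line.

```python
import math

# Hypothetical coefficients for illustration; not from any real model.
INTERCEPT = -1.0
COEFFICIENTS = {"debt_to_income": -2.0, "on_time_payments": 0.3}

def explain(applicant):
    """Return the probability and each feature's additive share of the log-odds."""
    contributions = {
        name: coef * applicant[name] for name, coef in COEFFICIENTS.items()
    }
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions

probability, contributions = explain({"debt_to_income": 0.5, "on_time_payments": 4})
for name, value in contributions.items():
    print(f"{name}: {value:+.2f} log-odds")
print(f"probability: {probability:.3f}")
```

The methods in the second group generally have no such per-input breakdown that falls straight out of the fitted parameters.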


