## Wednesday, January 18, 2017

### Data Science Method: MARS Regression

People often ask which data science methods I use most often on the job or in exploring data in my free time.  This is the beginning of a series in which I describe some of those methods, and how they are used to explore, model and extrapolate large data sets.

Today I will cover MARS regression (Multivariate Adaptive Regression Splines), a regression methodology that automates variable selection, detects interactions, and accounts for non-linearities.  This methodology has at times become my hammer (from the saying that when you have a hammer in your hand, everything looks like a nail) due to its usefulness, ease of implementation, and accurate predictive capabilities.

The MARS algorithm was introduced by Jerome Friedman in 1991, and I suggest reading his original article for a full understanding of the algorithm.  BTW, because the name "MARS" is trademarked, the open-source implementations in many statistical programs (including R) are called "earth."  Essentially, though, the algorithm boils down to this:

1. The Basics: The basic mechanics of MARS involve linear regression fit by the ordinary least squares (OLS) method. But there are a few twists.
2. Variable Selection: MARS self-selects variables, first using a forward stepwise method (greedy algorithm based on variables with highest squared-error reduction) followed by a backward (in this case, truly back-out) method to remove over-fit coefficients from the model.
3. Non-Linearity: MARS uses multiple "splines," or hinge functions, inside of OLS to account for potentially non-linear data.  Piecewise linear regression is a rough analog to the hinge functions, except that in MARS the locations of the hinges are detected automatically through multiple iterations. That is to say, through the stepwise process the algorithm iteratively tries different break-points in the linearity of the model and selects the break-points that fit the data well.  (Side note: sometimes when describing these models to non-data scientists, I refer to the hinges humorously as "bendies."  Goes over much better than "splines" or "hinges.")
4. Regularization: The regularization strategy for MARS models uses Generalized Cross Validation (GCV), which trades off complexity against accuracy during the backward pass of the model.  GCV involves a user-set "penalty factor," so there is room for some tuning if you run into overfitting issues.  Because dynamic hinge functions give MARS the flexibility to conform to complex functions (intuitively, each hinge eats degrees of freedom, since more effective factors are considered in the equation), they increase the probability of overfitting.  As such, it is very important to pay attention to regularization procedures.
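
To make the regularization step a bit more concrete, here is a minimal Python sketch of the GCV score in the form commonly given for MARS. The function names and the example numbers are my own illustrations, not output from any package, and the penalty value is the user-set factor mentioned above.

```python
def effective_parameters(n_terms, penalty):
    """Effective number of parameters for a model with n_terms terms.

    n_terms counts every term, including the intercept. (n_terms - 1) / 2
    approximates the number of hinge knots, because terms are added in
    mirrored pairs, and each knot is charged `penalty` extra parameters.
    """
    return n_terms + penalty * (n_terms - 1) / 2.0


def gcv(rss, n_obs, n_terms, penalty):
    """Generalized cross-validation score: lower is better."""
    c = effective_parameters(n_terms, penalty)
    if c >= n_obs:
        raise ValueError("model is too complex for this many observations")
    return rss / (n_obs * (1.0 - c / n_obs) ** 2)


# Illustrative numbers only: the larger model has a slightly smaller
# residual sum of squares, but pays more for its extra terms.
small = gcv(rss=100.0, n_obs=100, n_terms=5, penalty=2.0)
large = gcv(rss=95.0, n_obs=100, n_terms=9, penalty=2.0)
print(small < large)
```

In this made-up comparison the smaller model scores better even though its raw error is higher, which is the tradeoff the backward pass uses when deciding which terms to remove. Raising the penalty makes each additional knot more expensive.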

The hinge function takes the form max(0, x - c) or its mirror image max(0, c - x), where c is the knot (break-point), allowing the regression splines to adapt to the data across the x axis.
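
As a rough illustration of how a hinge and its break-point work together, here is a small Python sketch. It is deliberately simplified and is not the earth implementation: it fits a single hinge, max(0, x - knot), on one variable and tries each observed x value as a candidate knot, keeping the one with the lowest squared error. The function names and the toy data are assumptions made for the example.

```python
def hinge(x, knot):
    """Right-facing hinge: zero up to the knot, then rising linearly."""
    return max(0.0, x - knot)


def fit_one_predictor(h, y):
    """Ordinary least squares of y on a single predictor h, with intercept."""
    n = len(h)
    mean_h = sum(h) / n
    mean_y = sum(y) / n
    sxx = sum((hi - mean_h) ** 2 for hi in h)
    sxy = sum((hi - mean_h) * (yi - mean_y) for hi, yi in zip(h, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_h
    rss = sum((yi - intercept - slope * hi) ** 2 for hi, yi in zip(h, y))
    return intercept, slope, rss


def best_knot(x, y):
    """Try each observed x (except the largest) as a knot; keep the lowest RSS."""
    best = None
    for knot in sorted(set(x))[:-1]:
        h = [hinge(xi, knot) for xi in x]
        intercept, slope, rss = fit_one_predictor(h, y)
        if best is None or rss < best[3]:
            best = (knot, intercept, slope, rss)
    return best


# Toy data: flat at 5 until x = 4, then rising by 2 per unit of x.
x = list(range(11))
y = [5.0 + 2.0 * hinge(xi, 4) for xi in x]
knot, intercept, slope, rss = best_knot(x, y)
print(knot, round(intercept, 6), round(slope, 6))
```

A real forward pass goes further: it considers every variable, adds hinges in mirrored pairs, and can multiply hinges together to form interactions. The search for the break-point, though, follows the same "try a knot, measure the error reduction" idea.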

## ADVANTAGES

• Ease of Fit: Two factors drive how easy MARS models are to fit: variable selection and hinge functions. A while back I was faced with a task where I needed to fit about 120 models (all with different dependent variables) in two weeks. Due to the power of the MARS algorithm in variable selection and non-linearity detection, I was able to create these models quite easily without a lot of additional data preparation or a priori knowledge.  I still tested, validated, and pulled additional information from each model, but the initial model build was highly optimized.
• Ease of Understanding: Because the basic fit (once you get past hinge functions) is OLS, most data scientists can easily understand the coefficient fitting process.  Also, even if your final model will involve a different method (simple linear regression, for instance), MARS can provide a powerful initial understanding of function shapes, from which you may decide to use a related transformation (quadratic, log) in your final model form.
• Hinge Optimization: One question I often receive from business users takes the form "what is the value at which x maximizes its value with y?"  In many of these cases, depending on the form of the data, that can be calculated directly by determining the hinge point from a MARS output, much like finding a local maximum or using another calculus-based optimization strategy.

## DISADVANTAGES

• Can be Overfit: Some people get overly confident about the internal regularization of MARS and forget that normal data science procedures are still necessary.  Especially in high-dimensional and highly orthogonal space, MARS regression will create a badly overfit model. Point being: ALWAYS USE A HOLDOUT TEST/VALIDATION SET. I have seen more of these types of models overfit in the past year than all other algorithms combined.
• Hinge Functions can be Intimidating: Right now, if I went to a business user (or another data scientist) and said that a coefficient in an elasticity equation was 0.8, we would have an easy shared understanding of what that meant.  However, if I give that same business user a set of three hinge functions, that's more difficult to understand.  I recommend always using the "plotmo" package in R to show business users partial dependence plots when building MARS models. This provides a simple and straightforward way to describe the fitted relationships.
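
The holdout check recommended in the overfitting point above can be sketched in a few lines of Python. This is a generic illustration rather than anything specific to MARS: the helper names are my own, and `predict` is a stand-in for whatever fitted model you want to check.

```python
import math
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle (x, y) rows reproducibly and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def rmse(predict, rows):
    """Root mean squared error of a prediction function on (x, y) rows."""
    return math.sqrt(sum((y - predict(x)) ** 2 for x, y in rows) / len(rows))


# Made-up data and a stand-in "model" purely for illustration.
rows = [(x, 3.0 * x + 1.0) for x in range(20)]
train_rows, test_rows = train_test_split(rows)


def predict(x):
    return 3.0 * x + 1.0


print(rmse(predict, train_rows), rmse(predict, test_rows))
```

The model should only ever be fit on the training rows. A holdout error that is much larger than the training error is the warning sign to revisit the penalty or prune terms.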

## AN EXAMPLE

And finally, a quick example from real-world data.  The Kansas education data set I've used before on this blog can be modeled using a MARS algorithm.  In this case I pretended I wanted to understand the relationship between FTE (the size of the school) and spending per pupil.  From an economics perspective, very small schools should have higher costs because they lack economies of scale.  I created a model in R, including a few known covariates for good measure.  Here's what the output with hinge functions looks like:

That's all a bit difficult to read. What if we use a partial dependence plot to describe the line fit to the FTE-to-spending relationship?  Here's what that looks like:

The green dots represent data points; the black line represents the line fit to the data by the MARS regression.  The extreme left side of the graph looks appropriate, fitting an economy-of-scale curve, and the flat right side of the graph appears to be an appropriately flat line.  The "dip" between the two curves is concerning and calls for further analysis. (On further analysis, this appears to be a case of omitted variable bias: that category of districts contains many low-cost-of-living mid-rural districts, whereas larger districts tend to be in higher-cost areas, so prices (e.g. teacher wages) are higher.)
