Multivariate Adaptive Regression Splines

Multivariate Adaptive Regression Splines (MARS) is a data mining technique developed by Jerome (Jerry) Friedman, who also co-developed CART. Like other data mining techniques, MARS is designed to model complex non-linear relationships, both through interactions and through nonlinear transformations. It also handles issues such as irrelevant regressors and missing values. One touted benefit of MARS over other data mining techniques is the model’s readability. Because MARS is built from standard techniques (spline functions and regression models), its results look more familiar.

MARS works by fitting multiple splines to the independent variables. A spline in this context is a piecewise function that allows abrupt changes in the relationship at specific points (a form of non-linearity). For example, imagine demand for cars in a four-person household. You could imagine that the household’s demand for an additional car is more price sensitive (elastic) if the household already has four cars than if it has fewer than four. In this situation the relationship between price and household demand changes abruptly once the household has four cars. This point is called a knot, and a spline function can model it. MARS finds the knots and interactions by using a brute force search procedure to minimize a lack-of-fit criterion. In this manner missing values and non-predictive variables are also addressed. The key to MARS is its efficient algorithm for searching this very large space (all possible interactions and knots).
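
The knot search can be illustrated with a minimal sketch. It assumes a single predictor and a single hinge function, max(0, x - knot), and it scores each candidate knot by the plain sum of squared errors. The function names and the toy data are made up for illustration. Actual MARS implementations go further: they typically use mirrored pairs of hinge functions, consider many variables and their interactions, and prune terms with a penalized criterion, none of which is shown here.

```python
def hinge(x, knot):
    """Right-facing hinge basis function: max(0, x - knot)."""
    return max(0.0, x - knot)


def fit_single_hinge(xs, ys, knot):
    """Least-squares fit of y = b0 + b1 * hinge(x, knot).

    Returns (b0, b1, sse), or None if the hinge is constant on the data.
    """
    hs = [hinge(x, knot) for x in xs]
    n = len(xs)
    h_mean = sum(hs) / n
    y_mean = sum(ys) / n
    s_hh = sum((h - h_mean) ** 2 for h in hs)
    if s_hh == 0.0:
        return None  # e.g. knot at the largest x: the hinge is zero everywhere
    s_hy = sum((h - h_mean) * (y - y_mean) for h, y in zip(hs, ys))
    b1 = s_hy / s_hh
    b0 = y_mean - b1 * h_mean
    sse = sum((y - (b0 + b1 * h)) ** 2 for h, y in zip(hs, ys))
    return b0, b1, sse


def best_knot(xs, ys):
    """Try every observed x value as a knot and keep the lowest-SSE fit."""
    best = None
    for knot in sorted(set(xs)):
        fit = fit_single_hinge(xs, ys, knot)
        if fit is None:
            continue
        b0, b1, sse = fit
        if best is None or sse < best[3]:
            best = (knot, b0, b1, sse)
    return best


# Toy data (hypothetical): flat at 10 up to x = 4, then falling by 3 per unit.
xs = [float(x) for x in range(11)]
ys = [10.0 - 3.0 * hinge(x, 4.0) for x in xs]

knot, b0, b1, sse = best_knot(xs, ys)
print("knot:", knot, "intercept:", round(b0, 6), "slope:", round(b1, 6))
```

On this noise-free toy data the search recovers the knot at x = 4, because that is the only candidate for which the hinge model fits exactly. With several variables and interactions the number of candidates grows quickly, which is why the efficiency of the search matters.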

This type of complex relationship can also be modeled by Artificial Neural Networks, Support Vector Machines, and Regression Trees. Where MARS shines is its understandability. In the end you get a model with knots, interactions, and coefficients. Many find this more digestible than weights on neurons in hidden layers. MARS has a commercial version available from Salford Systems and an open source version in R (the earth package).
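
To give a sense of what a model made of knots, interactions, and coefficients looks like, here is a small sketch. The model below is entirely hypothetical: the variable names, knots, and coefficients are invented, and the representation (a list of terms, each a coefficient times a product of hinge functions) is just one convenient way to write such a model down, not the output format of any particular package.

```python
# Each term: (coefficient, [(variable, knot, direction), ...]).
# direction +1 means max(0, x - knot); -1 means max(0, knot - x).
# An empty factor list is the intercept; two factors form an interaction.
MODEL = [
    (12.0, []),
    (-0.8, [("price", 20.0, +1)]),
    (1.5, [("income", 50.0, -1)]),
    (-0.02, [("price", 20.0, +1), ("income", 50.0, +1)]),
]


def hinge_value(x, knot, direction):
    """Evaluate one hinge function for the given direction."""
    return max(0.0, direction * (x - knot))


def predict(model, row):
    """Sum of coefficient * product of hinge values over all terms."""
    total = 0.0
    for coef, factors in model:
        product = 1.0
        for var, knot, direction in factors:
            product *= hinge_value(row[var], knot, direction)
        total += coef * product
    return total


def describe(model):
    """Render the model as a human-readable equation string."""
    parts = []
    for coef, factors in model:
        pieces = [f"{coef:+g}"]
        for var, knot, direction in factors:
            if direction > 0:
                pieces.append(f"max(0, {var} - {knot:g})")
            else:
                pieces.append(f"max(0, {knot:g} - {var})")
        parts.append(" * ".join(pieces))
    return " ".join(parts)


print(describe(MODEL))
print(predict(MODEL, {"price": 30.0, "income": 60.0}))
```

Each term can be read on its own: a coefficient, the knot where its effect switches on, and, for interaction terms, the variables that have to be past their knots at the same time.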