Statistical/AI Techniques

1) Intro

There is a forest of traditional statistical techniques and new artificial intelligence algorithms for forecasting. Choosing the right one can be difficult.

2) Choosing Variables

With the Information Age, forecasters received a mixed blessing. We now have more data than the most optimistic forecaster dreamed of just fifteen years ago. We typically work with datasets consisting of thousands of data elements and millions of records, but what to do with all this… stuff? Most of the data elements logically have no relation whatsoever to the features we are studying. Worse, many of the variables are hopelessly correlated with one another, and with so many candidates to test, spurious relationships will inevitably emerge from this plethora of information.

a) Research

Again, many problems can be solved by communication or by reading the existing research. If a problem is important, someone has worked on it before.

b) Systematic algorithms

The method I favor is writing systematic algorithms that cycle through all the available data elements, analyze each element's relationship with the target feature using a measure like mean squared error (MSE), and then pick the best elements for further analysis.
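
As a minimal sketch of such a screen, the loop below fits a one-variable model per candidate and ranks the candidates by in-sample MSE. The data frame and column names here are hypothetical stand-ins.

# Hypothetical data: y is the target, every other column is a candidate
df <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))

candidates <- setdiff(names(df), "y")
mse <- sapply(candidates, function(v) {
  fit <- lm(df$y ~ df[[v]])   # one-variable model per candidate
  mean(residuals(fit)^2)      # in-sample MSE as a crude screen
})
sort(mse)                     # smallest MSE = most promising element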

b.1) Stepwise Regression

Stepwise regression reduces the number of variables in a model by adding or removing variables one at a time and calculating the marginal gain or loss from including each variable.
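
As an illustrative sketch, base R's step() performs this search scored by AIC rather than a raw marginal-gain measure; the built-in swiss data stands in for a real dataset.

data(swiss)
Full.Model <- lm(Fertility ~ ., data = swiss)
# Add and drop variables in both directions, scored by AIC
Reduced.Model <- step(Full.Model, direction = "both", trace = FALSE)
summary(Reduced.Model)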

b.2) Lorenz, ROI, and ROC curves.

Cycle through each potential independent variable and generate curves showing its relationship with the dependent variable.
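
A sketch of that cycling for ROC curves, assuming a binary dependent variable and that the pROC package is available; the AUC summarizes each curve in one number. The mtcars variables are illustrative stand-ins.

library(pROC)   # assumption: pROC is installed, providing roc() and auc()
data(mtcars)    # am (0/1) stands in for a binary dependent variable
candidates <- c("mpg", "hp", "wt")
auc.by.var <- sapply(candidates, function(v) {
  auc(roc(mtcars$am, mtcars[[v]], quiet = TRUE))  # one ROC curve per candidate
})
sort(auc.by.var, decreasing = TRUE)   # higher AUC = stronger separation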

b.3) Correlation

A simple two-way correlation between each candidate variable and the dependent variable is another technique for finding potential independent variables.
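
A minimal version of the correlation screen, using the built-in swiss data as a stand-in:

data(swiss)
target <- "Fertility"
candidates <- setdiff(names(swiss), target)
# Simple two-way correlation of each candidate with the dependent variable
correlations <- sapply(candidates, function(v) cor(swiss[[v]], swiss[[target]]))
sort(abs(correlations), decreasing = TRUE)   # rank by strength, ignoring sign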

c) Data mining

There are several data mining techniques, discussed in the data mining section, that are geared toward uncovering linear and non-linear relationships in the data.

d) Principal Components/Factor Analysis

As mentioned in the previous section, this technique can reduce the number of variables whose relationships need to be estimated, with the hope of not losing too much information. Again, this is an estimation technique and should be treated as such.
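
A minimal sketch with base R's prcomp(); how many components to keep is itself a judgment call, which is why the technique should be treated with care. The swiss data is an illustrative stand-in.

data(swiss)
pca <- prcomp(swiss[, -1], scale. = TRUE)  # drop the target column, standardize the rest
summary(pca)                               # variance explained per component
Reduced.Data <- pca$x[, 1:2]               # keep the first two components as new variables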

3) Forecasting Techniques

Below is a cursory overview of common forecasting techniques. A more detailed overview is provided in the statistics and data mining sections. All the example code is in R except where noted.

a) Ordinary Least Squares Regression

This is the classic forecasting technique taught in schools.

1) Building a model

a) There are many packages that provide ordinary least squares estimates.

b) Variable selection is important.

c) Outputs a simple equation.

d) Can model time series and cross-sectional data.

e) Continuous or Categorical Variables

2) Diagnostic Tools

In the statistics section I go over the evaluation of OLS models in detail, but here are some tools for uncovering major issues when building an OLS model (a short sketch follows the list):

a) QQ Plot

b) Residuals plots

c) Correlations

d) Partial Regressions

e) MSE

f) R-Squared
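
Most of these can be produced in a few lines of base R; the model below is an illustrative fit on the built-in trees data.

data(trees)
fit <- lm(Volume ~ Girth + Height, data = trees)
par(mfrow = c(2, 2))
plot(fit)                  # residual plots and the QQ plot in one shot
mean(residuals(fit)^2)     # MSE
summary(fit)$r.squared     # R-squared
cor(trees)                 # pairwise correlations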

3) Caveats

OLS requires strong assumptions about the nature of the data and the relationships between variables in order to remain BLUE (the best linear unbiased estimator). BLUE is discussed in detail in the statistics section.

4) Example Code (R)

data(trees)
# A minimal example: predict timber volume from girth and height
Results.Model1 <- lm(Volume ~ Girth + Height, data = trees)
summary(Results.Model1)

b) Logistic Regressions

Logistic regression has a very different theoretical foundation from ordinary least squares. You are trying to estimate a probability, so the dependent variable takes only the values 1 and 0. This violates the assumptions required for BLUE.

1) Building the model

a) Many software packages have Logistic regression models included.

b) Variable selection is important.

c) Outputs a simple equation.

d) Can model time series and cross-sectional data.

e) Probabilities or Categorical  Variables

2) Diagnostic Tools

a) Confusion Matrix

b) Lift Charts

c) ROC Chart

d) Lorenz Curve

e) Cost Sensitivity/ROI
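
As a sketch of the first of these, here is a confusion matrix for a logistic model on the birthwt data; the 0.5 cut-off is an illustrative threshold, not a recommendation.

library(MASS)   # provides the birthwt data set
data(birthwt)
fit <- glm(low ~ age + lwt + smoke, family = binomial, data = birthwt)
predicted <- ifelse(fitted(fit) > 0.5, 1, 0)          # 0.5 cut-off is illustrative
table(actual = birthwt$low, predicted = predicted)    # confusion matrix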

3) Caveats

Non-linearities can obscure relationships between variables.

4) Example Code (R)

library(MASS)  # provides the birthwt data set
data(birthwt)
# A minimal example: model the probability of low birth weight
Results.Model1 <- glm(low ~ age + lwt + smoke, family = binomial, data = birthwt)

c) Vector Autoregression Models (VAR)

Vector autoregression models require a firm theoretical foundation. They are designed for estimating the relationships among a set of codependent, autocorrelated variables. To identify the structure you must make strong assumptions about the structure of the errors, namely how the errors are temporally related.

1) Building a model

a) There are only a few packages that provide vector autoregression models.

b) Variable selection is critical.

c) Outputs a simple equation.

d) They correct for auto-correlation and simultaneous equations.

e) Can model time series data.

f) Continuous Variables

2) Diagnostic Tools

Same as for an OLS model.

3) Caveats

By changing the order of the variables in the model, you completely change your theory of what the true relationship is. For example, if you order money first, you believe the money supply drives output; if you order output first, you believe output drives money. These are two contradictory models. Because of this strong reliance on a detailed understanding of the true relationship between the variables, on top of all the assumptions required for an OLS model, VARs have fallen out of favor in many forecasting circles.

4) Example Code (R)

# This example fits a Bayesian VAR model with flat priors.
library(MSBVAR)
data(longley)

# Flat-prior model
Results.Model1 <- szbvar(longley, p = 1, z = NULL, lambda0 = 1, lambda1 = 1,
                         lambda3 = 1, lambda4 = 1, lambda5 = 0, mu5 = 0, mu6 = 0,
                         nu = 0, qm = 4, prior = 2, posterior.fit = FALSE)

d) MARS

Multivariate Adaptive Regression Splines are designed to deal better with nonlinear relationships. They can be seen as a blending of CART and OLS.

1) Building a model

a) There are only a few packages that provide MARS models.

b) Variable selection is similar to OLS, but you do not need to worry as much about nonlinearities.

c) Outputs a simple equation.

d) Can model time series and cross-sectional data.

e) Continuous or Categorical Variables

2) Diagnostic Tools

Similar to an OLS model.

3) Caveats

The output of a complex model can be difficult to read but is still understandable. MARS models are prone to overfitting.

4) Example Code (R)

library(mda)      # provides mars()
library(mlbench)  # assumption: the glass data comes from mlbench, as Glass
data(Glass)
# Predict the refractive index from the chemical composition
Results.Model1 <- mars(Glass[, 2:9], Glass$RI)

e) Artificial Neural Networks (ANN)

Artificial Neural Networks (ANN) are an attempt to simulate how the mind works.

1) Building a model

a) There are many good neural network packages.

b) Variable selection is similar to OLS, but many non-linearities can be assumed to be handled by the ANN.

c) The output is not understandable in the manner the aforementioned models are.

d) Can model time series and cross-sectional data.

e) Probabilities or Categorical  or Continuous Variables

2) Diagnostic Tools

I have not found ANNs to be the black boxes they are often criticized as being. You can use the same tools as with an OLS or logistic regression. To find the influence of each variable, cycle through the variables, removing each in turn and re-running the model; the effect of each variable can be measured via MSE, as in the sketch below.
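
A sketch of that leave-one-variable-out loop with the nnet package; the network size and the swiss data are illustrative choices.

library(nnet)
data(swiss)
set.seed(1)
vars <- setdiff(names(swiss), "Fertility")
full.mse <- mean(residuals(nnet(Fertility ~ ., data = swiss, size = 4,
                                linout = TRUE, trace = FALSE))^2)
influence <- sapply(vars, function(v) {
  f <- reformulate(setdiff(vars, v), response = "Fertility")  # drop one variable
  m <- nnet(f, data = swiss, size = 4, linout = TRUE, trace = FALSE)
  mean(residuals(m)^2) - full.mse    # MSE increase when the variable is removed
})
sort(influence, decreasing = TRUE)   # biggest increase = most influential variable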

3) Caveats

Overfitting

4) Example Code (R)

library(nnet)
data(swiss)
# A single-hidden-layer network; size and decay are illustrative choices
results.Model <- nnet(Fertility ~ ., data = swiss, size = 4, decay = 0.01, linout = TRUE)

f) Support Vector Machines (SVM)

Support Vector Machines are closer to classical statistical methods but hold the promise of uncovering nonlinear relationships.

1) Building a model

a) There are a few good SVM packages, both commercial and open source.

b) Variable selection is similar to OLS, but many non-linearities can be assumed to be handled by the SVM.

c) The output is not understandable in the manner the aforementioned models are.

d) Can model time series and cross-sectional data.

e) Probabilities or Categorical  or Continuous Variables

2) Diagnostic Tools

I have found that SVMs, too, are not black boxes. You can use the same tools as OLS and logistic regression to diagnose them, just as with the ANN.

3) Caveats

Overfitting

4) Example Code (R)

library(kernlab)
data(swiss)

## Train a support vector machine; the RBF kernel is an illustrative choice
results.KVSM1 <- ksvm(Fertility ~ ., data = swiss, kernel = "rbfdot")

g) Regression Trees

Regression trees briefly became popular as a forecasting technique around the turn of the century. It was hoped that they could better model nonlinearities, but they proved to be prone to overfitting.

1) Building a model

a) There are several good tree packages, both commercial and open source.

b) Automatic variable selection.

c) The output is easy to understand.

d) Can model time series and cross-sectional data.

e) Probabilities or Categorical  or Continuous Variables

2) Diagnostic Tools

You can use the same tools as OLS and logistic regression to diagnose.

3) Caveats

Overfitting

4) Example Code (R)

library(rpart)    # provides rpart() and the kyphosis data
library(maptree)  # provides draw.tree() for plotting the fit
data(kyphosis)
Results.Model1 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
draw.tree(Results.Model1)

h) Bagging, Boosting and Voting

Bagging (bootstrap aggregating) helps unstable models become more stable by fitting many models on resampled data and combining their predictions. Boosting and voting are related ways of combining models; a bare-bones bagging sketch follows.
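
This sketch bags rpart trees on the built-in swiss data: resample with replacement, fit a tree to each resample, and average the predictions.

library(rpart)
data(swiss)
set.seed(42)
B <- 100
preds <- sapply(1:B, function(b) {
  boot <- swiss[sample(nrow(swiss), replace = TRUE), ]  # bootstrap resample
  fit <- rpart(Fertility ~ ., data = boot)              # one tree per resample
  predict(fit, newdata = swiss)
})
bagged <- rowMeans(preds)   # the averaged predictions are the bagged forecast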

i) Boosted Trees and Random Forests

Boosted trees apply the boosting methodology to trees, while random forests rely on bagging. In both cases you grow many, in some cases hundreds, of small regression trees and then combine them, using a voting or averaging methodology, to stabilize the results. The resulting model is very complex, but much more stable than any individual tree model would be.

1) Building a model

a) There are several good tree packages, both commercial and open source.

b) Automatic variable selection.

c) The output is easy to understand but very large.

d) Can model time series and cross-sectional data.

e) Probabilities or Categorical  or Continuous Variables

2) Diagnostic Tools

You can use the same tools as OLS and logistic regression to diagnose.

3) Caveats

Overfitting

4) Example Code (R)

library(randomForest)
data(swiss)
set.seed(131)
# A 500-tree forest; ntree = 500 is the package default, shown for clarity
Results.Model1 <- randomForest(Fertility ~ ., data = swiss, ntree = 500)