Artificial Neural Networks

1. Intro
Artificial Neural Networks (ANNs) are an attempt to simulate how the mind works. They were developed using the connectionist approach to machine learning.  They gained popularity among forecasters because of their ability to model nonlinearities and interactions across variables.  ANNs are used for a variety of purposes, from forecasting the stock market to pattern recognition to compression algorithms.  Critics often decry ANNs as a black box.  In reality, the workings of an ANN model are viewable, but their foundation lies not in statistics but in artificial intelligence, which has a more limited audience.  ANNs have a tendency to overfit, so a holdout sample and intelligent forecasting practices are required.

2. Overview
ANNs simulate how neurons operate in the brain by using a network of artificial neurons organized in layers, with one input layer and one output layer. Artificial neurons (nodes) are simple processing elements that typically have one response value and one to many input values. A neuron is trained to minimize prediction error by modifying how it responds to its inputs; an input may itself be the response of one or more neurons in a preceding layer.  The simplest ANN has two layers, an input and an output layer.  The two layers are connected via weights, which are adjusted to minimize forecast error.  A model with only an input and an output layer is very similar to a simple regression model.  The hidden layers are all layers between the input and output layers.  The name is misleading: they are not hidden, and their weights can be viewed.
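As a concrete illustration, a single artificial neuron can be written in a few lines of R; the sigmoid activation and all values below are illustrative assumptions, not part of any particular package:

# A single artificial neuron: a weighted sum of inputs passed through an
# activation function (sigmoid here). Weights and inputs are illustrative.
neuron <- function(inputs, weights, bias = 0) {
  1 / (1 + exp(-(sum(weights * inputs) + bias)))
}

neuron(c(0.4, 0.9), c(0.8, -0.5))   # one response value from two inputs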

[Figure: a Multi-Layer Perceptron with one hidden layer]

3. Classification
There are many ways to classify neural networks and no consensus among researchers as to the best method. Below, I briefly cover three types of classification for ANNs.  These classifications do overlap.

a) Learning features

1) Supervised: these ANNs fit a model to predict outputs from inputs.  This is analogous to a regression model where the researcher chooses a dependent variable. The output of the model is a fitted or predicted value.

2) Unsupervised: these ANNs have no desired output from the model.  They can be viewed as a form of data reduction, like cluster analysis, that finds patterns among variables across observations.

b) Layer Structure

1) Single-layer Perceptron

Single-layer networks were an early form of ANN with no hidden layer.  The inputs are fed directly to the output through a set of weights.

[Figure: a single-layer perceptron with two input nodes connected directly to one output node]
2) Multi-layer Perceptron

A Multi-layer Perceptron consists of at least three layers: input, hidden, and output. This allows the system to model interactions across variables as well as nonlinearities.  A forward pass through the network in the figure is sketched below.

[Figure: a multi-layer perceptron with two input nodes, a two-node hidden layer, and one output node]
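A forward pass through the 2-2-1 network in the figure can be sketched in R; all weights are illustrative:

# Forward pass through a 2-2-1 multi-layer perceptron (illustrative weights).
sigmoid <- function(z) 1 / (1 + exp(-z))

x  <- c(0.3, 0.8)                               # two input values
W1 <- matrix(c(0.2, -0.4, 0.7, 0.1), nrow = 2)  # input-to-hidden weights
w2 <- c(0.5, -0.6)                              # hidden-to-output weights

h <- sigmoid(W1 %*% x)     # responses of the two hidden neurons
y <- sigmoid(sum(w2 * h))  # single output value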

c) Network Structures

1) Feed Forward 

In feed-forward models the data flows in one direction only, from the inputs toward the output.

2) Feed Back

Feedback models allow the output of the model to influence itself, that is, to feed back into the system.  This is a powerful means of dealing with issues such as serial correlation.
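A minimal sketch in R, where the previous output re-enters the unit as an input (all weights and the input series are illustrative):

# A feedback unit: the output depends on its own past value.
x <- c(0.2, 0.5, 0.9, 0.4)   # input series
w <- 0.6                     # input weight
v <- 0.3                     # feedback weight
y <- numeric(length(x))
prev <- 0
for (t in seq_along(x)) {
  y[t] <- tanh(w * x[t] + v * prev)  # output fed back into the next step
  prev <- y[t]
}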

3) Kohonen Self-Organizing Network

This is an unsupervised learning algorithm.

4. Common ANN Training Functions

a) Error Backward Propagation

The most widely used training algorithm is backward propagation (BP).  Backward propagation works by repeatedly looping through a learning cycle in which the neuron weights are recalculated and their importance readjusted.  In each iteration you calculate a scaling factor (the adjustment to the weights that better matches the desired output) and assign an error to each neuron (the error is also called the blame and is used to adjust the neuron’s importance).  It is called backward propagation because information obtained at the output node, the final node, is propagated backward through the structure.  A single weight update is sketched below.
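A single learning-cycle update for one sigmoid neuron can be written as follows; the learning rate and all values are illustrative assumptions:

# One backward-propagation update for a single sigmoid neuron (minimal sketch).
sigmoid <- function(z) 1 / (1 + exp(-z))

x      <- c(0.2, 0.7)    # inputs
w      <- c(0.5, -0.3)   # current weights
target <- 1              # desired output
lr     <- 0.5            # learning rate (assumed)

out   <- sigmoid(sum(w * x))      # forward pass
error <- target - out             # the "blame" assigned to this neuron
delta <- error * out * (1 - out)  # scaling factor via the sigmoid derivative
w     <- w + lr * delta * x       # adjust the weights toward the target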

Common activation functions:

[Figure: plots of the signum, step, sigmoid, and hyperbolic tangent functions]
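These can be written directly in R; the threshold conventions below are the usual ones, and variants exist:

# The four activation functions above (usual conventions; variants exist).
signum  <- function(x) sign(x)               # returns -1, 0, or 1
step    <- function(x) ifelse(x >= 0, 1, 0)  # hard threshold at zero
sigmoid <- function(x) 1 / (1 + exp(-x))     # smooth, bounded in (0, 1)
# The hyperbolic tangent, tanh(), is built into R; bounded in (-1, 1).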

b) Radial Basis Function

Radial Basis Function (RBF) networks are used primarily for image recognition. They are similar to BP but with more restrictive assumptions on learning, resulting in faster computation.  They can have only one hidden layer.
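The defining feature is that a hidden unit responds to the distance between an input and a centre point. A minimal Gaussian sketch, with the centre and width chosen arbitrarily:

# A Gaussian radial basis unit: the response depends only on the distance
# of the input from a centre (centre and width are illustrative).
rbf <- function(x, centre, width) exp(-sum((x - centre)^2) / (2 * width^2))

rbf(c(0.2, 0.4), centre = c(0, 0), width = 1)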

c) Probabilistic

Probabilistic neural networks (PNN) are also similar to BP but make only one pass through the data and therefore compute much faster. They are also mainly used for images, but appear as well in cases where a rapid response is necessary, such as robotics.

d) Recurrent Neural Network

Recurrent Neural Networks (RNN) are a type of feedback ANN. They are useful where serial correlation exists or the data is noisy.

e) Self-Organizing Feature Maps

SOFM are a type of Kohonen Self-Organizing Network.

5. Building a model

a. Preparing the data
Many neural network packages require the following data preparation:

1. Input variables must be bounded in [0,1] if using a sigmoid function.
2. Binary inputs are transformed to [.1,.9], because 0 and 1 lie at the extremes of the activation function and convergence may not be possible. (Both transformations are sketched below.)
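A minimal sketch of both in R; the function names are illustrative:

# Map a numeric variable to [0, 1] (min-max scaling).
to_unit <- function(x) (x - min(x)) / (max(x) - min(x))

# Map a 0/1 binary variable to {0.1, 0.9} to keep it off the extremes.
squash_binary <- function(x) ifelse(x == 1, 0.9, 0.1)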

b. Settings
1. Number of hidden layers

The hidden layers sit between the input and output layers.  Typically, one hidden layer is sufficient for modeling linear processes.  Image recognition typically requires many hidden layers.  Too many hidden layers can result in overfitting.  In practice, unless the task warrants overfitting, as in data compression, stick with one hidden layer.

2. Number of neurons for each hidden layer

One critical choice is the number of hidden layers and the number of neurons in each layer. A triangle formation is a good place to start: the first hidden layer has the same number of neurons as there are input nodes, the next hidden layer has half as many neurons, and so forth. For example, with eight input nodes the hidden layers would have eight, four, and two neurons.

6. Example Code (R)

A minimal example using the nnet package on the built-in swiss dataset; the formula and settings below are one plausible completion of the truncated original:

data(swiss)
library(nnet)

# Fit a network with one hidden layer of 3 neurons; linout = TRUE gives a
# linear output suitable for regression rather than classification.
results.Model <- nnet(Fertility ~ ., data = swiss, size = 3, linout = TRUE)
summary(results.Model)