Descriptive Analysis

1. Frequencies/Charts/Plots

Simple frequencies and plots can tell you quickly whether a relationship exists between two or more variables. However, as with any technique, relying solely on graphs as a diagnostic or research tool can blind you to the true underlying relationship.

Example code (R)


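library(car)  # scatter3d() comes from the car package (the rgl package must also be installed)
# longley is a built-in dataset (datasets package)
hist(longley$Unemployed, breaks = "Sturges", col = "darkgray")
boxplot(longley$Unemployed, ylab = "Unemployed")
scatter3d(longley$Unemployed, longley$GNP, longley$Year, fit = "linear", bg.col = "white", grid = TRUE)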
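The plots above show the shape of the data, but the section also mentions simple frequencies. A minimal sketch, assuming you want the same Sturges bins the histogram uses: calling base R's hist() with plot = FALSE returns the bin edges and counts without drawing anything. The names h and freq are illustrative.

```r
# hist() with plot = FALSE returns the binned frequencies instead of drawing them
h <- hist(longley$Unemployed, breaks = "Sturges", plot = FALSE)

# One row per bin: lower edge, upper edge, and the number of observations in the bin
freq <- data.frame(lower = head(h$breaks, -1),
                   upper = tail(h$breaks, -1),
                   count = h$counts)

# Relative frequency (share of all observations) in each bin
freq$prop <- freq$count / sum(freq$count)
freq
```

Because the table comes from the same hist() call, its counts match the bars in the plot exactly.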

2. Correlations

Correlations measure the strength and direction of the association between two variables. Pearson's correlation coefficient ranges from -1 to 1: 1 indicates a perfect positive linear relationship, -1 a perfect negative one, and 0 no linear relationship. Remember that correlation does not show causality, only whether the variation in two variables is related. Also, non-linearities and interactions can obscure the relationship.

Example Code (R)
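A minimal sketch using base R's cor() and cor.test() on the longley data from the previous section; the variable pairs and the Spearman comparison are illustrative choices, not the author's original example. The last lines show the caveat above in action: a perfect but non-linear relationship can have a Pearson correlation of essentially zero.

```r
# Pearson correlation matrix for every column of longley
cm <- cor(longley)
round(cm, 2)

# A single pair, with a test of whether the correlation differs from zero
cor.test(longley$Unemployed, longley$GNP)

# Spearman's rank correlation measures monotonic (not necessarily linear) association
cor(longley$Unemployed, longley$GNP, method = "spearman")

# A perfect quadratic relationship: y is fully determined by x, yet Pearson's r is ~0
x <- -10:10
y <- x^2
cor(x, y)   # approximately 0
cor(x, -x)  # -1 (up to rounding): a perfect negative linear relationship
```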



3. Analysis of Variance

Analysis of variance (ANOVA) is a powerful tool for testing whether one or more variables explain a significant share of the variation in another. While it may not lead directly to a forecast model, it can help a researcher understand the relationships among the data elements and how the variables in a system relate to one another.

Example Code (R)

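# Seatbelts is a monthly time series; converting it to a data frame makes the lm() input explicit
anova(lm(DriversKilled ~ PetrolPrice, data = as.data.frame(Seatbelts)))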
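To see how two or more variables relate, the one-predictor model can be extended and compared with a larger nested model. A minimal sketch, assuming the law column of Seatbelts (a 0/1 indicator for the compulsory seat belt law) as the second variable; the model names m1, m2 and res are illustrative.

```r
# Seatbelts is a monthly multivariate time series; convert it to a data frame
sb <- as.data.frame(Seatbelts)

m1 <- lm(DriversKilled ~ PetrolPrice, data = sb)
m2 <- lm(DriversKilled ~ PetrolPrice + law, data = sb)

# Sequential (Type I) sums of squares: PetrolPrice first, then law given PetrolPrice
anova(m2)

# F-test of the nested models: does adding law explain significantly more variation?
res <- anova(m1, m2)
res
```

Note that sequential sums of squares depend on term order, so reordering the formula in m2 can change the first table.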

4. Cluster Analysis

Cluster analysis is oftentimes confused with principal components analysis (PCA) or factor analysis. Both are powerful unsupervised data reduction tools. While principal components analysis is concerned with grouping columns (variables) in a dataset together, cluster analysis is concerned with grouping rows (observations) together. It can be a powerful tool for building rules and dummy variables. For example, if a strong group of young males emerges from the cluster analysis, it would be prudent to test this subgroup, either by splitting the data or by adding a dummy variable based on it.

Example code (R)
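A minimal sketch of the rows-versus-columns distinction using base R's kmeans() and prcomp(). The built-in mtcars data, the three clustering variables, k = 3, and qsec as the outcome are all illustrative assumptions standing in for the "young males" example, which has no built-in dataset. The cluster label numbers are arbitrary, so "cluster 1" is just whichever group k-means happens to label first.

```r
set.seed(42)  # k-means uses random starting centres; fix the seed for reproducibility
cars_df <- mtcars

# Standardise the clustering variables so no single unit of measure dominates
x <- scale(cars_df[, c("mpg", "hp", "wt")])

# Cluster analysis groups ROWS (cars) together
km <- kmeans(x, centers = 3, nstart = 25)
table(km$cluster)  # number of cars in each cluster

# Turn membership of one cluster into a dummy variable and test it in a model
cars_df$in_cluster1 <- as.integer(km$cluster == 1)
summary(lm(qsec ~ wt + in_cluster1, data = cars_df))

# For contrast, PCA works on COLUMNS: the loadings show which variables move together
pca <- prcomp(x)
round(pca$rotation, 2)
```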


5. OLAP Cubes/Pivot Charts

Online Analytical Processing (OLAP) is a powerful data mining tool. It allows users to run ad hoc queries on a database quickly, with little knowledge of data access languages such as SQL. The end results are frequencies and other aggregate summaries. OLAP requires an intelligent operator (preferably a statistician) to wield it; it will not uncover relationships by itself.

Most OLAP tools come with a graphical user interface (GUI), so OLAP can be thought of as a substitute for writing SAS or SQL code: users build complex queries through an intuitive drag-and-drop interface.
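OLAP cubes live in dedicated tools, but the kind of result they return (counts and summaries cross-classified by several dimensions) can be sketched in base R. This is an illustration on the built-in mtcars data, not an OLAP engine; the choice of dimensions (cyl, gear, am) and measure (mpg) is an assumption.

```r
# Frequencies: count cars by cylinders x gears (a two-dimensional "cube")
counts <- xtabs(~ cyl + gear, data = mtcars)
counts
addmargins(counts)  # row and column totals, like a pivot table's grand totals

# A measure other than a count: mean mpg in each cyl x gear cell
aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)

# A three-dimensional cube, then a flattened view and a roll-up over gear
cube <- xtabs(~ cyl + gear + am, data = mtcars)
ftable(cube)
margin.table(cube, c(1, 3))  # keep cyl and am, sum over gear
```

As with a GUI pivot table, these calls only summarise; deciding which cross-tabulations matter is still up to the analyst.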