1. Frequencies\Charts\Plots
Simple frequencies and plots can quickly tell you whether a relationship exists between two or more variables. However, relying solely on graphs as a diagnostic or research tool, as with any single technique, can blind you to the true underlying relationship.
Example code (R)
library(car)  # provides scatter3d(); the rgl package must also be installed
data(longley)
hist(longley$Unemployed, breaks = "Sturges", col = "darkgray")
boxplot(longley$Unemployed, ylab = "Unemployed")
scatter3d(longley$Unemployed, longley$GNP, longley$Year, fit = "linear", bg.col = "white", grid = TRUE)
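The plots above have a tabular counterpart. As a minimal sketch using only base R, the unemployment series can be binned and counted; the choice of four bins and the variable names are illustrative assumptions, not part of the original example.

```r
data(longley)

# Bin unemployment into four equal-width intervals and count the years in each
unemployed.bins <- cut(longley$Unemployed, breaks = 4)
unemployed.freq <- table(unemployed.bins)
print(unemployed.freq)

# Relative frequencies (proportions that sum to 1)
unemployed.prop <- prop.table(unemployed.freq)
print(round(unemployed.prop, 2))
```

Changing the number of bins changes the picture, which is one reason a single table or histogram should not be trusted on its own.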
2. Correlations
Correlations measure the strength and direction of the linear association between two variables. The values range from -1 to 1, with 1 indicating perfect positive correlation, -1 perfect negative correlation, and 0 no linear relationship. Remember that correlations do not show causality, only whether the variations in two or more variables are related. Also, non-linearities and interactions can obscure the relationship.
Example Code (R)
data(swiss)
results.Corr
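The listing above does not show what is assigned to results.Corr. One plausible reading, offered only as a sketch, is a Pearson correlation matrix from base R's cor(); the use of cor(), the follow-up cor.test() call, and the name fert.edu are assumptions rather than the original code.

```r
data(swiss)

# Pearson correlation matrix for all six numeric columns
results.Corr <- cor(swiss, method = "pearson")
print(round(results.Corr, 2))

# Test one pair: is the Fertility/Education correlation distinguishable from zero?
fert.edu <- cor.test(swiss$Fertility, swiss$Education)
print(fert.edu$estimate)
print(fert.edu$p.value)
```

Passing method = "spearman" to cor() gives a rank-based version that is less sensitive to monotone non-linearities.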
3. ANOVA
Analysis of variance (ANOVA) is a powerful tool for showing whether the variation in one variable is related to one or more others. While it may not lead directly to a forecast model, it can help a researcher gain knowledge of the relationships between the data elements. It is also useful for seeing how variables in a system are related to one another.
Example Code (R)
data(Seatbelts)  # a multivariate time series, so convert it to a data frame for lm()
anova(lm(DriversKilled ~ PetrolPrice, data = as.data.frame(Seatbelts)))
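ANOVA can also compare two nested models to see whether an extra variable earns its place. The sketch below, which assumes the law column of Seatbelts (the seat-belt legislation indicator) is a sensible variable to add, uses anova() with two lm() fits; the names fit.small, fit.large, and comparison are illustrative.

```r
data(Seatbelts)
seatbelts.df <- as.data.frame(Seatbelts)

# Baseline model and a larger model that adds the law indicator
fit.small <- lm(DriversKilled ~ PetrolPrice, data = seatbelts.df)
fit.large <- lm(DriversKilled ~ PetrolPrice + factor(law), data = seatbelts.df)

# F-test: does adding the law indicator explain significantly more variance?
comparison <- anova(fit.small, fit.large)
print(comparison)
```

A small p-value in the second row suggests the added variable explains variation that petrol price alone does not.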
4. Cluster Analysis
Cluster analysis is often confused with principal components (factor analysis). Both are powerful unsupervised data reduction tools. While principal components is concerned with grouping the columns of a dataset together, cluster analysis is concerned with grouping the rows together. It can be a powerful tool for building rules and dummy variables. For example, if a strong group emerges from cluster analysis for young males, it would be prudent to test this subgroup either by splitting the data or by adding a dummy variable based on it.
Example code (R)
data(swiss)
library(DCluster)
library(cluster)
hc
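The listing above stops at hc. As a rough sketch of the idea in the text (group the rows, then test the groups as dummy variables), the following uses base R's hclust() rather than the DCluster or cluster packages. The Ward linkage, the choice of three clusters, and every name other than hc are assumptions made for illustration.

```r
data(swiss)

# Standardize columns so no single variable dominates the distances
swiss.scaled <- scale(swiss)

# Agglomerative hierarchical clustering on Euclidean distances between rows
hc <- hclust(dist(swiss.scaled), method = "ward.D2")

# Cut the tree into three groups (three is an arbitrary illustrative choice)
swiss.clustered <- swiss
swiss.clustered$cluster <- factor(cutree(hc, k = 3))
print(table(swiss.clustered$cluster))

# Use cluster membership as dummy variables in a regression
fit <- lm(Fertility ~ Education + cluster, data = swiss.clustered)
print(anova(fit))
```

Calling plot(hc) draws the dendrogram, which helps when judging how many groups are worth keeping.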
5. OLAP Cubes\Pivot Charts
Online Analytical Processing (OLAP) is a powerful data mining tool. It allows users to run ad hoc queries on a database quickly with little understanding of data access languages such as SQL. The end results are frequencies. OLAP requires an intelligent operator (preferably a statistician) to wield it and will not uncover relationships by itself.
Most OLAP tools come with a graphical user interface (GUI). OLAP can be thought of more as a substitute for SAS or SQL. It allows users to build complex queries using an intuitive drag-and-drop interface.
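For readers without an OLAP tool at hand, the kind of output it produces can be approximated in base R. This is only a sketch of pivot-style frequencies, not an OLAP engine; the mtcars dataset and the variable names are assumptions chosen because the data has convenient categorical columns.

```r
data(mtcars)

# Two-way frequency "slice": number of cars by cylinder count and gear count
cyl.by.gear <- xtabs(~ cyl + gear, data = mtcars)
print(cyl.by.gear)

# Three dimensions flattened into one pivot-style table
print(ftable(xtabs(~ cyl + gear + am, data = mtcars)))

# "Roll-up": mean miles per gallon for each cylinder/gear combination
mpg.by.group <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)
print(mpg.by.group)
```

As with a real cube, the tables only summarize; deciding which dimensions to cross and what the counts mean is still left to the analyst.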