This is an example of how to make monitoring graphs that are more interesting and therefore more likely to be paid attention to. The goal for such graphs is to make boring and uninteresting be OK. This is surprisingly easy. When the data has lots of structure (lots of information), exciting and interesting patterns emerge. When it is random, the graphs tend to look less interesting. This is exactly what we want: residuals should have no structure; they should be random.
The data is GDP, Wages and Employment for the US from 1949 to 2009 via the St. Louis Federal Reserve. This is a great place for data on the economy.
The model’s goal is to predict next quarter’s employment using the previous quarter’s wage and GDP data, hopefully creating colorful errors in the process.
The data pull code can be found here.
And the code for generating the models and graphs here.
First we run a simple linear regression, leaving out the last 50 observations.
mdl.v1 <- lm(EMP ~ L1_GDP + L1_WAGES, data = df.Mdl[1:SmpEnd, ])
If you looked at the summary, it would be obvious that this model has issues. The R-squared is near 1. But let's ignore that for now.
First we generate the out of sample predictions.
v1.pred <- predict(mdl.v1, df.Mdl[OutSmpStart:OutSmpEnd, ])
v1.pred
Next we normalize the residuals using the within-sample variance of the dependent variable (this is the Y position of each point).
v1.res <- v1.pred- df.Mdl[OutSmpStart:OutSmpEnd,]$EMP
v1.var <- var(df.Mdl[1:SmpEnd, ]$EMP )
v1.NormRes <- v1.res/sqrt(v1.var )
Now we get a sense of the magnitude of the estimates, which sets the size of each point.
v1.maxpred <- max(v1.pred^2)
v1.predAdj <- v1.pred^2/v1.maxpred
Next we create the ratio of the lagged dependent variable to the dependent variable to set the color. This is a good measure of potential unit roots and concept drift.
v1.tRateOfChange <- abs(df.Mdl[(OutSmpStart-1):(OutSmpEnd-1),]$EMP/ df.Mdl[OutSmpStart:(OutSmpEnd),]$EMP)
v1.maxRateOfChange <- max(v1.tRateOfChange)
v1.RateOfChange <- v1.tRateOfChange/v1.maxRateOfChange
Set the plot options.
op <- par(bg ="black", col="white", col.lab ="white" ,col.axis ="white" , col.main ="white" ,col.sub ="white" )
plot(c(1996, 2009), c(-50, 50), type = "n", xlab = "Year", ylab = "Normalized Residuals", main = "Emp = L1 GDP + L1 Wages", sub = "Color: Ratio Dep vs Lagged Dep  Radius: Normalized Estimate")
abline( h=0, col = "white")
palette(rainbow(200))
Now loop through the out-of-sample estimates and plot the values.
loops<- OutSmpEnd-OutSmpStart
## loop through the out-of-sample points, one per quarter
for (i in seq_len(loops)) {
ptx = 1996 + i/4
pty = v1.NormRes[i]*100
ptr = v1.predAdj[i]*2
ptcolor = 35 + v1.RateOfChange[i]*165
points(ptx,pty , pch = 19, col =ptcolor, bg =ptcolor ,cex= ptr)
}
There is structure (patterns) in the residuals. This is bad. Residuals should appear random in plots. First, the residuals are trending up over time, indicating heteroscedasticity. Also note that large values (the size of the circles) have larger errors. Second, the color is uniform and high on the color chart, showing a strong correlation between the dependent variable and its lagged value. This indicates a unit root.
Let's take the log of the variables to deal with the heteroscedasticity, and correct for the unit root by taking the first difference.
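To see why a near-perfect fit and a lag ratio close to 1 are warning signs rather than good news, here is a small sketch on simulated data (not the employment data). The series `x` and `y` and the starting level of 100 are made up for illustration; they are independent by construction, yet both trend upward.

```r
# Simulated data only: two independent random walks with drift,
# started at an arbitrary level of 100.
set.seed(42)
n <- 200
x <- 100 + cumsum(rnorm(n, mean = 1))
y <- 100 + cumsum(rnorm(n, mean = 1))

# Ratio of the lagged value to the current value, the same kind of
# quantity that drives the colour in the plot.
lag.ratio <- abs(y[1:(n - 1)] / y[2:n])

# Regression in levels: the shared trend produces a very high R-squared
# even though x and y are unrelated.
r2.levels <- summary(lm(y ~ x))$r.squared

# Regression on first differences: the apparent relationship disappears.
r2.diff <- summary(lm(diff(y) ~ diff(x)))$r.squared

print(c(median.lag.ratio = median(lag.ratio),
        r2.levels = r2.levels,
        r2.differences = r2.diff))
```

If the output looks like the first model (ratio near 1, R-squared near 1 in levels, close to 0 in differences), the fit in levels is probably telling you about the trend and not about the relationship.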
df.Mdl.2<-df.Mdl[2:OutSmpEnd,]
df.Mdl.2$EMP_log_1D <- log(df.Mdl[2:OutSmpEnd,]$EMP ) - log(df.Mdl[1:(OutSmpEnd-1),]$EMP)
df.Mdl.2$L1_GDP_log_1D <- log(df.Mdl[2:OutSmpEnd,]$L1_GDP) - log(df.Mdl[1:(OutSmpEnd-1),]$L1_GDP)
df.Mdl.2$L1_WAGES_log_1D<- log(df.Mdl[2:OutSmpEnd,]$L1_WAGES) - log(df.Mdl[1:(OutSmpEnd-1),]$L1_WAGES)
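The index arithmetic above is a log first difference. As a quick check, the sketch below uses a made-up vector `emp` (a stand-in for a column such as EMP, not the real data) to show that base R's `diff(log(x))` gives the same numbers as the manual version.

```r
# Made-up series standing in for a column such as EMP.
emp <- c(100, 102, 105, 103, 108)
n <- length(emp)

# Manual version, mirroring the index arithmetic used above.
manual <- log(emp[2:n]) - log(emp[1:(n - 1)])

# Built-in version.
builtin <- diff(log(emp))

print(all.equal(manual, builtin))

# For small changes the log difference is close to the percentage change.
pct.change <- emp[-1] / emp[-n] - 1
print(round(cbind(log.diff = builtin, pct.change = pct.change), 4))
```

Either form drops the first observation, which is why df.Mdl.2 starts at row 2.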
Now re-estimate the model.
mdl.v2 <-lm(EMP_log_1D ~ L1_GDP_log_1D + L1_WAGES_log_1D ,data = df.Mdl.2[1:SmpEnd, ])
Now that is better: not perfect, but better. The residuals are centered near zero and show an inconsistent relationship between the dependent variable and its lagged value. This is not the best model, but hopefully this shows how more creative charts can aid in model maintenance and development.
Note: Why did I only use the range 35-200 from the rainbow palette? Easier to see. Below is a plot of all colors using the rainbow palette. I find the lower end of the spectrum hard to distinguish from the high end, so I skip it. Also note how the colors do not pop out as much with the white background. Black backgrounds are a quick way to make the colors stand out.
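Redrawing the same chart for a second model means repeating the plotting steps. One way to do that is to wrap them in a function. This is only a sketch: `plot_residual_bubbles` and its arguments are hypothetical names, not part of the original code, and the usage call runs on random numbers rather than the model output.

```r
# Hypothetical helper, not from the original code: wraps the plotting
# steps used for the first model so they can be reused.
plot_residual_bubbles <- function(time, norm_res, size, colour_score,
                                  n_colours = 200, first_colour = 35,
                                  main = "") {
  stopifnot(length(time) == length(norm_res),
            length(time) == length(size),
            length(time) == length(colour_score))

  # Map scores in [0, 1] onto palette indices first_colour..n_colours.
  colour_index <- round(first_colour +
                        colour_score * (n_colours - first_colour))
  cols <- rainbow(n_colours)[colour_index]

  # Black background with white annotations, restored on exit.
  op <- par(bg = "black", col = "white", col.lab = "white",
            col.axis = "white", col.main = "white", col.sub = "white")
  on.exit(par(op))

  plot(range(time), range(norm_res), type = "n",
       xlab = "Year", ylab = "Normalized Residuals", main = main)
  abline(h = 0, col = "white")
  points(time, norm_res, pch = 19, col = cols, cex = size)

  # Return what was plotted so it can be inspected or tested.
  invisible(data.frame(time, norm_res, size, colour_index))
}

# Usage on made-up numbers.
set.seed(1)
demo <- plot_residual_bubbles(
  time         = 1996 + (1:20) / 4,
  norm_res     = rnorm(20),
  size         = runif(20, 0.5, 2),
  colour_score = runif(20),
  main         = "Demo with random residuals"
)
```

Passing vectors instead of looping point by point works because `points()` accepts vectors for `col` and `cex`.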
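A comparison like the one described in this note can be drawn with a few lines. This is a sketch of one way to do it, not necessarily how the original figure was made; the variable names are illustrative.

```r
# Draw all 200 rainbow colours, once on a black background and once
# on a white one (two separate plots).
n.colours <- 200
cols <- rainbow(n.colours)

op <- par(no.readonly = TRUE)  # remember the current settings
for (bg in c("black", "white")) {
  fg <- if (bg == "black") "white" else "black"
  par(bg = bg, fg = fg, col.axis = fg, col.main = fg)
  plot(seq_len(n.colours), rep(1, n.colours), pch = 19, cex = 2,
       col = cols, yaxt = "n", xlab = "", ylab = "",
       main = paste("rainbow(200) on a", bg, "background"))
}
par(op)  # restore the previous settings
```

The x axis is the palette index, so it is easy to read off where the low end starts to look like the high end.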