Here is an example problem: you want to determine how the competitive environment affects store performance. You estimate a model with store performance as a function of the number of rival stores nearby, hoping to see how rivals affect performance. But you have violated a key assumption behind BLUE (the Best Linear Unbiased Estimator property of ordinary least squares): at least one independent variable is contemporaneously correlated with the error term, because it is itself driven by the dependent variable. If a store is profitable, it will attract rivals. When an independent variable is correlated with the error term, regression analysis (OLS) cannot give you a consistent estimate.
The result is a biased estimate that understates the adverse effect of rivals on store performance. If a location is highly profitable, more firms will enter the market, increasing the number of rival stores without necessarily hurting the store's performance. This problem is known as endogeneity, and it is common. Other examples are income and education, supply and demand, and store location and credit scores.
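The bias can be seen in a minimal simulation sketch. It assumes a hypothetical data-generating process: an unobserved location quality raises both the number of rivals and store performance, and each rival truly lowers performance by 2. All coefficients are invented for illustration. With these numbers the OLS slope converges to about -2 + 4·1.5/3.25 ≈ -0.15, far smaller in magnitude than the true effect of -2.

```python
import numpy as np


def simulate_ols_bias(n=10_000, true_effect=-2.0, seed=0):
    """Simulate stores whose unobserved location quality raises both
    profit and the number of rivals, then fit OLS of performance on rivals.
    All parameter values are hypothetical, chosen for illustration."""
    rng = np.random.default_rng(seed)
    # Unobserved location quality; it ends up in the regression's error term.
    quality = rng.normal(size=n)
    # Profitable (high-quality) locations attract more rivals.
    rivals = 3.0 + 1.5 * quality + rng.normal(size=n)
    # True model: each rival changes performance by `true_effect`.
    performance = 10.0 + true_effect * rivals + 4.0 * quality + rng.normal(size=n)
    # Simple-regression OLS slope = Cov(rivals, performance) / Var(rivals).
    return np.cov(rivals, performance)[0, 1] / np.var(rivals, ddof=1)


print(f"OLS estimate of the rival effect: {simulate_ols_bias():.2f} (true: -2.00)")
```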
One solution is to use Instrumental Variables (IV). The key to IV is one or more instruments: variables that are correlated with the endogenous independent variable but not contemporaneously correlated with the error term. The instrument can be an observed variable, such as the number of small streams used as a proxy for urban/city characteristics, or a fitted value. Two-stage least squares (2SLS) is a common IV technique: a first-stage regression predicts the endogenous variable from the instruments (plus the model's other exogenous variables), and those fitted values replace the endogenous variable in the second-stage regression of the main model. When building a 2SLS model you still need instruments that are correlated with the variable you are replacing but not correlated with the main model's error term; they enter the first stage of the model.
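In matrix form the two stages can be written as follows. This is the standard textbook statement, and the symbols are introduced here rather than taken from the post: $y$ is the outcome, $X$ holds the regressors (a constant, any exogenous controls and the endogenous variable), and $Z$ holds the exogenous regressors plus the excluded instruments, with at least as many columns as $X$.

$$
\begin{aligned}
\text{first stage:}\quad & \hat{\Pi} = (Z^\top Z)^{-1} Z^\top X, \qquad \hat{X} = Z\hat{\Pi} = P_Z X, \qquad P_Z = Z(Z^\top Z)^{-1} Z^\top,\\
\text{second stage:}\quad & \hat{\beta}_{\text{2SLS}} = (\hat{X}^\top \hat{X})^{-1} \hat{X}^\top y = (X^\top P_Z X)^{-1} X^\top P_Z y.
\end{aligned}
$$

The last equality holds because $P_Z$ is symmetric and idempotent. Each exogenous column of $X$ is also a column of $Z$, so the first stage reproduces it exactly and only the endogenous column is actually replaced. Standard errors must be based on residuals $\hat{u} = y - X\hat{\beta}_{\text{2SLS}}$ computed with the original $X$, not with $\hat{X}$.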
In the store performance example, you could use a 2SLS model to estimate the number of rivals from variables such as development tax incentives, the number of rivals before the store opened, and other factors not directly correlated with the performance of the store you are examining.
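The store example can be sketched in Python with NumPy on simulated data. The instruments `tax_incentive` and `prior_rivals` are hypothetical and are generated independently of the unobserved location quality, so they satisfy the IV assumptions by construction; real instruments would need that argued, not assumed. The `two_sls` helper is written here for illustration and returns coefficients only.

```python
import numpy as np


def two_sls(y, X, Z):
    """2SLS coefficients, beta = (X' Pz X)^{-1} X' Pz y.
    X: regressors (constant, controls, endogenous columns).
    Z: all exogenous regressors plus excluded instruments."""
    # First stage: fitted values of every column of X from Z.
    pi_hat, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ pi_hat
    # Second stage: regress y on the fitted values.
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta


def simulate_store_example(n=20_000, true_effect=-2.0, seed=1):
    """Hypothetical store data: rivals respond to the instruments AND to
    unobserved location quality, which also drives performance."""
    rng = np.random.default_rng(seed)
    quality = rng.normal(size=n)                      # unobserved, in the error term
    tax_incentive = rng.normal(size=n)                # hypothetical instrument
    prior_rivals = rng.poisson(2.0, size=n).astype(float)  # hypothetical instrument
    rivals = (1.0 + 0.8 * tax_incentive + 0.7 * prior_rivals
              + 1.5 * quality + rng.normal(size=n))
    performance = 10.0 + true_effect * rivals + 4.0 * quality + rng.normal(size=n)

    const = np.ones(n)
    X = np.column_stack([const, rivals])
    Z = np.column_stack([const, tax_incentive, prior_rivals])
    ols, *_ = np.linalg.lstsq(X, performance, rcond=None)  # naive OLS
    iv = two_sls(performance, X, Z)                        # 2SLS
    return ols[1], iv[1]  # slope on rivals from each estimator


ols_slope, iv_slope = simulate_store_example()
print(f"OLS: {ols_slope:.2f}  2SLS: {iv_slope:.2f}  (true: -2.00)")
```

Under these settings the OLS slope converges to roughly -0.77, well short of -2 in magnitude, while the 2SLS slope lands close to -2. The helper deliberately skips standard errors: the second-stage OLS standard errors are not valid for 2SLS, because the residuals must be computed with the original `rivals` rather than its fitted values. For real work a dedicated routine, such as `IV2SLS` in the `linearmodels` package, handles this.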
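Instrumental variables can also be used to correct for omitted variables. When a variable is left out of the model, any included regressor correlated with it becomes correlated with the error term; an instrument that is correlated with that regressor but not with the missing variable restores a consistent estimate. A related approach is to include a proxy variable closely correlated with the missing variable.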
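A short derivation for a single regressor, ignoring the intercept, shows why. The symbols are introduced here for illustration: $x$ is the included regressor, $w$ the omitted variable, $u$ an error uncorrelated with $x$ and $z$, and $z$ the instrument. If the true model is $y = \beta x + \gamma w + u$ but $w$ is left out, the error becomes $e = \gamma w + u$, and

$$
\operatorname{plim} \hat{\beta}_{\text{OLS}} = \beta + \gamma \frac{\operatorname{Cov}(x, w)}{\operatorname{Var}(x)},
$$

which is biased whenever $\gamma \neq 0$ and $\operatorname{Cov}(x, w) \neq 0$. With an instrument satisfying $\operatorname{Cov}(z, x) \neq 0$ and $\operatorname{Cov}(z, w) = \operatorname{Cov}(z, u) = 0$,

$$
\operatorname{plim} \hat{\beta}_{\text{IV}} = \frac{\operatorname{Cov}(z, y)}{\operatorname{Cov}(z, x)}
= \frac{\beta \operatorname{Cov}(z, x) + \gamma \operatorname{Cov}(z, w) + \operatorname{Cov}(z, u)}{\operatorname{Cov}(z, x)} = \beta.
$$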