Bootstrapping

Bootstrapping is one of the most useful statistical techniques. It is also often misunderstood, overused or avoided. In general, the bootstrap uses simulation to approximate the sampling distribution of a statistic and, from that, to make inferences about the population the sample came from. Operationally, you draw a new sample of the same size, with replacement, from the original dataset and compute the statistic of interest on it. This is repeated many times (often thousands) until there are enough replicates to support statistically valid conclusions. The method was introduced by Efron (1979). The jackknife, a similar but less popular resampling technique, predates bootstrapping. Instead of sampling with replacement, it recomputes the statistic on subsets that each leave out one observation.
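The resampling loop described above can be sketched in Python using only the standard library. This is a minimal, non-parametric sketch, assuming the statistic of interest is the mean; the function names `bootstrap_replicates` and `percentile_ci_95` and the sample values are hypothetical. The percentile interval shown is only one of several ways to turn bootstrap replicates into a confidence interval.

```python
import random
import statistics


def bootstrap_replicates(data, statistic, n_resamples=2000, seed=None):
    """Compute `statistic` on `n_resamples` bootstrap resamples of `data`.

    Each resample has the same size as `data` and is drawn with replacement,
    so some observations appear several times and others not at all.
    """
    rng = random.Random(seed)  # seeded generator for reproducible results
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=n)  # sampling with replacement
        replicates.append(statistic(resample))
    return replicates


def percentile_ci_95(replicates):
    """95% percentile interval: the 2.5th and 97.5th percentiles of the replicates."""
    cuts = statistics.quantiles(replicates, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Hypothetical sample of 20 observations, for illustration only.
sample = [3.1, 4.7, 2.9, 5.6, 3.8, 4.2, 6.1, 3.3, 4.9, 5.0,
          2.7, 4.4, 3.9, 5.8, 4.1, 3.6, 4.8, 5.2, 3.0, 4.5]

reps = bootstrap_replicates(sample, statistics.mean, seed=1)
print("sample mean:", statistics.mean(sample))
print("bootstrap standard error:", statistics.stdev(reps))
print("95% percentile CI:", percentile_ci_95(reps))
```

The standard deviation of the replicates serves as the bootstrap estimate of the standard error; swapping `statistics.mean` for `statistics.median` bootstraps the median instead, with no change to the loop.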

The power of bootstrapping is its ability to make statistical inferences about a population using only a single, often small, sample. That is where the name comes from: the phrase “to pull oneself up by one’s bootstraps” describes a seemingly impossible task. Bootstrapping does all of this without the restrictive assumption of normality. It can also be applied to small samples (as small as twenty observations), although the sample must still be representative of the population, because resampling cannot remove bias that is already in the data.

There are two main types of bootstrapping: parametric and non-parametric. Non-parametric bootstrapping does not assume a distribution for the population; instead, it uses the empirical distribution of the data by resampling the observed values directly. Parametric bootstrapping assumes the population follows a known, parameterized distribution, such as a log-normal; the parameters are estimated from the data, and new samples are simulated from the fitted distribution.
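A parametric bootstrap under a log-normal assumption might look like the sketch below. It assumes strictly positive data and estimates the log-normal parameters as the mean and sample standard deviation of the log values, which is one simple choice among several; the function name `parametric_bootstrap_lognormal` and the data are hypothetical.

```python
import math
import random
import statistics


def parametric_bootstrap_lognormal(data, statistic, n_resamples=2000, seed=None):
    """Parametric bootstrap that assumes the population is log-normal.

    The parameters are estimated as the mean and standard deviation of
    log(data); each replicate is `statistic` computed on a simulated sample
    of the same size drawn from the fitted log-normal distribution.
    """
    logs = [math.log(x) for x in data]  # data must be strictly positive
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)
    rng = random.Random(seed)  # seeded generator for reproducible results
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        # Simulate from the fitted model instead of resampling the observations.
        simulated = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        replicates.append(statistic(simulated))
    return replicates


# Hypothetical positive, right-skewed measurements, for illustration only.
durations = [1.2, 0.8, 2.5, 1.9, 3.4, 0.6, 1.1, 4.2, 1.5, 2.2,
             0.9, 1.7, 2.8, 1.3, 5.1, 0.7, 1.4, 2.0, 3.0, 1.6]

medians = parametric_bootstrap_lognormal(durations, statistics.median, seed=1)
print("bootstrap standard error of the median:", statistics.stdev(medians))
```

The only difference from a non-parametric bootstrap is where each new sample comes from: it is simulated from the fitted distribution rather than drawn with replacement from the observed values. If the log-normal assumption is wrong, the parametric results inherit that error.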

Below are example uses of bootstrapping:

* If you are faced with an unbalanced sample or an infrequent event, you can use bootstrapping to resample cases, for example when estimating a logistic regression for a rare event.

* By resampling residuals with the bootstrap, you can make inferences about the sampling distribution of model estimates, including their standard errors, confidence intervals (CIs) and other goodness-of-fit statistics. This is useful when the sample size is too small for asymptotic results to be trusted or when the assumption of normality is too restrictive.

* To make inferences about a population, you can bootstrap the sampling distribution of a statistic such as the mean or median. This offers a powerful alternative to standard methods when the assumption of normality is too restrictive.

* Bootstrapping can also be useful for detecting outliers.
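The residual bootstrap from the list above can be sketched for a simple straight-line regression. This is a minimal sketch, assuming ordinary least squares with a single predictor and the slope as the quantity of interest; `fit_line`, `residual_bootstrap_slope` and the data are hypothetical. The predictor values stay fixed, and only the residuals are resampled with replacement and added back to the fitted values.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def residual_bootstrap_slope(x, y, n_resamples=2000, seed=None):
    """Return the OLS slope and its bootstrap replicates from resampled residuals."""
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    rng = random.Random(seed)  # seeded generator for reproducible results
    slopes = []
    for _ in range(n_resamples):
        # Keep x fixed; add residuals drawn with replacement to the fitted values.
        new_resid = rng.choices(residuals, k=len(residuals))
        y_star = [fi + ei for fi, ei in zip(fitted, new_resid)]
        slopes.append(fit_line(x, y_star)[1])
    return b, slopes


# Hypothetical noisy data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.3, 4.1, 6.8, 7.9, 10.4, 11.8, 14.6, 15.7, 18.5, 19.9]

slope, boot_slopes = residual_bootstrap_slope(x, y, seed=1)
cuts = statistics.quantiles(boot_slopes, n=40, method="inclusive")
print("OLS slope:", slope)
print("bootstrap standard error:", statistics.stdev(boot_slopes))
print("95% percentile CI:", (cuts[0], cuts[-1]))
```

Resampling residuals rather than whole cases assumes the fitted model is adequate and the errors have roughly constant variance; when that is doubtful, resampling (x, y) pairs is a common alternative.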

Further Reading

www.uvm.edu/~dhowell/StatPages/Resampling/Bootstrapping.html

www.uvm.edu/~dhowell/StatPages/Resampling/Resampling.html

wikipedia.org/wiki/Bootstrapping

Sas.com/kb/24/982.html