Bootstrap statistics — how to work around limitations of simple statistical tests (2024)

Getting Started

You have a data sample. From it, you want to calculate a confidence interval for the population mean value. What’s the first thing you think about? It’s usually a t-test.

But the t-test has several requirements, one of which is that the sampling distribution of the mean is nearly normal (either the population is normal, or the sample is reasonably large). In practice, that’s not always true, and so the t-test may not always deliver optimal results.

To work around that kind of limitation, use the bootstrap method. It has only one important requirement: that the sample approximates the population well enough. Normality is not required.

Let’s give the t-test a try, and then compare it with the bootstrap method.

Let’s first load all the libraries we need:

from sklearn.utils import resample
import pandas as pd
from matplotlib import pyplot as plt
import pingouin as pg
import numpy as np
import seaborn as sns
plt.rcParams["figure.figsize"] = (16, 9)

Note: we’re using the Pingouin library for several statistical tests here. It’s great because many important techniques are kept under the same roof, and have uniform interfaces and syntax.

Now back to the main topic — this is the sample:

spl = [117, 203, 243, 197, 217, 224, 279, 301, 317, 307, 324, 357, 364, 382, 413, 427, 490, 742, 873, 1361]
plt.hist(spl, bins=20);

You can sort-of pretend that looks normal, but it has several large outliers, which the t-test does not tolerate very well. Still, let’s do the t-test for the 95% confidence interval of the population mean:

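# One-sample t-test against 0; the result includes the 95% CI for the mean
pgtt = pg.ttest(spl, 0)
print(pgtt['CI95%'].iloc[0])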
[272.34 541.46]
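For reference, the same interval can be reproduced by hand as mean ± t × standard error. This is only a sketch of the textbook formula, not Pingouin's internal code; the critical value `t_crit` is hardcoded from a standard t table for 19 degrees of freedom, which is an assumption of this example.

```python
import numpy as np

spl = [117, 203, 243, 197, 217, 224, 279, 301, 317, 307, 324, 357, 364,
       382, 413, 427, 490, 742, 873, 1361]

x = np.asarray(spl, dtype=float)
n = x.size
mean = x.mean()
# Standard error of the mean, using the sample standard deviation (ddof=1)
se = x.std(ddof=1) / np.sqrt(n)

# 97.5th percentile of Student's t with n - 1 = 19 degrees of freedom,
# read from a t table (scipy.stats.t.ppf(0.975, 19) would compute it).
t_crit = 2.093

ci_low = mean - t_crit * se
ci_high = mean + t_crit * se
print(round(ci_low, 2), round(ci_high, 2))
```

The width of the interval is driven by the sample standard deviation, which the few very large values inflate considerably.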

That’s a very wide interval, reflecting the spread of values in the sample. Can we do better? The answer is yes, using the bootstrap method.

If the sample was larger, or if you could extract multiple samples from the same population, the results would be better. But that’s not possible here — this sample is all we have.

However, if we assume the sample approximates the population fairly well, we can pretend to generate multiple pseudo-samples from the population: what we really do is take this sample and resample it — generate multiple samples, of the same size, using the same values. And then we work with the set of generated samples.

The number of pseudo-samples you make needs to be large, in the thousands usually.

Since the samples we generate from the original sample are the same size as the original, some values will be repeated in each pseudo-sample. That’s fine.
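To make the repetition concrete, here is a small sketch that draws a single pseudo-sample with NumPy's random generator (the article itself uses `sklearn.utils.resample`; the seed below is arbitrary and only there for reproducibility).

```python
import numpy as np

spl = [117, 203, 243, 197, 217, 224, 279, 301, 317, 307, 324, 357, 364,
       382, 413, 427, 490, 742, 873, 1361]

rng = np.random.default_rng(42)  # arbitrary seed, an assumption of this sketch

# Draw len(spl) values from spl, putting each value back after drawing it
one_resample = rng.choice(spl, size=len(spl), replace=True)

# Count how often each original value was picked
values, counts = np.unique(one_resample, return_counts=True)
print(np.sort(one_resample))
print("distinct values used:", len(values), "of", len(spl))
```

For a sample of 20 values, a resample contains on average about 64% of the distinct original values (1 - 0.95^20); the rest of the slots are filled by repeats.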

Let’s generate ten thousand chunks of resampled data from the original sample:

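B = 10000
# One row per resample; the integer dtype matches the integer-valued sample
rs = np.zeros((B, len(spl)), dtype=int)
for i in range(B):
    # Draw len(spl) values from spl, with replacement
    rs[i, :] = resample(spl, replace=True)
print(rs)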
[[ 279 1361 413 ... 357 357 490]
[ 307 427 413 ... 317 203 1361]
[ 413 279 357 ... 357 203 307]
[ 217 243 224 ... 279 490 1361]
[ 301 324 224 ... 224 317 317]
[ 324 301 364 ... 873 364 203]]

Each line in the array is a resampled chunk and is the same size as the original sample. There are 10k lines in total.
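The loop is easy to read, but the same array can be built in one call. The sketch below is an alternative using `numpy.random.Generator.choice`, not the article's own code; the seed is an arbitrary assumption.

```python
import numpy as np

spl = [117, 203, 243, 197, 217, 224, 279, 301, 317, 307, 324, 357, 364,
       382, 413, 427, 490, 742, 873, 1361]
B = 10000

rng = np.random.default_rng(0)  # arbitrary seed

# Draw all B resamples at once: B rows, each of length len(spl)
rs_vec = rng.choice(spl, size=(B, len(spl)), replace=True)

print(rs_vec.shape)
# The average over all resampled values stays close to the sample mean
print(rs_vec.mean(), np.mean(spl))
```

Because every row is drawn independently with replacement, this produces the same kind of array as the loop, just faster.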

Now let’s build the bootstrap distribution: for each line, calculate the mean value:

# Mean of each row, i.e. of each resampled chunk
bd = np.mean(rs, axis=1)
print(bd)
[376.35 515.15 342.75 ... 507.8 426.15 377.05]

The bootstrap distribution contains the means for each resampled chunk of data. It’s easy to get the 95% confidence interval for the mean from here: simply sort the bootstrap distribution array (the means), cut off the top and bottom 2.5%, and read the remaining extreme values:

# The 2.5th and 97.5th percentiles bound the central 95% of the bootstrap means
bootci = np.percentile(bd, (2.5, 97.5))
print(bootci)
[300.0975 541.705]

And that’s the bootstrap 95% confidence interval for the mean. How does it compare to the t-test confidence interval?

y, x, _ = plt.hist(bd, bins=100)
ymax = y.max()
plt.vlines(pgtt['CI95%'].iloc[0], 0, ymax, colors = 'red')
plt.vlines(bootci, 0, ymax, colors = 'chartreuse')
plt.vlines(np.mean(spl), 0, ymax, colors = 'black');

The bootstrap CI (in green) is somewhat narrower than the t-test CI (in red).

You can use bootstrap to generate a CI for the median value as well: simply build the bootstrap distribution using np.median() instead of np.mean(). Note that the 5th and 95th percentiles used here give a 90% interval, not a 95% one:

bd = np.median(rs, axis = 1)
bootci = np.percentile(bd, (5, 95))
print(bootci)
[279. 382.]
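The mean and median intervals follow the same recipe, so it can be wrapped in a small helper. The function below is a hypothetical convenience wrapper (its name and parameters are not from the article or any library), sketched under the assumption that the statistic accepts an `axis` argument the way NumPy reductions do.

```python
import numpy as np

def bootstrap_percentile_ci(data, statistic=np.mean, n_resamples=10_000,
                            level=0.95, seed=None):
    """Percentile bootstrap CI for a one-sample statistic (illustrative helper)."""
    data = np.asarray(data)
    rng = np.random.default_rng(seed)
    # All resamples at once: one row per resample
    resamples = rng.choice(data, size=(n_resamples, data.size), replace=True)
    # Bootstrap distribution: the statistic of each row
    boot_dist = statistic(resamples, axis=1)
    # Cut off equal tails so that `level` of the distribution remains
    tail = 100 * (1 - level) / 2
    return np.percentile(boot_dist, [tail, 100 - tail])

spl = [117, 203, 243, 197, 217, 224, 279, 301, 317, 307, 324, 357, 364,
       382, 413, 427, 490, 742, 873, 1361]

mean_ci = bootstrap_percentile_ci(spl, np.mean, seed=1)
median_ci = bootstrap_percentile_ci(spl, np.median, level=0.90, seed=1)
print("95% CI for the mean:  ", mean_ci)
print("90% CI for the median:", median_ci)
```

The exact endpoints vary slightly with the seed, which is expected: the bootstrap distribution is itself a random approximation.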

Let’s take a sample from a normal population of known variance (var=100):

sp = np.random.normal(loc=20, scale=10, size=50)

The variance of our sample is indeed close to 100:

print(np.var(sp))

Let’s use bootstrap to extract a confidence interval for the population variance from our existing sample. Same idea: resample the data many times, calculate the variance for each resampled chunk, get a percentile interval for the list of variances.

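# np.float was removed from NumPy; the builtin float is the replacement
rs = np.zeros((B, len(sp)), dtype=float)
for i in range(B):
    rs[i, :] = resample(sp, replace=True)
# Variance of each resampled chunk
bd = np.var(rs, axis=1)
bootci = np.percentile(bd, (2.5, 97.5))
print(bootci)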
[74.91889462 151.31601521]

Not bad; the interval is narrow enough, and it includes the true value for the population variance (100). Here’s the histogram of variances from the bootstrap distribution:

y, x, _ = plt.hist(bd, bins=100)
ymax = y.max()
plt.vlines(bootci, 0, ymax, colors = 'chartreuse')
plt.vlines(np.var(sp), 0, ymax, colors = 'black');
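One detail worth knowing: `np.var` divides by n by default (`ddof=0`), while the usual unbiased sample variance divides by n - 1 (`ddof=1`). The sketch below repeats the variance bootstrap both ways on a freshly simulated sample; the seed and the sample are assumptions of this example, so the numbers will not match the ones above.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
sp = rng.normal(loc=20, scale=10, size=50)  # simulated here, not the article's sample
B = 10_000

rs = rng.choice(sp, size=(B, sp.size), replace=True)

bd_biased = np.var(rs, axis=1)            # divides by n (NumPy's default)
bd_unbiased = np.var(rs, axis=1, ddof=1)  # divides by n - 1

ci_biased = np.percentile(bd_biased, (2.5, 97.5))
ci_unbiased = np.percentile(bd_unbiased, (2.5, 97.5))
print(ci_biased)
print(ci_unbiased)
```

With 50 observations the two intervals differ only by a factor of 50/49, but for small samples the choice of `ddof` matters more.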

Here are the weights of two samples of plums, grouped by color:

plumdata = pd.read_csv('plumdata.csv', index_col=0)
sns.violinplot(x='color', y='weight', data=plumdata);

Are the two colors, on average, different in terms of weight? In other words, what’s the confidence interval for the difference of mean weights? Let’s try the classic t-test first:

plum_red = plumdata['weight'][plumdata['color'] == 'red'].values
plum_yel = plumdata['weight'][plumdata['color'] == 'yellow'].values
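# Two-sample t-test; the CI is for the difference of means (red minus yellow)
plumtt = pg.ttest(plum_red, plum_yel)
print(plumtt['CI95%'].iloc[0])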
[8.67 27.2]

The t-test says — yes, with 95% confidence, the mean weight of red plums is different from the mean weight of yellow plums (the CI does not include 0).

For bootstrap, we will literally resample both samples 10k times, calculate the mean for each resampled chunk, do the difference of means 10k times, and look at the 2.5 / 97.5 percentiles for the difference of means distribution.

This is the resampled data:

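# Stack the two samples as columns (this requires equally sized samples)
plzip = np.array(list(zip(plum_red, plum_yel)))
rs = np.zeros((B, plzip.shape[0], plzip.shape[1]))
for i in range(B):
    # Resample rows, so red and yellow values are drawn with the same indices
    rs[i, :, :] = resample(plzip, replace = True)
print(rs.shape)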
(10000, 15, 2)

10000 is for the number of resamples, 15 is the length of each resample, and there are 2 samples we compare in each case.

Let’s calculate the mean for each resampled chunk:

# Average over the 15 values of each resample, separately for each color
bd_init = np.mean(rs, axis=1)
print(bd_init.shape)
(10000, 2)

And the bootstrap distribution is the difference of means (red mean minus yellow mean) for each one of the 10000 resampled cases:

bd = bd_init[:, 0] - bd_init[:, 1]
print(bd.shape)
(10000,)
print(bd)
[12.33333333 21. 10.33333333 ... 21.46666667 11.66666667

And now we can get the CI for the difference of means from the bootstrap distribution:

bootci = np.percentile(bd, (2.5, 97.5))
print(bootci)
[9.33333333 26.06666667]

Again, bootstrap provides a tighter interval than the t-test:

y, x, _ = plt.hist(bd, bins=100)
ymax = y.max()
plt.vlines(plumtt['CI95%'].iloc[0], 0, ymax, colors = 'red')
plt.vlines(bootci, 0, ymax, colors = 'chartreuse');
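The zipped approach above draws the same row indices for both colors, which only works when the two samples have the same length. For independent groups, a common alternative is to resample each group separately. The sketch below uses hypothetical simulated weights (not the article's plumdata.csv) with deliberately unequal group sizes; all names and numbers in it are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed

# Hypothetical weights, NOT the article's plum data
red = rng.normal(loc=90, scale=12, size=15)
yellow = rng.normal(loc=72, scale=12, size=18)  # a different size is fine here

B = 10_000
# Resample each group on its own, then take the mean of every resample
red_means = rng.choice(red, size=(B, red.size), replace=True).mean(axis=1)
yel_means = rng.choice(yellow, size=(B, yellow.size), replace=True).mean(axis=1)

# Bootstrap distribution of the difference of means (red minus yellow)
bd = red_means - yel_means
bootci = np.percentile(bd, (2.5, 97.5))
print(bootci)
```

If the interval excludes 0, the conclusion is read the same way as before: the two group means differ at the chosen confidence level.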

Here’s paired data — the strength of several brands of ropes in either wet or dry conditions:

strength = pd.read_csv('strength.csv', index_col=0)

Is this enough data to conclude that the wet/dry conditions influence the strength of the ropes? Let’s calculate a 95% confidence interval for the mean difference of strength, wet vs dry. First, the t-test:

s_wet = strength['wet']
s_dry = strength['dry']
s_diff = s_wet.values - s_dry.values
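# Paired t-test; the CI is for the mean of the per-rope differences
strengthtt = pg.ttest(s_wet, s_dry, paired=True)
print(strengthtt['CI95%'].iloc[0])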
[0.4 8.77]

The t-test indicates, with 95% confidence, that indeed the wet/dry conditions make a statistically significant difference — but just barely. One end of the CI is close to 0.

Now let’s do bootstrap for the same CI. We can simply take s_diff (the differences in strength for each rope), and go back to the one-sample bootstrap procedure:

rs = np.zeros((B, len(s_diff)))
for i in range(B):
    # Resample the per-rope differences, with replacement
    rs[i, :] = resample(s_diff, replace = True)
bd = np.mean(rs, axis=1)
bootci = np.percentile(bd, (2.5, 97.5))
print(bootci)
[1.33333333 8.5]

Both the t-test and bootstrap agree, at 95% confidence, that rope strength differs between wet and dry conditions. Bootstrap is a little more confident: its interval is narrower and sits further away from 0.

y, x, _ = plt.hist(bd, bins=100)
ymax = y.max()
plt.vlines(strengthtt['CI95%'].iloc[0], 0, ymax, colors = 'red')
plt.vlines(bootci, 0, ymax, colors = 'chartreuse');
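Why is it legitimate to resample the differences instead of the pairs? Because resampling whole pairs and then subtracting gives the same numbers as resampling the differences with the same indices. The sketch below checks this on hypothetical simulated measurements (not the article's strength.csv); the data and the seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(11)  # arbitrary seed

# Hypothetical paired measurements, NOT the article's rope data
dry = rng.normal(loc=100, scale=8, size=12)
wet = dry + rng.normal(loc=4, scale=5, size=12)

B = 10_000
# Resampled row indices: each row picks 12 pairs, with replacement
idx = rng.integers(0, dry.size, size=(B, dry.size))

# Option 1: resample whole pairs, then take the difference of the means
bd_pairs = wet[idx].mean(axis=1) - dry[idx].mean(axis=1)
# Option 2: resample the per-pair differences directly, with the same indices
bd_diff = (wet - dry)[idx].mean(axis=1)

print(np.allclose(bd_pairs, bd_diff))
print(np.percentile(bd_diff, (2.5, 97.5)))
```

The key point is that each pair stays together in every resample, which is what distinguishes the paired case from the independent two-sample case.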

Notebook with code:

This is just the basic bootstrap technique. It does not perform bias corrections, etc.
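For readers who want a bias-corrected interval, SciPy ships one. The sketch below uses `scipy.stats.bootstrap` with `method="BCa"` (bias-corrected and accelerated) on the first sample from this article. It is not part of the original notebook, it assumes a SciPy version that includes this function (1.7 or later), and the seed is arbitrary.

```python
import numpy as np
from scipy import stats

spl = [117, 203, 243, 197, 217, 224, 279, 301, 317, 307, 324, 357, 364,
       382, 413, 427, 490, 742, 873, 1361]

res = stats.bootstrap(
    (np.asarray(spl, dtype=float),),  # data is passed as a sequence of samples
    np.mean,                          # statistic; it accepts an axis argument
    n_resamples=9999,
    confidence_level=0.95,
    method="BCa",                     # bias-corrected and accelerated interval
    random_state=0,                   # arbitrary seed
)

low, high = res.confidence_interval.low, res.confidence_interval.high
print(low, high)
```

Passing `method="percentile"` instead should give an interval comparable to the hand-built one in this article, which makes the two methods easy to compare on the same data.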

There is no cure for small sample sizes. Bootstrap is powerful, but it’s not magic — it can only work with the information available in the original sample.

If the samples are not representative of the whole population, then bootstrap will not be very accurate.

