Example of testing for a difference in medians using bootstrapping

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

previewers = pd.read_csv('2021-04-01 image preset previewed.csv')
previewers.head()

let's see if there's a difference in presets applied by platform

previewers.groupby(['os']).mean()

previewers.groupby(['os']).median()

the means and medians look somewhat different by os, but it's not clear whether the difference is statistically significant. we can check though

Testing the means (Welch's t-test)

ios = previewers[previewers['os'] == 'ios']['row_count']
android = previewers[previewers['os'] == 'android']['row_count']
stats.ttest_ind(ios, android, equal_var=False)
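to see what `equal_var=False` is doing, here is a minimal sketch that computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom by hand and compares them with scipy. the CSV isn't available here, so `ios_demo` and `android_demo` are synthetic stand-ins (an assumption, not the real `row_count` data), and `welch_t` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

# synthetic stand-ins for the ios / android row_count columns (NOT the real data)
rng = np.random.default_rng(0)
ios_demo = rng.poisson(lam=10, size=300)
android_demo = rng.poisson(lam=9, size=500)


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # per-group variance of the sample mean (unbiased variance / n)
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return t, df


t_manual, df_manual = welch_t(ios_demo, android_demo)
# two-sided p-value from the t distribution
p_manual = 2 * stats.t.sf(abs(t_manual), df_manual)

result = stats.ttest_ind(ios_demo, android_demo, equal_var=False)
print(t_manual, p_manual)
print(result.statistic, result.pvalue)
```

the two printed lines should agree up to floating point error, which is a quick way to confirm that no pooled variance is being used.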

Generating the data for the median test

# set some parameters for the bootstrapping
# in this example, I want to obtain 2000 samples per os, drawing 200 observations each time
np.random.seed(48602)
samples = 2000
draws = 200

this generates a data frame of bootstrapped medians for each os, based on the parameters defined above

ios_sample = []
for i in range(samples):
    ios_sample += [ios.sample(draws, replace=True).median()]
ios_sample = pd.DataFrame(ios_sample)

android_sample = []
for i in range(samples):
    android_sample += [android.sample(draws, replace=True).median()]
android_sample = pd.DataFrame(android_sample)
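the two loops do the same thing, so they can be folded into one helper. this is a sketch only: `bootstrap_medians` is a hypothetical function name, `demo` is synthetic data rather than the real `row_count` column, and it uses numpy's `Generator` instead of `np.random.seed`, so it will not reproduce the exact numbers from the loops above.

```python
import numpy as np


def bootstrap_medians(values, n_samples=2000, n_draws=200, seed=None):
    """Draw n_samples resamples of size n_draws (with replacement)
    and return the median of each resample."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    # one row per bootstrap sample
    resampled = rng.choice(values, size=(n_samples, n_draws), replace=True)
    return np.median(resampled, axis=1)


# synthetic stand-in for one platform's row_count values (NOT the real data)
demo = np.random.default_rng(1).poisson(lam=10, size=500)
medians = bootstrap_medians(demo, seed=48602)
print(medians.shape)
```

with the real data this would be called as `bootstrap_medians(ios, samples, draws)` and `bootstrap_medians(android, samples, draws)`.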

the distributions look roughly normal (play around with the number of samples and/or the number of draws to see how the histograms change)

this is also a great example of how the CLT works

fig, axs = plt.subplots(1, 2)
axs[0].hist(android_sample, bins=range(5, 15))
axs[0].set_title('android')
axs[1].hist(ios_sample, bins=range(5, 15))
axs[1].set_title('ios')
plt.show()

from here we can just run a regular t test on the average median from the sampled data

print(android_sample.mean())
print(ios_sample.mean())

stats.ttest_ind(ios_sample, android_sample, equal_var=False)
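one thing to keep in mind with the t test on the bootstrapped medians: its p-value depends on `samples`, because adding more resamples shrinks the standard error of the average median. a common alternative that doesn't have that dependence is a percentile bootstrap interval for the difference in medians. the sketch below is that swapped-in technique, not the approach used above. `median_diff_ci` is a hypothetical helper, each group is resampled at its own full size (an assumption that differs from the fixed `draws = 200`), and `a_demo` / `b_demo` are synthetic, not the real data.

```python
import numpy as np


def median_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=None):
    """Percentile bootstrap interval for median(a) - median(b)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a)
    b = np.asarray(b)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # resample each group with replacement at its original size
        a_star = rng.choice(a, size=a.size, replace=True)
        b_star = rng.choice(b, size=b.size, replace=True)
        diffs[i] = np.median(a_star) - np.median(b_star)
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# synthetic groups with clearly different centers (NOT the real data)
demo_rng = np.random.default_rng(2)
a_demo = demo_rng.normal(loc=20, scale=1, size=300)
b_demo = demo_rng.normal(loc=10, scale=1, size=300)

lower, upper = median_diff_ci(a_demo, b_demo, seed=48602)
print(lower, upper)
```

if the interval excludes 0, that is evidence of a difference in medians at roughly the `alpha` level. with the real data the call would be `median_diff_ci(ios, android)`.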
