# Example of testing for a difference in medians using bootstrapping

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

previewers = pd.read_csv('2021-04-01 image preset previewed.csv')
previewers.head()

Let's see whether there's a difference in presets applied by platform.

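# numeric_only=True skips any non-numeric columns, which recent pandas versions refuse to average
previewers.groupby(['os']).mean(numeric_only=True)

previewers.groupby(['os']).median(numeric_only=True)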

Observationally, the means and medians differ somewhat between platforms, but it's not clear whether the difference is statistically significant. We can check, though.

Testing the means (Welch's t-test):

ios = previewers[previewers['os'] == 'ios']['row_count']
android = previewers[previewers['os'] == 'android']['row_count']
stats.ttest_ind(ios, android, equal_var=False)
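
To make it clearer what `equal_var=False` is doing, here is a minimal sketch that computes Welch's statistic by hand and compares it with SciPy's result. The two groups are synthetic Poisson draws (an assumption for illustration only, not the preview data), and `group_a` / `group_b` are made-up names.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the two groups (NOT the real preview data)
rng = np.random.default_rng(0)
group_a = rng.poisson(lam=9, size=300)
group_b = rng.poisson(lam=10, size=250)

# Welch's t statistic: difference in means over the unpooled standard error
var_a = group_a.var(ddof=1) / group_a.size
var_b = group_b.var(ddof=1) / group_b.size
t_manual = (group_a.mean() - group_b.mean()) / np.sqrt(var_a + var_b)

# Welch-Satterthwaite approximation for the degrees of freedom
df = (var_a + var_b) ** 2 / (
    var_a ** 2 / (group_a.size - 1) + var_b ** 2 / (group_b.size - 1)
)
p_manual = 2 * stats.t.sf(abs(t_manual), df)  # two-sided p-value

result = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t_manual, p_manual)
print(result.statistic, result.pvalue)
```

The hand-computed values should agree with `stats.ttest_ind` up to floating-point error, since neither group is assumed to share a variance with the other.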

## Generating the data for the median test

# set some parameters for the bootstrapping
# in this example, I want to obtain 2000 samples per os, drawing 200 observations each time
np.random.seed(48602)
samples = 2000
draws = 200

This builds one data frame per OS holding the median of each bootstrap sample, using the parameters defined above.

# bootstrap the iOS medians: resample with replacement and keep each sample's median
ios_sample = []
for i in range(samples):
    ios_sample.append(ios.sample(draws, replace=True).median())
ios_sample = pd.DataFrame(ios_sample)

# same procedure for Android
android_sample = []
for i in range(samples):
    android_sample.append(android.sample(draws, replace=True).median())
android_sample = pd.DataFrame(android_sample)
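
The same resampling can be sketched without a Python loop by drawing all the indices at once with NumPy. This is only an illustration under assumptions: `bootstrap_medians` is a hypothetical helper, and `fake_counts` is synthetic Poisson data standing in for a `row_count` column.

```python
import numpy as np

def bootstrap_medians(values, samples, draws, rng):
    """Return `samples` medians, each from `draws` values resampled with replacement."""
    values = np.asarray(values)
    # One row of random indices per bootstrap sample
    idx = rng.integers(0, values.size, size=(samples, draws))
    return np.median(values[idx], axis=1)

rng = np.random.default_rng(48602)
fake_counts = rng.poisson(lam=9, size=1000)  # synthetic stand-in, not the real data
medians = bootstrap_medians(fake_counts, samples=2000, draws=200, rng=rng)
print(medians.shape)
print(medians.mean())
```

The loop version is easier to read. The vectorized version mostly pays off when `samples` gets large.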

The distributions look roughly normal. Play around with the number of samples and/or the number of draws to see how the histograms change.

This is also a great example of how the CLT works.

fig,axs = plt.subplots(1,2)
axs[0].hist(android_sample, bins=range(5,15))
axs[0].set_title('android')
axs[1].hist(ios_sample, bins=range(5,15))
axs[1].set_title('ios')
plt.show()
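
As a rough sketch of what "playing around" with the number of draws does, the snippet below tracks the spread of the bootstrapped medians as `draws` grows. The skewed `population` is synthetic exponential data (an assumption for illustration), not the preview data.

```python
import numpy as np

rng = np.random.default_rng(1)
population = rng.exponential(scale=10, size=5000)  # skewed synthetic data

spreads = {}
for draws in (25, 100, 400, 1600):
    # 2000 bootstrap samples of size `draws`, one per row
    idx = rng.integers(0, population.size, size=(2000, draws))
    medians = np.median(population[idx], axis=1)
    # standard deviation of the bootstrapped medians for this sample size
    spreads[draws] = medians.std(ddof=1)
    print(draws, round(spreads[draws], 3))
```

Even though the underlying data are skewed, the bootstrapped medians should cluster more tightly as `draws` increases, which is the behaviour the histograms above hint at.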

From here we can just run a regular t-test on the sampled medians, comparing the average median for each platform.

print(android_sample.mean())
print(ios_sample.mean())

```
0 9.67525
dtype: float64
0 8.472
dtype: float64
```

stats.ttest_ind(ios_sample, android_sample, equal_var=False)
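
A complementary way to read the bootstrap output is a percentile interval for the difference in medians, which does not depend on how many bootstrap samples we chose to draw. The sketch below is an assumption-laden illustration: `ios_fake` and `android_fake` are synthetic Poisson data, and resampling each group at its full size is a choice made here, not something taken from the analysis above.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic stand-ins for the two platforms (NOT the real preview data)
ios_fake = rng.poisson(lam=8.5, size=400)
android_fake = rng.poisson(lam=9.7, size=400)

n_boot = 2000
diffs = np.empty(n_boot)
for b in range(n_boot):
    # Resample each group with replacement at its original size
    ios_resample = rng.choice(ios_fake, size=ios_fake.size, replace=True)
    android_resample = rng.choice(android_fake, size=android_fake.size, replace=True)
    diffs[b] = np.median(android_resample) - np.median(ios_resample)

# 95% percentile interval for the difference in medians (android minus ios)
lower, upper = np.percentile(diffs, [2.5, 97.5])
print(lower, upper)
```

If the interval excludes zero, that points to a difference in medians at roughly the 5% level.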