Hypothesis Testing Homework
This homework covers some basic concepts of hypothesis testing from the lecture on 3/19. It is due Monday, April 3rd, before our lecture! If you need to refer back to the content, here are the slides: https://docs.google.com/presentation/d/1eAmKZgilx7eXC9nSOi5FlHE4UYjLGZh5dkq6IGP9jRM/edit#slide=id.g20929d2de64_0_0
Question 1:
Paul is rolling a die and is not sure whether it is fair (on a fair die, each number has a 1/6 chance of being rolled). He rolls it 12 times and gets a 1 on 10 of those rolls. He wants to determine whether the die is actually unfair, with a greater chance of rolling a 1 than any other number, or whether this result is due to random chance. What are possible null and alternative hypotheses if Paul were to run a hypothesis test?
Hint: We usually use the variable p, the true probability of the outcome we care about, to write a null or alternative hypothesis. For instance, if our null hypothesis is that a coin flip has a 50% chance of landing heads, we would write this as "p = 0.5", and if our alternative hypothesis is that there is less than a 50% chance of getting heads, we would write this as "p < 0.5".
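Written in standard notation, the hint's coin example pairs a null hypothesis H_0 with an alternative hypothesis H_A, where p is the probability that a single flip lands heads:

```latex
H_0: p = 0.5 \qquad H_A: p < 0.5
```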
Question 2:
Manas and Suparna are playing a coin flip game and are keeping track of the number of heads and tails they get. Out of 10 flips, Manas gets only 2 tails. Manas says the coin is rigged, while Suparna says the result is due to random chance, so they decide to run a hypothesis test to settle the debate.
The following code carries out the hypothesis test for this scenario. Please do not change it, since the question depends on it.
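For orientation, a simulation of this kind could look roughly like the minimal sketch below. It is not the provided code. It assumes the null hypothesis is a fair coin (probability of tails 0.5) and uses the proportion of tails in 10 flips as the test statistic. The number of simulations (10,000), the seed, and the variable names are hypothetical choices for illustration only.

```python
import numpy as np

# Seeded generator so the sketch gives the same numbers on every run
rng = np.random.default_rng(seed=42)

num_flips = 10                 # flips per experiment, as in the scenario
num_simulations = 10_000       # hypothetical number of repetitions
observed_stat = 2 / num_flips  # observed proportion of tails: 0.2

# Assumed null hypothesis: fair coin, so each flip is tails with probability 0.5.
# binomial draws the number of tails in 10 flips, once per simulation.
tails_counts = rng.binomial(num_flips, 0.5, size=num_simulations)

# Test statistic for each simulation: proportion of tails
simulated_stats = tails_counts / num_flips
```

From here, the p-value is the fraction of simulated statistics that fall at or below the observed statistic.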
Now, calculate the p-value for this test.
Hint 1: The p-value is the proportion of simulated test statistics that are less than or equal to the observed test statistic (the proportion of tails in Manas's 10 flips).
Hint 2: np.count_nonzero counts the number of non-zero values in an array, and in a boolean array True counts as non-zero. So if we pass in a boolean array that marks which simulated test statistics are less than or equal to the observed statistic, np.count_nonzero returns how many of them there are.
Hint 3: Once we have the number of simulated test statistics less than or equal to the observed statistic, we need to turn that count into a proportion of all the simulations. What should you divide the count by?
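To see how Hints 2 and 3 fit together, here is a small sketch on a made-up array of eight simulated statistics. The values and the names `simulated_stats` and `observed_stat` are hypothetical and not taken from the provided code, so the result is not the answer to this question.

```python
import numpy as np

# Hypothetical simulated test statistics (proportion of tails in each of 8 simulations)
simulated_stats = np.array([0.3, 0.5, 0.2, 0.6, 0.1, 0.4, 0.5, 0.2])
observed_stat = 0.2  # hypothetical observed proportion of tails

# Boolean array: True wherever a simulated statistic is <= the observed one
at_or_below = simulated_stats <= observed_stat

# True counts as non-zero, so this counts the simulations at or below the observed value
count = np.count_nonzero(at_or_below)  # 3

# Divide by the total number of simulations to turn the count into a proportion
p_value = count / len(simulated_stats)  # 0.375
```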