Name:

Name:

Name:

Name:

Name:

Name:

**If your group is working in DeepNote, please download your Jupyter Notebook as a .ipynb file and upload it to Gradescope to submit it. Remember to add your groupmates on the Gradescope submission!**

# Shuffling to do Model Comparison

## Model Comparison

**Goal:** Decide between two models (simple and complex) and ask: *Which DGP/population is our data more likely to have come from?*

- Simple model: $$X$$ and $$Y$$ are *unrelated*
- Complex model: $$X$$ and $$Y$$ are *related*

## Shuffling Definition

- A resampling method that creates samples where $$X$$ and $$Y$$ are unrelated (i.e., samples from the simple model)
- This is achieved by randomly assigning each person's $$Y$$ to a different $$X$$, or vice versa.

- Shuffling is another way to generate sampling distributions of the $$b_1$$, F, or PRE statistics
- *But* it can only be applied assuming the simple model is true
- Today we will focus on the sampling distribution of $$b_1$$

- Cannot be used for confidence intervals, *only model comparison*
- We used the bootstrap resampling method last week to compute confidence intervals.

- Also known as a *permutation test*
- Permutation is *the act of arranging the members of a set into a sequence or order*
- In the picture above, we are permuting (or rearranging) typing speed scores to pair them with different condition values.
**The purpose of shuffling is to create samples where the X and Y variables are unrelated.**

**Creating** *many* shuffled samples eventually generates a sampling distribution of your coefficient of interest that assumes the simple model is true.

## Shuffling Steps

- Create many shuffled datasets to generate samples from the simple model (i.e. samples where $$X$$ and $$Y$$ are unrelated)
- Estimate the coefficient of interest ($$b_1$$, F, or PRE) for each new sample
- Collecting many estimates creates a sampling distribution of the coefficient!

- Compare the original observed coefficient to the sampling distribution to calculate a p-value (using the `tally` function)
- Compare the p-value to $\alpha$ to either reject or fail to reject the simple model
  - If p ≤ $\alpha$, reject the simple model
  - If p > $\alpha$, fail to reject the simple model
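Putting the steps together, a hedged sketch in CourseKata-style R might look like the following (assuming the `coursekata` package is loaded, the class dataset is called `TypingData` as later in this handout, and `b1()` accepts a model formula as in CourseKata examples; `sample_b1` and `b1Shuffle` are placeholder names):

```r
# Sketch only: TypingData, sample_b1, and b1Shuffle are placeholder names
library(coursekata)

# b1 from the original, unshuffled sample
sample_b1 <- b1(Speed ~ Condition, data = TypingData)

# Shuffle Speed, re-estimate b1, and repeat 1000 times to build
# a sampling distribution that assumes the simple model is true
b1Shuffle <- do(1000) * b1(shuffle(Speed) ~ Condition, data = TypingData)

# p-value: proportion of shuffled b1s at least as extreme as the observed one
tally(~ b1 < sample_b1, data = b1Shuffle, format = "proportion")
```

If that proportion is at or below $\alpha$, we reject the simple model.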


# Typing Speeds on Phones versus Computers

This week you learned how to implement the model comparison process to choose between a simple and a complex model, and to compute p-values from sampling distributions using simulation or the theoretical F-distribution.

Before today you completed the typing speed experiment where you completed a typing speed test on either a phone or a computer and entered your results in a Google Sheet.

Today we will learn about how to use shuffling to evaluate the relationship between typing speed and device used for a typing test.

## Do people type at different speeds when they are on a computer vs. phone?

Here, we create a new dataset where typing `Speed` is shuffled, but `Condition` remains the same.
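The code cell that does this might look something like the following sketch (assuming `coursekata` is loaded and the class data frame is `TypingData`):

```r
# Shuffle Speed, leave Condition as-is; every Speed value still appears
# exactly once, just paired with a (possibly) different Condition
ShuffledData <- TypingData
ShuffledData$Speed <- shuffle(ShuffledData$Speed)
head(ShuffledData)
```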

What are some observations/differences between the unshuffled and shuffled datasets? **(type some of your observations in the chat)**

If we were to create many shuffled datasets and estimate $$b_1$$ for each dataset, we would have a sampling distribution of $$b_1$$ that operates under the assumption that *the population looks like the simple model*.

This sounds very similar to bootstrapping, but it's different! Bootstrapping operates under the assumption that *the population looks like our data*.

Below is some code to show what a bootstrapped sample looks like compared to a shuffled sample. Notice how the bootstrapped sample repeats some of the original rows (keeping each `Speed`-`Condition` pairing intact), while the shuffled sample keeps every row but re-pairs the `Speed` values with different rows.
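A minimal sketch of that comparison, assuming the `coursekata` package (which provides mosaic's `resample()` and `shuffle()`) and the class data frame `TypingData`:

```r
# Bootstrapped sample: rows drawn WITH replacement; Speed-Condition pairings
# stay intact, but some rows repeat and others are left out
BootstrapSample <- resample(TypingData)

# Shuffled sample: every row appears exactly once, but Speed values are
# re-paired with different Conditions
ShuffleSample <- TypingData
ShuffleSample$Speed <- shuffle(ShuffleSample$Speed)
```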

## Typing Data Variables

- `Condition`: What type of device the typing test was taken on (`Phone` or `Computer`)
- `TimeFactor`: Length of time for the typing test (`1 minute`, `2 minute`, or `3 minute`)
- `Speed`: The number of words per minute (WPM) from a typing test
- `Errors`: The number of typing errors/mistyped words from a typing test
- `AdjSpeed`: Typing speed adjusted for number of errors (`AdjSpeed` = `Speed` - `Errors`)

# Section 1

1.1 Write out the word equation that will help answer the question **"Do people type at different speeds when they are on a computer vs. phone?"**

**Word equation:**

1.2 Create a data frame of your group's data with the `Condition` variable and the `Speed` variable.

1.2 Estimate a linear model using your group's data where typing speed is predicted by device used.

1.3 Write a one sentence interpretation for your $$b_1$$ coefficient using values from your output that someone outside of Psych 100A could understand.

**Interpretation here:**

1.4 In model comparison, we ask "What if the simple model is true in the population?" Write out the GLM equation for the simple model and the complex model:

**Simple model equation:** $$Y_i = $$

**Complex model equation:** $$Y_i = $$

1.5 Notice that the simple model is a *version* of the complex model where $$\beta_1 = 0$$ (it's a simpler version!). Fill in the blank to complete the interpretation of $$\beta_1=0$$ using your predictor and chosen outcome variable:

- If $$\beta_1 = 0$$ in the population, then knowing a person's **BLANK** does not help us guess their **BLANK**.

If knowing someone's X does not help us predict their Y, then the exact pairings between X and Y shouldn't matter, since we are assuming that X *doesn't* help us predict Y in the population. We will use this idea and the `shuffle()` function to create a sampling distribution of $$b_1$$s where `Condition` and `Speed` are unrelated.

We can do this by shuffling our `Speed` variable and re-estimating our linear model!

1.6 The code below creates a new dataset that shuffles the `Speed` variable. Run the code to look at our new shuffled dataset and compare it to the original dataset.
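If the code cell is missing from your copy of the notebook, a sketch of what it likely does (with `GroupData` as a hypothetical stand-in name for your group's data frame from 1.2):

```r
# GroupData is a hypothetical name for your group's data frame;
# Speed is shuffled while Condition stays in place
GroupShuffleData <- GroupData
GroupShuffleData$Speed <- shuffle(GroupShuffleData$Speed)
GroupShuffleData
```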

What do you notice about this dataset compared to the dataset from 1.2?

1.7 Estimate a linear model where `Condition` predicts `Speed` using `GroupShuffleData`. What is the new $$b_1$$ coefficient? Is it the same as your original $$b_1$$ estimate?

We don't need to separately create a new dataset before estimating the linear model again! The code below uses the `shuffle()` function inside the `lm()` function to shuffle and estimate the linear model all at once. Try running the next code block 3 times and see what happens (you should get a different $$b_1$$ each time!)
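For reference, the all-at-once pattern looks roughly like this (`GroupData` is a hypothetical stand-in for your group's data frame):

```r
# Each run re-shuffles Speed before fitting, so b1 changes every time;
# GroupData is a placeholder name for your group's data frame
lm(shuffle(Speed) ~ Condition, data = GroupData)
```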

1.8 Now we are going to repeat 1.6 and 1.7 at least 10 times, saving each of our $$b_1$$ estimates in a new data frame. Modify the following code to shuffle `Speed` and estimate and collect $$b_1$$s from at least 10 shuffled samples.

1.9 Let's compare the $$b_1$$ estimates from our shuffled datasets to the $$b_1$$ estimate from our original, unshuffled dataset. Fill in the blanks to create a histogram of the $$b_1$$ estimates from the shuffled datasets (`b1GroupShuffle`) and add a vertical line to represent your original $$b_1$$ estimate.
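One way the completed code might look (a sketch only: it assumes `b1GroupShuffle` stores its estimates in a column called `b1`, and `sample_b1` and `GroupData` are hypothetical names for your original estimate and your group's data frame):

```r
# sample_b1 and GroupData are placeholder names
sample_b1 <- b1(lm(Speed ~ Condition, data = GroupData))

# Histogram of shuffled b1s with a vertical line at the original estimate
gf_histogram(~ b1, data = b1GroupShuffle) %>%
  gf_vline(xintercept = ~ sample_b1, color = "firebrick")
```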

1.10 Is your sampling distribution from 1.9 centered around the estimate from the original sample? Explain why this distribution is or is not centered around the original estimate.

1.11 Based on your sampling distribution from 1.9, does it seem likely that your group's data was generated from a DGP where the simple model is true? Why or why not?

## Return to Main Room

In section one, you estimated a $$b_1$$ coefficient from your unshuffled data and a $$b_1$$ from a shuffled dataset.

- Was the $$b_1$$ from the shuffled dataset the same as your original $$b_1$$?
- Was the new $$b_1$$ greater or less than zero?

Then, you created a *small* sampling distribution of $$b_1$$s using your group's typing data that assumes the simple model is true (i.e. it assumes that `Condition` and typing `Speed` are unrelated). It probably looks a little sparse, but we can still make some observations about it!

- If you were to draw a line for the center of your group's sampling distribution, *approximately* where would that fall on the x-axis?
- Is your original estimate (the vertical line in the histogram) the same as the center of the sampling distribution?
- The distance between the center of the sampling distribution and the original estimate provides us with some evidence about which population/DGP our data is likely to have come from.

Now let's do this whole process again with our entire dataset called `TypingData`!

# Section 2

2.1 Generate, save, and print a linear model with `Condition` predicting `Speed` using the entire class dataset (`TypingData`).

2.2 Use the `shuffle()` function to shuffle the `Speed` variable and re-estimate your linear model with your shuffled outcome variable.

2.3 Is your shuffled $$b_1$$ from 2.2 the same as your original $$b_1$$ from 2.1?

2.4 Use the `do()` and `shuffle()` functions to create a sampling distribution of $$b_1$$ estimates from 1000 shuffled datasets.

2.5 Create a histogram from the results of `b1Shuffle` and add a vertical line for the $$b_1$$ from your `UnshuffledModel`.

2.6 Using your results from 2.5, does it seem likely that the class' data was generated from a DGP where the simple model is true? Why or why not?

2.7 Notice that the original $$b_1$$ estimate is in the left tail of the histogram. Use the `tally` function to calculate the probability that a shuffled sample had a $$b_1$$ coefficient *less than* the $$b_1$$ coefficient from your original sample. Assume we are using an $$\alpha = .05$$.
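A sketch of the `tally` call (assuming `b1Shuffle` from 2.4 stores its estimates in a column called `b1`, and `UnshuffledModel` was saved in 2.1; `sample_b1` is a placeholder name):

```r
# Pull the observed b1 from the model fit in 2.1
sample_b1 <- b1(UnshuffledModel)

# Proportion of shuffled samples with b1 less than the observed b1
tally(~ b1 < sample_b1, data = b1Shuffle, format = "proportion")
```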

Based on this probability, do you think the class' data came from a DGP where the simple model is true? Is this conclusion the same as your conclusion from 2.6?

2.8 Using the histogram from 2.5 and your answers from 2.6 and 2.7, how would you answer the original research question: Do people type at different speeds when they are on a computer vs. phone?


# Section 3

What if we were interested in the F-value instead of $$b_1$$? The code below produces `supernova` output from your original data. We can also use the `fVal()` function to extract the F-value from the supernova results (similar to the `b0()` and `b1()` functions).
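A sketch of those two calls (assuming, as in CourseKata examples, that `supernova()` takes the fitted model and that `fVal()` accepts a model formula with a `data` argument):

```r
supernova(UnshuffledModel)                  # full ANOVA-style table
fVal(Speed ~ Condition, data = TypingData)  # just the F-value
```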

3.1 Use the `do()` function in combination with the `fVal()` function to create a sampling distribution of F-values from 1000 shuffled datasets.

3.2 Create a histogram of the F-values from `fValShuffle` and insert a vertical line for the F-value from your original sample (`UnshuffledModel`). Based on this histogram, do you think the class' data came from a DGP where the simple model is true? Why or why not?

3.3 Use the `tally` function to calculate the probability that a shuffled sample in `fValShuffle` had an F-value *greater than* the F-value from your original sample. Based on this proportion, do you think the class' data came from a DGP where the simple model is true? Is this conclusion the same as your conclusion from 3.2?

3.4 Write an interpretation of the probability from 3.3 (also called a p-value) that a family member or friend not enrolled in Psych 100A could understand.

3.5 How does this probability compare to the p-value in the supernova table? Is it similar? Should it be?

It can be easy to get confused between *shuffling* and *bootstrapping*. Remember from last week that bootstrapping is a resampling method that assumes that the population looks like the observed sample. In contrast, shuffling is a resampling method that assumes that the simple model is the DGP in the population.

3.6 What are **2 similarities** between bootstrapping and shuffling?

3.7 What are **2 differences** between bootstrapping and shuffling?

3.8 Imagine we had bootstrapped the sampling distribution of F-values or $$b_1$$. Do you think it would have looked similar or different than the shuffled sampling distributions from today? Will this always be the case?

3.9 Throughout this discussion, we shuffled `Speed`, but we could have shuffled `Condition` instead. Do you think this would make a difference in our conclusions?

3.10 Based on our F-values from shuffled datasets, how would you answer our research question: Do people type at different speeds when they are on a computer vs. phone?