# 6A: Respect: Does it make a difference?

## 1.0 - The Research Question

Do you think you would be more likely to do something if the person asking was being respectful, even if the thing they are asking you to do is unpleasant?

That’s what some researchers (Yeager, Hirschi, and Josephs) wanted to know. They had a hypothesis: Respectful instructions will make people (especially adolescents) more likely to follow the medical advice of their doctor. What do you think about this hypothesis? How would we write this idea as a word equation?

## The Study

## 2.0 - The Data

For unrelated reasons, the researchers also had access to additional information about these participants (e.g., their baseline testosterone from a saliva sample and survey responses on narcissism, openness, etc.).

Here are the variables in the data frame `respectstudy`:

- `subject`: ID for the participant.
- `respect_condition`: Whether the participant watched the "Respect" or "No Respect" video.
- `testosterone`: The measure of the testosterone via the saliva sample.
- `spoon1_before`: The mass (grams) of the vegemite in the 1st spoon before it was given.
- `spoon1_after`: The mass (grams) of the vegemite in the 1st spoon after it was given.
- `spoon2_before`: The mass (grams) of the vegemite in the 2nd spoon before it was given.
- `spoon2_after`: The mass (grams) of the vegemite in the 2nd spoon after it was given.
- `ravens_correct`: Number of items answered correctly in a Standardized Progressive Matrix.
- `openness`: Score from the Big Five personality test (OCEAN), 1 (low) - 7 (high).
- `conscientious`: Score from the Big Five personality test (OCEAN), 1 (low) - 7 (high).
- `extraversion`: Score from the Big Five personality test (OCEAN), 1 (low) - 7 (high).
- `agreeable`: Score from the Big Five personality test (OCEAN), 1 (low) - 7 (high).
- `narcissism`: Score from the Big Five personality test (OCEAN), 1 (low) - 7 (high).
- `emotion_stability`: Average score from questions regarding emotional stability, 1 (low) - 7 (high).
- `reactance`: Average score from questions regarding reactance, 1 (low) - 7 (high).
- `subjective_power`: Average score from questions regarding subjective power, 1 (low) - 7 (high).
- `aggressive_right_now`: “How aggressive are you feeling right now?”, 1 (low) - 7 (high).
- `sex_drive_right_now`: “How high is your sex drive right now?”, 1 (low) - 7 (high).
- `campus_greek`: Are you in a fraternity or sorority?
- `status`: Average score from questions regarding personal status, 1 (low) - 7 (high).
- `competent`: Average score from questions regarding self-competence, 1 (low) - 7 (high).
- `autonomous`: Average score from questions regarding autonomy, 1 (low) - 7 (high).
- `respectful`: Measure of how respectful they felt the researcher was, 1 (low) - 7 (high).

2.1 - Identify the variables that are most relevant to the respect hypothesis. What would be the outcome variable? Is there something like "amount of vegemite eaten after watching the video" in this list of variables? How could we create such a variable in our data frame?
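As a hedged starting point for creating such a variable, the sketch below works on a small made-up data frame (`toy`, a hypothetical stand-in for `respectstudy`). It assumes the amount eaten from a spoon is the drop in its mass, and it computes that difference for both spoons. Which spoon (or combination) best captures "eaten after watching the video" depends on the study design and is left for you to decide.

```r
# Toy stand-in for `respectstudy`; all values are made up for illustration.
toy <- data.frame(
  subject       = 1:4,
  spoon1_before = c(10.2, 10.0, 9.8, 10.1),
  spoon1_after  = c(10.2,  7.5, 9.8,  4.0),
  spoon2_before = c(10.0, 10.3, 9.9, 10.0),
  spoon2_after  = c(10.0,  9.1, 9.9,  2.5)
)

# Assumption: amount eaten (grams) = mass before minus mass after.
# The new column names are hypothetical.
toy$spoon1_eaten <- toy$spoon1_before - toy$spoon1_after
toy$spoon2_eaten <- toy$spoon2_before - toy$spoon2_after

# Look at the new columns next to the participant ID.
toy[, c("subject", "spoon1_eaten", "spoon2_eaten")]
```

The same two assignment lines should work on the real data frame once `toy` is replaced with `respectstudy`.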

2.2 - Which is the explanatory variable? Should we use `respect_condition` or `respectful` to explore the researchers' hypothesis? Why?

## 3.0 - Exploring Variation

3.1 - Explore the variation in the outcome variable using a plot or graph.

3.2 - Also make some visualizations to explore the respect hypothesis.

3.3 - What does a value of zero mean on this outcome variable? What does a high value mean?

3.4 - Did respect make a difference in how much Vegemite the participants ate after watching the video? Make an argument for EACH side with a partner (based on the data).

- Some reasons respect did make a difference:
- Some reasons respect did not make a difference:

3.5 - Could this pattern of data be the result of randomness?

## 4.0 - Creating Some Simple Models

4.1 - If we used the mean as our empty model to predict how much vegemite someone in this study would eat (regardless of condition), what would we have predicted? How many people would we have predicted correctly?

4.2 - The mean might be a terrible model for this data. Why does it seem so terrible?

4.3 - If we used the mode as our empty model to predict how much vegemite someone in this study would eat, what would we predict? How many people would we have predicted correctly?
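To make "how many people would we have predicted correctly" concrete, here is a minimal sketch on made-up numbers (the vector `eaten` is hypothetical, not the study data). It finds the mean and the mode, then counts how many values each prediction matches exactly. Base R has no built-in function for the statistical mode, so the sketch tabulates the values and picks the most frequent one.

```r
# Hypothetical amounts eaten (grams); not the real study data.
eaten <- c(0, 0, 0, 0, 0, 1.5, 2.0, 4.5, 8.0, 0)

# Mean as the empty-model prediction.
mean_pred <- mean(eaten)

# Mode: the most frequent value, found by tabulating the values.
counts <- table(eaten)
mode_pred <- as.numeric(names(counts)[which.max(counts)])

# Number of people each prediction matches exactly.
sum(eaten == mean_pred)
sum(eaten == mode_pred)
```

Try the same steps on your outcome variable and compare the two counts.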

4.4 - Use `favstats` to put the mean into your visualization (as a blue line). Also put the mode into your visualization (as a green line).
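One possible way to overlay both models on a histogram is sketched below with made-up numbers. It uses base R graphics and `mean()` so that it runs without extra packages; `favstats` from the `mosaic` package reports the same mean, and any plotting package you already use would work just as well. The vector `eaten` is hypothetical.

```r
# Hypothetical amounts eaten (grams); not the real study data.
eaten <- c(0, 0, 0, 0, 0, 0, 1.5, 2.0, 4.5, 8.0)

# The two candidate empty models.
mean_eaten <- mean(eaten)
mode_eaten <- as.numeric(names(which.max(table(eaten))))

# Histogram of the outcome with one vertical line per model.
hist(eaten, breaks = 10, main = "Amount eaten", xlab = "grams")
abline(v = mean_eaten, col = "blue", lwd = 2)   # mean
abline(v = mode_eaten, col = "green", lwd = 2)  # mode
```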

4.5 - Even though we would have predicted more people correctly using the mode, why might the mean still be a useful model for this data?

## 5.0 - Error from the Models

5.1 - Here is how we might write the mode as an empty model in GLM notation:

$$Y_i = 0 + e_i$$

Modify this copy below to write the mean (the number) as the empty model:

$$Y_i = 0 + e_i$$

5.2 - Let's imagine we are going to use the mode as our model. Take a look at the first student in the sample. How would you represent that student in GLM format? What would be that student's DATA? MODEL? ERROR?

5.3 - In the visualization you made above, where would the residuals ($e_i$) for each model be? Which model "balances" the residuals?

5.4 - If we calculated the sum of squared residuals (SS) from the mean versus from the mode, which model would have the lower SS? Make your prediction, then try calculating both with R. (Note: There are easy functions to do that for the mean but not for the mode. For the mode, you can create a column of residuals from 0, square them, and add them up.)
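The "column of residuals" approach from the note can be sketched as follows, on a made-up data frame (`toy` and its column names are hypothetical). It builds one residual column per model, taking 0 as the mode as in the GLM equation above, then squares and sums each column. Make your prediction before running it on the real data.

```r
# Hypothetical amounts eaten (grams); not the real study data.
toy <- data.frame(eaten = c(0, 0, 0, 0, 0, 0, 1.5, 2.0, 4.5, 8.0))

# Residual = DATA - MODEL, once for each empty model.
toy$resid_mean <- toy$eaten - mean(toy$eaten)
toy$resid_mode <- toy$eaten - 0

# Square the residuals and add them up.
ss_mean <- sum(toy$resid_mean^2)
ss_mode <- sum(toy$resid_mode^2)

c(mean = ss_mean, mode = ss_mode)
```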

5.5 - To decide which is a better model, we always have to make explicit how we are measuring "error" (how off the model is). One way of measuring error is to count up how many predictions were correct versus incorrect. Which model minimizes that kind of error? What kind of error does the other model minimize?

## 6.0 - Closing Thoughts

For reasons that are a bit opaque now, in statistics, we really value this **Sum of Squares** as a measure of error.
As we progress through the course, we will continue to learn more about its special properties.
Squaring, as odd as it seems right now, will allow us to do some cool stuff in the following chapters.

Here are some questions to think about:

6.1 - Think about the way we represent our models: DATA = MODEL + ERROR. Could there be a model so good that there is no need to add “ERROR” to it?

6.2 - If we shuffled this `respectstudy` data such that the vegemite eaten would be randomly categorized into two groups, would the empty model for that data change? Why or why not? Why is the empty model a good way to represent a random DGP?
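If you want to try the shuffle yourself, here is a minimal sketch on made-up data (the data frame `toy`, its values, and the seed are all hypothetical). It randomly reassigns the existing condition labels with base R's `sample()`, then prints the overall mean alongside the group means under the original and the shuffled labels so you can compare them.

```r
set.seed(1)  # arbitrary seed so the shuffle is reproducible

# Toy stand-in for the study data; values are made up.
toy <- data.frame(
  condition = rep(c("Respect", "No Respect"), each = 4),
  eaten     = c(0, 2.5, 0, 6.0, 0, 0, 1.0, 3.5)
)

# Randomly reassign the existing condition labels to participants.
toy$shuffled_condition <- sample(toy$condition)

mean(toy$eaten)                                   # empty-model prediction
tapply(toy$eaten, toy$condition, mean)            # group means, original labels
tapply(toy$eaten, toy$shuffled_condition, mean)   # group means, shuffled labels
```

Rerunning with a different seed gives a different shuffle; notice which of the printed numbers change and which do not.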