6A: Respect: Does it make a difference?
1.0 - The Research Question
Do you think you would be more likely to do something if the person asking was being respectful, even if the thing they are asking you to do is unpleasant?
That’s what some researchers (Yeager, Hirschi, and Josephs) wanted to know. They had a hypothesis: Respectful instructions will make people (especially adolescents) more likely to follow the medical advice of their doctor. What do you think about this hypothesis? How would we write this idea as a word equation?
2.0 - The Data
For unrelated reasons, the researchers also collected other information about these participants (e.g., baseline testosterone from a saliva sample and survey measures of narcissism, openness, etc.).
Here are the variables in the data frame:
subjectID for the participant.
respect_condition: Whether the participant watched the "Respect" or "No Respect" video
testosterone: The measure of the testosterone via the saliva sample
spoon1_before: The mass (grams) of the vegemite in the 1st spoon before it was given.
spoon1_after: The mass (grams) of the vegemite in the 1st spoon after it was given.
spoon2_before: The mass (grams) of the vegemite in the 2nd spoon before it was given.
spoon2_after: The mass (grams) of the vegemite in the 2nd spoon after it was given.
ravens_correct: Number of items answered correctly on Raven's Standard Progressive Matrices
openness: Score from the Big Five personality test (OCEAN), 1 (low) - 7 (high)
conscientious: Score from the Big Five personality test (OCEAN), 1 (low) - 7 (high)
extraversion: Score from the Big Five personality test (OCEAN), 1 (low) - 7 (high)
agreeable: Score from the Big Five personality test (OCEAN), 1 (low) - 7 (high)
narcissism: Narcissism score, 1 (low) - 7 (high)
emotion_stability: Average score from questions regarding emotional stability, 1 (low) - 7 (high)
reactance: Average score from questions regarding reactance, 1 (low) - 7 (high)
subjective_power: Average score from questions regarding subjective power, 1 (low) - 7 (high)
aggressive_right_now: “How aggressive are you feeling right now?”, 1 (low) - 7 (high)
sex_drive_right_now: “How high is your sex drive right now?”, 1 (low) - 7 (high)
campus_greek: Are you in a fraternity or sorority?
status: Average score from questions regarding personal status, 1 (low) - 7 (high)
competent: Average score from questions regarding self-competence, 1 (low) - 7 (high)
autonomous: Average score from questions regarding autonomy, 1 (low) - 7 (high)
respectful: Measure of how respectful they felt the researcher was, 1 (low) - 7 (high)
2.1 - Identify the variables that are most relevant to the respect hypothesis. What would be the outcome variable? Is there something like "amount of vegemite eaten after watching the video" in this list of variables? How could we create such a variable in our data frame?
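One way to build such an outcome is to subtract the mass left on a spoon from the mass it started with. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical, not the study data). It also assumes the second spoon is the one offered after the video, which this page does not state, and the column name `eaten_after_video` is an invented label.

```r
# Hypothetical toy data: NOT the real study data.
toy <- data.frame(
  respect_condition = c("Respect", "No Respect", "Respect", "No Respect"),
  spoon2_before = c(10.2, 10.0, 9.8, 10.1),
  spoon2_after  = c(4.2, 10.0, 9.8, 7.1)
)

# Mass eaten = mass before - mass after (in grams).
# Assumption: spoon 2 is the spoon given after the video.
toy$eaten_after_video <- toy$spoon2_before - toy$spoon2_after

print(toy)
```

The same subtraction would apply to the first spoon if that turns out to be the relevant one.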
2.2 - Which is the explanatory variable? Should we use respectful to explore the researchers' hypothesis? Why?
3.0 - Exploring Variation
3.1 - Explore the variation in the outcome variable using a plot or graph.
3.2 - Also make some visualizations to explore the respect hypothesis.
3.3 - What does a value of zero mean on this outcome variable? What does a high value mean?
3.4 - Did respect make a difference in how much Vegemite the participants ate after watching the video? Make an argument for EACH side with a partner (based on the data).
- Some reasons respect did make a difference:
- Some reasons respect did not make a difference:
3.5 - Could this pattern of data be the result of randomness?
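One way to get a feel for this question is to shuffle the condition labels many times and see how large a group difference randomness alone tends to produce. The sketch below is only an illustration: the vectors `condition` and `eaten`, the helper `mean_diff`, and all the numbers are hypothetical, not the study data.

```r
set.seed(1)  # make the shuffles reproducible

# Hypothetical toy data: NOT the real study data.
condition <- rep(c("Respect", "No Respect"), each = 5)
eaten <- c(6, 0, 3, 0, 4, 0, 0, 2, 0, 1)

# Difference in group means (Respect minus No Respect).
mean_diff <- function(y, g) {
  mean(y[g == "Respect"]) - mean(y[g == "No Respect"])
}

observed_diff <- mean_diff(eaten, condition)

# Shuffle the condition labels many times; each shuffle breaks any real
# link between condition and amount eaten.
shuffled_diffs <- replicate(1000, mean_diff(eaten, sample(condition)))

# Proportion of shuffles at least as extreme as the observed difference.
prop_as_extreme <- mean(abs(shuffled_diffs) >= abs(observed_diff))

print(observed_diff)
print(prop_as_extreme)
```

If shuffled differences as large as the observed one turn up often, randomness is a plausible explanation for the pattern; if they are rare, it is less plausible.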
4.0 - Creating Some Simple Models
4.1 - If we used the mean as our empty model to predict how much vegemite someone in this study would eat (regardless of condition), what would we have predicted? How many people would we have predicted correctly?
4.2 - The mean might be a terrible model for this data. Why does it seem so terrible?
4.3 - If we used the mode as our empty model to predict how much vegemite someone in this study would eat, what would we predict? How many people would we have predicted correctly?
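The comparison between the two empty models can be tried on a small example first. The sketch below uses a hypothetical vector `eaten` (made-up values, not the study data) and counts how many values each single-number model predicts exactly.

```r
# Hypothetical toy data: grams eaten, NOT the real study data.
eaten <- c(0, 0, 0, 0, 1, 3, 5, 7)

mean_pred <- mean(eaten)

# R has no built-in function for the statistical mode, so tabulate the
# values and take the most frequent one.
counts <- table(eaten)
mode_pred <- as.numeric(names(counts)[which.max(counts)])

# How many people does each single-number model predict exactly?
n_correct_mean <- sum(eaten == mean_pred)
n_correct_mode <- sum(eaten == mode_pred)

print(c(mean = mean_pred, mode = mode_pred))
print(c(correct_by_mean = n_correct_mean, correct_by_mode = n_correct_mode))
```

In this toy example nobody ate exactly the mean amount, while the mode matches everyone who ate nothing.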
4.4 - Use favstats to put the mean into your visualization (as a blue line). Also put the mode into your visualization (as a green line).
4.5 - Even though we would have predicted more people correctly using the mode, why might the mean still be a useful model for this data?
5.0 - Error from the Models
5.1 - Here is how we might write the mode as an empty model in GLM notation:
$$Y_i = 0 + e_i$$
Modify this copy below to write the mean (the number) as the empty model:
$$Y_i = 0 + e_i$$
5.2 - Let's imagine we are going to use the mode as our model. Take a look at the first student in the sample. How would you represent that student in GLM format? What would be that student's DATA? MODEL? ERROR?
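As an illustration of the DATA = MODEL + ERROR breakdown, consider a hypothetical student who ate 4 grams (a made-up value, not taken from the data). Under the mode model, that student could be written as:

$$\underbrace{4}_{\text{DATA}} = \underbrace{0}_{\text{MODEL}} + \underbrace{4}_{\text{ERROR}}$$

The error term is whatever is left over after the model's prediction is subtracted from the data.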
5.3 - In the visualization you made above, where would the residuals ($$e_i$$) for each model be? Which model "balances" the residuals?
5.4 - If we calculated the sum of squared residuals off the mean versus off the mode, which model would have the lower SS? Make your prediction then try calculating these with R. (Note: There are easy functions to do that for the mean but not for the mode. But you can create a column of residuals from 0, square them, and add them up.)
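The residual-column approach described in the note can be sketched on toy data before trying it on the real data frame. Everything here is hypothetical: the data frame `toy`, its values, and the column names `resid_mean` and `resid_mode`.

```r
# Hypothetical toy data: grams eaten, NOT the real study data.
toy <- data.frame(eaten = c(0, 0, 0, 0, 1, 3, 5, 7))

# Residuals from the mean model: each value minus the mean.
toy$resid_mean <- toy$eaten - mean(toy$eaten)

# Residuals from the mode model: each value minus 0
# (0 is the mode of this toy data).
toy$resid_mode <- toy$eaten - 0

# Square the residuals and add them up.
ss_mean <- sum(toy$resid_mean^2)
ss_mode <- sum(toy$resid_mode^2)

print(c(ss_mean = ss_mean, ss_mode = ss_mode))
```

The residuals from the mean also sum to zero, which is one sense in which the mean "balances" them.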
5.5 - To decide which is a better model, we always have to make explicit how we are measuring "error" (how off the model is). One way of measuring error is to count up how many predictions were correct versus incorrect. Which model minimizes that kind of error? What kind of error does the other model minimize?
6.0 - Closing Thoughts
For reasons that are a bit opaque now, in statistics, we really value this Sum of Squares as a measure of error. As we progress through the course, we will continue to learn more about its special properties. Squaring, as odd as it seems right now, will allow us to do some cool stuff in the following chapters.
Here are some questions to think about:
6.1 - Think about the way we represent our models: DATA = MODEL + ERROR. Could there be a model so good that there is no need to add “ERROR” to it?
6.2 - If we shuffled this respectstudy data such that the vegemite eaten would be randomly categorized into two groups, would the empty model for that data change? Why or why not? Why is the empty model a good way to represent a random DGP?
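A quick experiment on toy data can help with the shuffling question. The sketch below uses hypothetical vectors `condition` and `eaten` (made-up values, not the respectstudy data) and compares the overall mean and the group means before and after a shuffle.

```r
set.seed(2)  # make the shuffle reproducible

# Hypothetical toy data: NOT the real study data.
condition <- rep(c("Respect", "No Respect"), each = 4)
eaten <- c(0, 0, 5, 7, 0, 0, 1, 3)

# Randomly reassign the amounts eaten across the two groups.
shuffled_eaten <- sample(eaten)

# Empty model (overall mean) before and after shuffling.
empty_before <- mean(eaten)
empty_after <- mean(shuffled_eaten)

# Group means before and after shuffling, for comparison.
groups_before <- tapply(eaten, condition, mean)
groups_after <- tapply(shuffled_eaten, condition, mean)

print(c(before = empty_before, after = empty_after))
print(groups_before)
print(groups_after)
```

Shuffling only changes which group each value lands in, so the set of values, and therefore the overall mean, stays the same even when the group means move.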