Name: Eric Rojas

Name: Christine Park

Name: Maya Pakulski

Name: Alison Perkins

Name: Sydney Rood

Name: Prisca Osuji

# Research Question

Use the space to briefly describe the broad research question you're hoping to address in this project.

- How does stress affect your happiness? (scale of 1-4)

Using the Emerging Adulthood data from the Journal of Open Psychology, we hope to address the effect of stress on subjective well-being. Subjective well-being (SWBScale) was measured by averaging the scores on a questionnaire, where higher scores represent better well-being. The Perceived Stress Scale (Stress) averaged scores on a global measure of perceived stress, where higher scores indicate higher levels of stress.

# Simple Model

a) Word Equation

Happiness = Mean + Error

Here we model each person's level of happiness (SWBscale) as the mean happiness level in our sample plus that individual's error.

b) GLM Equation

Yi = b0 + ei

Here we wrote out the equation for our model. b0 represents the mean level of happiness in our sample and ei represents the error for each individual.

c) Visualizations

Histogram v Boxplot: With both of these visualizations it is easy to tell where the center of the data is, and where outliers are roughly located. Both give information about the distribution of stress levels in the population.

However, the boxplot shows specific values for each quartile and the outliers, but does not give much information about the shape of the distribution. The histogram does not show exact numbers, but gives a more visually descriptive representation of the shape of the distribution.

d) Fit a linear model

Yi = 4.47 + ei

The value 4.47 is the estimate of b0 from our GLM equation. It represents the mean happiness level under the simple model.
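As a minimal sketch of what fitting the simple model involves, the Python snippet below estimates b0 as the mean and computes each person's residual. The scores and the function name `fit_empty_model` are made up for illustration; they are not the SWBscale data or the function we used for the report.

```python
def fit_empty_model(scores):
    """Return (b0, residuals) for the empty model Yi = b0 + ei."""
    b0 = sum(scores) / len(scores)          # b0 is just the mean
    residuals = [y - b0 for y in scores]    # ei = Yi - b0
    return b0, residuals


# Hypothetical happiness scores, NOT the real SWBscale data.
scores = [3.0, 4.5, 5.0, 4.0, 6.0]
b0, residuals = fit_empty_model(scores)

print(b0)              # the mean of the scores
print(sum(residuals))  # residuals around the mean sum to (about) zero
```

Under this model every person gets the same prediction, so all of the person-to-person variation ends up in the residuals.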

e) Quantifying Error (ANOVA)

From our anova table, we can see that our sum of squares is 5450.747, which is the sum of all the squared residuals from our model. A large number like this suggests that there is a large amount of error in our model and that the model does not explain our data well. However, the size of the sum of squares also depends on how many observations we have, so it is better to compare F, PRE, or p-values to tell which model is better and how much error each one leaves.
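The sum of squares can be sketched in a few lines of Python. The scores below are made up and `sum_of_squares` is a hypothetical helper, so this only illustrates the calculation, not our actual anova table. It also shows why the raw number is hard to interpret on its own: stacking the same data twice doubles it.

```python
def sum_of_squares(scores, predictions):
    """Sum of squared residuals between scores and model predictions."""
    return sum((y - p) ** 2 for y, p in zip(scores, predictions))


# Hypothetical happiness scores, NOT the real SWBscale data.
scores = [3.0, 4.5, 5.0, 4.0, 6.0]
b0 = sum(scores) / len(scores)
ss_total = sum_of_squares(scores, [b0] * len(scores))

# The same scores repeated twice: same mean, same spread, twice the SS.
doubled = scores * 2
ss_doubled = sum_of_squares(doubled, [b0] * len(doubled))

print(ss_total, ss_doubled)
```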

f) Visualization of sampling variability

A population called "happinesspop" is created with the rnorm function, using a mean of 4.47 and a standard deviation of 1.32. This graph is the approximate sampling distribution of the mean for samples of n=3134. The sampling distribution shows how much the sample means tend to vary across 10000 samples of n=3134 drawn from the population "happinesspop". The graph is unimodal and approximately normal.
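The simulation described above can be sketched in Python as follows. This is not the code we ran: `random.gauss` stands in for rnorm, the function name `simulate_sample_means` is hypothetical, and the sketch draws only 200 samples (an assumption made to keep it fast) instead of 10000.

```python
import random
from statistics import mean, stdev


def simulate_sample_means(pop_mean, pop_sd, n, n_samples, seed=0):
    """Draw n_samples samples of size n from a normal population and
    return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = []
    for _ in range(n_samples):
        total = sum(rng.gauss(pop_mean, pop_sd) for _ in range(n))
        means.append(total / n)
    return means


# Population values taken from the text; 200 samples is an assumption.
means = simulate_sample_means(pop_mean=4.47, pop_sd=1.32, n=3134, n_samples=200)

print(mean(means))   # center of the simulated sampling distribution
print(stdev(means))  # spread of the sample means (the standard error)
```

A histogram of `means` is the kind of graph described in this part, and `stdev(means)` is the kind of number reported in part g).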

g) Numeric description of sampling variability

The sampling variability is summarized by the standard deviation of the sampling distribution (the standard error), which is 0.0197. Since this value is small relative to the mean of 4.47, the sample means vary little from sample to sample.

h) Confidence intervals

Here we see that the 95% confidence interval for our mean happiness level (SWBscale) is (4.423, 4.516). This is the set of parameter values for which our sample mean would be considered likely. We found our sample mean to be 4.485, which is inside the confidence interval, so a population mean near that value is plausible.
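One common way to build such an interval is the normal approximation, mean ± 1.96 standard errors. The sketch below uses that approach on made-up scores; it is an assumption that this matches how our interval was produced, and `mean_confidence_interval` is a hypothetical helper. With a sample as small as the toy one here, a t multiplier would be more appropriate than 1.96.

```python
import math
from statistics import mean, stdev


def mean_confidence_interval(scores, z=1.96):
    """Approximate 95% CI for the mean: mean +/- z * (sd / sqrt(n))."""
    n = len(scores)
    center = mean(scores)
    standard_error = stdev(scores) / math.sqrt(n)
    return center - z * standard_error, center + z * standard_error


# Hypothetical happiness scores, NOT the real SWBscale data.
scores = [3.0, 4.5, 5.0, 4.0, 6.0, 4.5, 5.5, 3.5]
lower, upper = mean_confidence_interval(scores)

print(lower, upper)
```

Because the standard error shrinks with the square root of n, a sample as large as ours gives a much narrower interval than this toy example.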

j) Conclusion

From our simple model, we can determine that our sample mean for happiness without considering stress is 4.485. Our empty model was Yi = 4.47 + ei. From the visualizations you can roughly see the spread of our values, which are relatively normally distributed and not skewed in either direction. Our sample mean was within our confidence interval, so it could be a possible value for our population mean. However, our anova table shows that there is still a lot of error associated with this model, so we would want to compare it to a more complex one to see if we can explain more of the variation before concluding that our sample came from a population described by this model.

# Qualitative Predictor Model

a) Word Equation

Happiness = Stress + Error

Here we are still predicting level of happiness measured on the SWBscale in response to stress levels with the error accounted for as well.

b) GLM Equation

Yi = b0 + b1Xi + ei

b0 is the mean level of happiness when the grouping variable is 0. b1 is the difference between the means of our two groups. Xi is the stress-level grouping variable, where 0 is the low stress group and 1 is the high stress group. ei is the error: the difference between each person's actual value and the mean of their group.

c) Visualizations

Histogram v Boxplot: With both of these visualizations again, the center and outliers are easily detectable. Both display distribution of stress levels in the population visually.

The boxplot shows specific values for each quartile and outlier, but does not give much information about the shape of the distribution. In contrast, the histogram does not show exact numbers, but gives a more visually descriptive representation of the shape of the distribution.

d) Fit a linear model

Yi = 5.044 + (-1.149)Xi + ei

The b0 value is 5.044 and represents the mean happiness level of the low stress group. The b1 value is -1.149 and represents the increment to add on to predict the happiness level of the high stress group. As you can see, the happiness level is lower for the high stress group.
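To make the roles of b0 and b1 concrete, here is a minimal Python sketch of the two-group model on made-up data. The function name `fit_group_model` and the scores are hypothetical; the 0/1 coding follows the GLM equation in part b).

```python
def fit_group_model(scores, groups):
    """Fit Yi = b0 + b1*Xi + ei where Xi is 0 (low stress) or 1 (high stress).

    b0 is the mean of the Xi = 0 group; b1 is the difference between
    the Xi = 1 group mean and b0.
    """
    low = [y for y, x in zip(scores, groups) if x == 0]
    high = [y for y, x in zip(scores, groups) if x == 1]
    b0 = sum(low) / len(low)
    b1 = sum(high) / len(high) - b0
    return b0, b1


# Hypothetical data, NOT the real SWBscale scores.
scores = [5.0, 5.5, 4.5, 4.0, 3.5, 4.5]
groups = [0, 0, 0, 1, 1, 1]
b0, b1 = fit_group_model(scores, groups)

print(b0, b1)       # low-stress mean, and the increment for high stress
print(b0 + b1)      # predicted happiness for the high-stress group
```

A negative b1, as in our fitted model, means the high stress group has the lower predicted happiness.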

e) Quantifying Error

From our supernova table, the F statistic is 728.9436, which means the variance explained by our additional parameter is about 729 times the variance left unexplained per degree of freedom. Our sum of squares has also been reduced, from 5450.747 to 4405.295, showing that this model leaves less unexplained variation than the simple model.

f) Visualization of sampling variability

Through this visualization we can see that none of the reshuffled b1 values were lower than our original b1. It is also important to note that the distribution is centered around 0, because shuffling breaks any relationship between stress group and happiness.
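The shuffling idea can be sketched in Python as below. The data are made up, the helper names `group_difference` and `shuffled_b1s` are hypothetical, and the number of shuffles is an assumption; this is an illustration of the technique, not the code behind our graph.

```python
import random
from statistics import mean


def group_difference(scores, groups):
    """b1 for the two-group model: mean of group 1 minus mean of group 0."""
    low = [y for y, x in zip(scores, groups) if x == 0]
    high = [y for y, x in zip(scores, groups) if x == 1]
    return sum(high) / len(high) - sum(low) / len(low)


def shuffled_b1s(scores, groups, n_shuffles=1000, seed=0):
    """Recompute b1 after randomly reassigning the group labels each time."""
    rng = random.Random(seed)
    labels = list(groups)
    results = []
    for _ in range(n_shuffles):
        rng.shuffle(labels)  # break any link between group and score
        results.append(group_difference(scores, labels))
    return results


# Hypothetical data, NOT the real SWBscale scores.
scores = [5.0, 5.5, 4.5, 5.2, 4.8, 4.0, 3.5, 4.5, 3.8, 4.2]
groups = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

observed_b1 = group_difference(scores, groups)
b1s = shuffled_b1s(scores, groups)

print(observed_b1)                           # difference in the real grouping
print(mean(b1s))                             # shuffled b1s center near 0
print(sum(b <= observed_b1 for b in b1s))    # how many are as low as observed
```

If very few shuffled b1 values are as extreme as the observed one, a population b1 of 0 is hard to reconcile with the data.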

g) Numeric description of sampling variability

Through this numeric description we are able to see that the b1 values from a reshuffled dataset only ranged from 0.05 to 0.06, showing very little variability between them.

h) Confidence intervals

Using the 95% confidence interval of -1.233 to -1.066, we can conclude that the value 'zero' is not contained within the interval. This indicates that a simple model with a b1 of 0 is unlikely.

i) Model Comparison

From our supernova table, the PRE value of 0.1895 tells us that 18.95% of the empty model's leftover sum of squares is explained by stress broken down into the "happy" and "sad" groups. The F statistic of 728.9436 tells us that the variance explained by our additional parameter is about 729 times the variance left unexplained per degree of freedom.
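For reference, PRE and F can both be computed from the sums of squares of the two models being compared. The Python sketch below uses made-up numbers and a hypothetical helper `pre_and_f`; it illustrates the standard formulas and is not a reproduction of our supernova output.

```python
def pre_and_f(ss_total, ss_error, n, n_extra_params=1):
    """Compare a complex model to the empty model.

    ss_total: sum of squares left by the empty model
    ss_error: sum of squares left by the complex model
    n: number of observations
    n_extra_params: parameters added beyond b0
    """
    ss_model = ss_total - ss_error            # error removed by the model
    pre = ss_model / ss_total                 # proportional reduction in error
    df_error = n - 1 - n_extra_params         # df left after all parameters
    f = (ss_model / n_extra_params) / (ss_error / df_error)
    return pre, f


# Made-up values for illustration only.
pre, f = pre_and_f(ss_total=10.0, ss_error=8.0, n=12)

print(pre)  # share of the empty model's error that is explained
print(f)    # explained variance per parameter / unexplained variance per df
```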

j) Conclusion

Using the data from the Qualitative Predictor model, we can see that the estimated difference in mean happiness between the high stress and low stress groups (b1) was -1.149. The Qualitative model was Yi = 5.044 - 1.149Xi + ei.

Looking at part c), we can see that the spread of our data is fairly normal for the 'happy' group, and our values are not skewed in either direction. On the other hand, our data for the 'sad' group shows a slight left skew.

The 95% confidence interval for b1 was between -1.233 and -1.066, and our estimate of -1.149 falls within this interval. This indicates that -1.149 could be the difference between the group means in the population. However, from our anova table, we can see that the sum of squares value is about 4405. Given this is the sum of all the squared residuals in our model, this large number indicates that there is a decent amount of error present.

Looking at our supernova table for our model, we can see that the PRE value was 0.1895, indicating that only 18.95% of the empty model's leftover sum of squares can be explained by stress broken down into two groups: 'high_stress' and 'low_stress'.

# Quantitative Predictor Model

a) Word Equation

Happiness = Stress + Error

We are predicting happiness levels based off stress levels while accounting for error.

b) GLM Equation

Yi = b0 + b1Xi + ei

b0 in this case is the predicted happiness when stress is 0. b1 is the increment to add for each one-unit increase in stress. Xi is the stress level. ei is the error: the difference between the model's prediction and the actual value.

c) Visualizations

Jitter v Point: Both of these visualizations show very similar plots that display each individual point plotted against stress and a regression line that best fits the data distribution for the model. Both of these visualizations make it easier to judge distribution and spread. In terms of their differences, the jitter plot seems to show more variability by not stacking points as the point plot does.

d) Fit a linear model

Yi = 7.866 + (-1.108)Xi + ei

7.866 is the predicted happiness score for someone who scores 0 on the stress scale. -1.108 signifies that for each one-unit increase in stress, predicted happiness decreases by 1.108. This shows that as stress increases, the predicted happiness decreases.
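A minimal Python sketch of how b0 and b1 can be estimated by least squares is shown below. The stress and happiness values are made up so that the line is easy to check by hand, and `fit_line` is a hypothetical helper, not the function we used to fit our model.

```python
def fit_line(x_values, y_values):
    """Least-squares estimates (b0, b1) for Yi = b0 + b1*Xi + ei."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    # b1 = sum of cross-products / sum of squared deviations in x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x  # the line passes through (mean_x, mean_y)
    return b0, b1


# Hypothetical data lying exactly on the line y = 8 - x.
stress = [1.0, 2.0, 3.0, 4.0]
happiness = [7.0, 6.0, 5.0, 4.0]
b0, b1 = fit_line(stress, happiness)

print(b0, b1)          # intercept and slope
print(b0 + b1 * 2.5)   # predicted happiness at a stress score of 2.5
```

Note that b0 is a prediction at a stress score of 0, which may lie outside the range of scores people actually gave.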

e) Quantifying Error (ANOVA)

From our supernova table, the PRE value of 0.3063 tells us that 30.63% of the empty model's leftover sum of squares is explained by stress. The F statistic of 1377 tells us that the variance explained by our additional parameter is about 1377 times the variance left unexplained per degree of freedom.

f) Visualization of sampling variability

Here we can see our sampling variability in a histogram. We can tell that our sampled b1 values are greater than our original b1 almost 50% of the time. Our original b1 is very close to the center of the graph and the graph looks relatively uniform.

g) Numeric description of sampling variability

Through this numeric description we are able to see that the b1 values from a reshuffled dataset only ranged from 4.38 to 4.56, showing very little variability between them.

h) Confidence intervals

From our 95% confidence interval of (-1.167, -1.050) we can tell that zero is not contained, so we can rule out a simple model with a b1 of 0 as it is not likely.

i) Model Comparison

Through a comparison of each of the models' supernova tables, we can see that our Quantitative model is better at explaining variability in happiness. The quantmodel explains more of the variation left by the empty model per degree of freedom (F=1377) than the QualMod (F=728.94). Additionally, our Quantitative model has a PRE of 0.3063 and the Qualitative model has a PRE of 0.1895. This means that about 31% of the variation in the empty model is explained by the quantmodel, compared to only about 19% explained by our QualMod.

j) Conclusion

Using the data from the Quantitative Predictor model, we can see that the estimated change in happiness for each one-unit increase in stress (b1) was -1.108. The Quantitative model was Yi = 7.866 - 1.108Xi + ei.

Looking at part c), we can see that the points on each plot hug the regression line about the same amount, with the same level of variation, judging by eye.

The 95% confidence interval for b1 was between -1.167 and -1.050, and our estimate of -1.108 falls within this interval. This indicates that -1.108 could be the value of the slope in the population. However, from our supernova table, we can see that the sum of squares value is about 1665. Given this is the sum of all the squared residuals in our model, this large number indicates that there is a decent amount of error present.

Looking at our supernova table for our model, we can see that the PRE value was 0.3063, indicating that 30.63% of the empty model's leftover sum of squares can be explained by stress measured quantitatively.