Name: Max Potter

Name: Sara Pozil

Name: Jason Pruett

Name: Philip Raj

Name: Hinano Negishi

Name: Darartu Mulugeta

# Research Question

Use the space to briefly describe the broad research question you're hoping to address in this project.

Our research question asks whether emerging adulthood (roughly ages 18-25) is associated with lower subjective well-being.

# Simple Model

a) Word Equation

Subjective Well Being = Mean of well being + Other Stuff

`Subjective well being scores are explained by the mean of wellbeing of the whole dataset plus other stuff`

b) GLM Equation

$Y_i = b_0 + e_i$

Individual well being is equal to mean well being plus error

c) Visualizations

The histogram gives us a better idea of the spread and variability of the data as a whole, showing where the average sits in relation to the data. The boxplot shows spread, however, it is limited to giving us more of an idea of the 5-number summary of the data, with the range, Q1, median, and Q3 more clearly marked than in the histogram.

d) Fit a linear model

The empty model predicts an average SWBscale score of 4.47 for every participant. In GLM notation, the fitted model is $Y_i = 4.47 + e_i$.
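As an illustration of what fitting the empty model does, here is a minimal Python sketch. The scores are invented for the example (they are not from our dataset), and the variable names are hypothetical. It shows that b0 is simply the sample mean and that the residuals around it sum to zero.

```python
from statistics import mean

# Hypothetical well-being scores, invented for illustration only.
swb = [3.2, 4.8, 5.1, 4.0, 5.4, 4.3]

# The empty model predicts the same value, b0, for everyone: the sample mean.
b0 = mean(swb)

# Each residual e_i is the observed score minus the model prediction.
residuals = [y - b0 for y in swb]

print(f"b0 = {b0:.3f}")
print(f"sum of residuals = {sum(residuals):.6f}")
```

The residuals balancing out around b0 is what makes the mean the natural single-number model for the data.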

e) Quantifying Error (ANOVA)

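The simple model explains none of the variability in the sample because it has no explanatory variable: it predicts the mean for everyone. Its residuals make up the total Sum of Squares, which is the baseline that later models are compared against, so there is no PRE to quantify for the simple model itself.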

f) Visualization of sampling variability

The sampling distribution of means appears roughly normal and is centered on the empty model mean, 4.47.

g) Numeric description of sampling variability

The distribution of means from the new samples is also centered at roughly 4.47, with an overall range from 4.392 to 4.544. The distribution has a small standard deviation of .02, meaning roughly 95% of the resampled means should fall within +/- .04 of the center.
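The resampling idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the scores are invented, `bootstrap_means` is a hypothetical helper name, and the number of resamples (1000) is an arbitrary choice rather than the setting used in our analysis.

```python
import random
from statistics import mean, stdev

def bootstrap_means(scores, n_resamples=1000, seed=0):
    """Resample the scores with replacement and return the mean of each resample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(scores)
    return [mean(rng.choices(scores, k=n)) for _ in range(n_resamples)]

# Hypothetical well-being scores, invented for illustration only.
swb = [3.2, 4.8, 5.1, 4.0, 5.4, 4.3, 2.9, 6.1, 4.7, 4.4]

means = bootstrap_means(swb)
center = mean(means)   # center of the distribution of resampled means
spread = stdev(means)  # standard deviation of the resampled means

print(f"center = {center:.3f}, spread = {spread:.3f}")
print(f"range = {min(means):.3f} to {max(means):.3f}")
```

The spread of the resampled means is much smaller than the spread of the individual scores, which is why the range of means reported above is so narrow.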

h) Confidence intervals

The confidence interval matches what we expected from the favstats table above. We can be 95% confident that the true population mean lies between 4.423 and 4.515.
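As a rough cross-check, and assuming the resampled means are approximately normal, a 95% interval is about two standard deviations of the resampled means on either side of the center. Using the rounded values from part (g), a center of 4.47 and a standard deviation of .02:

$$
\bar{Y} \pm 2 \cdot SD_{\text{means}} \approx 4.47 \pm 2(0.02) = [4.43,\ 4.51]
$$

This is close to the reported interval. The small gap is plausibly due to the standard deviation being rounded to .02.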

i) Model Comparison -- N/A for simple model

j) Conclusion

This model cannot answer our question because it has no explanatory variable: we wanted to see whether being an emerging adult is associated with lower well-being scale values.

# Qualitative Predictor Model

a) Word Equation

Subjective Well Being = average of Subjective wellbeing + Emerging Adult (Yes / No) + other-stuff

A person's subjective well being is determined by the average wellbeing, their status as an emerging adult, plus other stuff.

b) GLM Equation

$Y_i = b_0 + b_1 X_i + e_i$

SWBscale is equal to the mean SWBscale of non-emerging adults (b0), plus the increment to add on for emerging adults (b1, added when Xi = 1 and not when Xi = 0), plus error.

c) Visualizations

The histogram is a good visualization of how many emerging adults vs non-emerging adults there are in the sample. The two groups have pretty similar distributions and appear to have similar means, as the boxplot also shows.

d) Fit a linear model

$\text{SWBscale}_i = 4.5523 - 0.1292 X_i + e_i$

This linear model shows that non-emerging adults have a predicted SWBscale score of 4.5523, while emerging adults have a slightly lower predicted score of 4.4231 (4.5523 - 0.1292). This is a slight difference between the two groups, with emerging adults having lower well-being scores.
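To show where b0 and b1 come from in a two-group model, here is a minimal Python sketch. The data are invented and `fit_two_group_model` is a hypothetical helper. It relies on the fact that, with a 0/1 predictor, the least-squares b0 is the mean of the group coded 0 and b1 is the difference between the two group means.

```python
from statistics import mean

def fit_two_group_model(scores, is_emerging_adult):
    """Return (b0, b1) for a model with one 0/1 predictor."""
    reference = [y for y, x in zip(scores, is_emerging_adult) if x == 0]
    emerging = [y for y, x in zip(scores, is_emerging_adult) if x == 1]
    b0 = mean(reference)      # mean of the group coded 0
    b1 = mean(emerging) - b0  # increment to add on for the group coded 1
    return b0, b1

# Hypothetical scores and group codes, invented for illustration only.
swb = [4.9, 4.2, 5.0, 4.5, 4.1, 4.6, 4.3, 4.4]
ea_status = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_two_group_model(swb, ea_status)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")

# The same arithmetic applied to the coefficients reported in this section.
predicted_emerging_adult = 4.5523 - 0.1292
print(f"predicted score for emerging adults = {predicted_emerging_adult:.4f}")
```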

e) Quantifying Error (ANOVA)

An F value of 6.939 is the ratio of the variation explained by the model per degree of freedom (MS model) to the variation left unexplained per degree of freedom (MS error). However, we have a very low PRE of .0022, meaning only a small proportion (about 0.2%) of the variation in SWBscale is explained by EA_status. Our model reduces the Sum of Squares error by 12.05748, which is relatively small compared to the total Sum of Squares.
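The relationship between the sums of squares, PRE, and F can be sketched in Python. The data below are invented and `anova_summary` is a hypothetical helper, so this is an illustration of the arithmetic rather than a reproduction of our ANOVA table. The last lines apply the same F formula to two numbers reported in this write-up (SS model of 12.05748 on 1 degree of freedom, and MS error of 1.738).

```python
from statistics import mean

def anova_summary(scores, groups):
    """Partition variation for a model with one grouping variable."""
    grand_mean = mean(scores)
    labels = sorted(set(groups))
    group_means = {
        g: mean([y for y, x in zip(scores, groups) if x == g]) for g in labels
    }
    # SS total: error around the empty model (the grand mean).
    ss_total = sum((y - grand_mean) ** 2 for y in scores)
    # SS error: error left over around each group's own mean.
    ss_error = sum((y - group_means[x]) ** 2 for y, x in zip(scores, groups))
    ss_model = ss_total - ss_error
    df_model = len(labels) - 1
    df_error = len(scores) - len(labels)
    return {
        "ss_model": ss_model,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "pre": ss_model / ss_total,
        "f": (ss_model / df_model) / (ss_error / df_error),
    }

# Hypothetical scores and group codes, invented for illustration only.
swb = [4.9, 4.2, 5.0, 4.5, 4.1, 4.6, 4.3, 4.4]
ea_status = [0, 0, 0, 0, 1, 1, 1, 1]
summary = anova_summary(swb, ea_status)
print(f"PRE = {summary['pre']:.3f}, F = {summary['f']:.3f}")

# F from the values reported in this write-up: MS model / MS error.
reported_f = (12.05748 / 1) / 1.738
print(f"reported F is approximately {reported_f:.2f}")
```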

f) Visualization of sampling variability

This shows the distribution of means from resampling our original data. Our sample mean falls within this distribution, but it sits near the upper end, which suggests it could be outside of our confidence interval. It could still be within the interval, but it is on the high side of the distribution.

g) Numeric description of sampling variability

From these numerical descriptions, we see that the b1 coefficient (the increment to add on for the emerging adulthood group) ranges from -.298 to .03. Because 0 is contained in this range, it is possible to generate linear models from our sample in which there is no difference in well-being between emerging adults and non-emerging adults. Without a confidence interval, we cannot say how likely such a result is.
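A minimal Python sketch of how a range of b1 values like this can be produced by resampling. The data are invented, the helper names are hypothetical, and resampling whole rows (redrawing if a resample happens to contain only one group) is an assumption about the procedure, not a description of the exact function we ran.

```python
import random
from statistics import mean

def group_difference(scores, groups):
    """b1 for a 0/1 predictor: mean of group 1 minus mean of group 0."""
    reference = [y for y, g in zip(scores, groups) if g == 0]
    emerging = [y for y, g in zip(scores, groups) if g == 1]
    return mean(emerging) - mean(reference)

def bootstrap_b1(scores, groups, n_resamples=1000, seed=0):
    """Resample rows with replacement and re-estimate b1 each time."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    rows = list(zip(scores, groups))
    estimates = []
    while len(estimates) < n_resamples:
        sample = rng.choices(rows, k=len(rows))
        ys = [y for y, _ in sample]
        gs = [g for _, g in sample]
        # Skip the rare resample that contains only one of the two groups.
        if 0 < sum(gs) < len(gs):
            estimates.append(group_difference(ys, gs))
    return estimates

# Hypothetical scores and group codes, invented for illustration only.
swb = [4.9, 4.2, 5.0, 4.5, 4.8, 4.4, 4.1, 4.6, 4.3, 4.4, 4.7, 4.0]
ea_status = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

b1_values = bootstrap_b1(swb, ea_status)
share_at_or_above_zero = sum(b >= 0 for b in b1_values) / len(b1_values)

print(f"b1 ranges from {min(b1_values):.3f} to {max(b1_values):.3f}")
print(f"share of resamples with b1 >= 0: {share_at_or_above_zero:.3f}")
```

Looking at the share of resampled b1 values at or above 0, rather than only the range, gives a sense of how unusual a "no difference" result is.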

The b0 variability is similar to that of the simple model, with the mean staying close to 4.47~4.48.

h) Confidence intervals

This confidence interval shows that we can be 95% confident that the true mean SWBscale for non-emerging adults (b0) lies between 4.475 and 4.629. This interval is centered on our estimate of 4.5523 rather than on the overall sample mean of roughly 4.47, because b0 in this model describes only the non-emerging adult group. The interval for EA_status shows that we can be 95% confident the true increment (b1) is between -0.2254 and -0.033, which does not include 0. This is important, as it allows us to reject the simple model (which states that b1 = 0).

i) Model Comparison

The MS error for the complex model is 1.738, which is slightly smaller than the MS error from the simple model, which is 1.741. This means that the complex model leaves slightly less unexplained variation per degree of freedom, so its predictions sit slightly closer to the observed scores. Our alpha level was .05, and our p value was smaller than that, so we reject the simple model in favor of the complex model: the effect of EA_status is statistically significant.

j) Conclusion

So far, the qualitative predictor model is slightly more accurate than the simple model. However, the qualitative model has a PRE of .0022, so EA status still explains very little of the variability in SWB scale.

# Quantitative Predictor Model

a) Word Equation

Subjective Well-Being = average well-being + Age + Other Stuff

A person's subjective well-being is determined by the average well-being, plus their Age, plus other stuff.

b) GLM Equation

$Y_i = b_0 + b_1 X_i + e_i$

SWBscale = intercept (b0, the predicted SWBscale at age 0) + increment to add on for each additional year in age (b1) multiplied by age + error

c) Visualizations

The first visualization makes it difficult to see the age vs SWB scale data, since many of the participants in the study were younger, clustering around ~20 years old (our target range), and the rest of the answers were pretty scattered. In the graph of our target range, the average well being score was pretty consistent around a little over 4.

d) Fit a linear model

Our linear model has an intercept (b0) of 4.482440, and suggests that the increment to add on for every additional year of age is -0.002638. This means that the model predicts slightly lower subjective well-being scores as age increases, a very weak negative relationship between the two.
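The sketch below illustrates, in Python, how a slope and intercept like these are obtained by least squares and how they turn into predictions. The ages and scores are invented, and `fit_line` and `predict_swb` are hypothetical helper names. Only the two coefficients passed as defaults to `predict_swb` come from our fitted model.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for one quantitative predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # change in y per one-unit change in x
    b0 = y_bar - b1 * x_bar   # predicted y when x is 0
    return b0, b1

def predict_swb(age, b0=4.482440, b1=-0.002638):
    """Prediction from the coefficients reported in this section."""
    return b0 + b1 * age

# Hypothetical ages and scores, invented for illustration only.
age = [18, 20, 22, 24, 26]
swb = [4.8, 4.6, 4.5, 4.3, 4.3]
b0, b1 = fit_line(age, swb)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")

# With the reported coefficients, predictions barely change across our target range.
print(f"predicted at 18: {predict_swb(18):.4f}")
print(f"predicted at 25: {predict_swb(25):.4f}")
```

Across the 18 to 25 range, the reported slope changes the prediction by less than 0.02 points, which shows how small the increment for age is in practical terms.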

e) Quantifying Error (ANOVA)

This table shows an F-value of 0.1924771, which is the ratio of the variation explained by the model per degree of freedom to the variation left unexplained per degree of freedom. This is very low, especially compared to our other model. The p-value of .66, which is greater than our alpha of .05, means that we are unable to reject the null hypothesis with this model. Finally, the Sum of Squares error reduced by the model is quite low, only .3378 out of a total of around 3676.
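Using the two sums of squares quoted above, the PRE for this model can be worked out directly. This is only approximate, since the total is given as "around 3676":

$$
PRE = \frac{SS_{\text{model}}}{SS_{\text{total}}} \approx \frac{0.3378}{3676} \approx 0.00009
$$

In other words, age explains roughly 0.009% of the variation in SWBscale in this subset of the data.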

f) Visualization of sampling variability

From this visualization of the distribution of b1 values, we can see that our sample is capable of producing b1's ranging from -.02 all the way up to ~.015. This shows that, from our sample, it is conceivable to get either a positive or a negative relationship between subjective well-being and age, depending on which observations end up in the resample. Because this range is not a confidence interval, we cannot reject or fail to reject the null hypothesis from this alone.

g) Numeric description of sampling variability

h) Confidence intervals

The confidence interval shows that we can be 95% confident that the true intercept (b0) lies between 4.227 and 4.737, which still includes our original sample mean of 4.47. More importantly, the 95% confidence interval for b1, that is, the increment to add on for each year of age, runs from -0.0144 to 0.00915. Because this interval includes 0, we fail to reject the null hypothesis that b1 = 0.

i) Model Comparison

Comparing our two models, the model using EA_status as an explanatory variable has a PRE roughly 22x the size of the model using age as an explanatory variable. This means that, comparatively, the qualitative predictor model explains roughly 22 times as large a proportion of the variation as the quantitative model. Furthermore, the qualitative model has an F-value of 6.939, which is far greater than the quantitative model's F-value of 0.192. The qualitative model therefore explains far more variation per degree of freedom, relative to its unexplained variation, making it a better model than the quantitative model. Because we used a subset of the data to get rid of N/A entries, the two models have differing sample sizes, so we cannot compare their Sums of Squares directly.

j) Conclusion

Overall, it appears that the qualitative model is superior to both the simple and quantitative models in explaining variation. The qualitative model (EA_status as predictor) explains more variation overall, as well as more variation per degree of freedom spent. Because this is the best of our three models and its confidence interval for b1 does not include 0, we reject the null hypothesis and conclude that there is a difference in subjective well-being based on status as an emerging adult. This model estimates that difference to be -0.1292, meaning that being an emerging adult is associated with comparatively lower subjective well-being.

This does not allow us to declare causation. While the true increment (b1) likely is not -0.1292, we are 95% confident that it is between -0.22 and -0.03.