!pip install statsmodels==0.12.2

# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
import statsmodels.api as sm

# Loading the dataset
behaviour = pd.read_csv('mdk_study_essential_values.csv')
behaviour.head()

## To what extent do daily screen time and outdoor play affect disruptiveness in children aged 2-5?

### Abstract

This investigation looks for a possible association between a child's disruptiveness and several potential contributing factors, using a multiple regression model. The explanatory variables examined were screen time spent watching television, screen time spent playing games, outdoor play time, gender and age. Unexpectedly, neither screen time variable had a statistically significant effect on disruptiveness, whereas outdoor play time did. This suggests that outdoor play time is associated with a child's social behaviour; however, the effect is small, and more variables should be included to expand on this investigation.

### Introduction

As technology becomes more prevalent, its drawbacks also become more noticeable. Children today have vastly different daily lives from children 20 years ago: instead of the playground being their main source of entertainment, video games have become the primary enjoyment. This report studies the ways in which screen time affects a child's social abilities or, more precisely, their disruptiveness. This is worth investigating because, although it is widely accepted that relying on a screen for entertainment is not ideal, the issue has not received as much attention as it should. We study it using children's screen time spent watching television or playing video games, their gender, their age and their time spent outdoors, each of which is examined against their disruptiveness score. The final model showed that, of these variables, outdoor play time had the largest influence on a child's disruptiveness, suggesting that time spent outdoors matters more than screen time.

### Hypothesis

### Data

The dataset comes from a cross-sectional study of children aged between 2 and 5. It describes their usual daily screen time, usual daily outdoor hours and their ASBI scores for three different social skills. The dataset contains data from 575 families. For every child, the parents filled in a survey; there were different surveys for mothers and fathers. Our response variable is the ASBI score for disruptiveness, so we disregarded the other two ASBI scores. For our explanatory variables, we have the age and gender of the children, as well as the screen time and outdoor hours. In this report, we have taken two screen time variables into account: television/DVD viewing and computer/e-game/handheld game use.

Disruptiveness is measured with ASBI scores, which are based on how often a subject shows disruptive behaviour, such as bullying or teasing. The disruptiveness scale consists of 7 items, each rated on a three-point scale (almost never, sometimes, almost always). Possible scores range from 7 to 21, but most values in this dataset lie between 7 and 15, as can be seen in Fig. 1.
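The scoring described above can be illustrated with a minimal sketch. It assumes the three ratings are coded as 1, 2 and 3 points, which is consistent with the stated 7 to 21 range but is not spelled out in the dataset description; the names `RATING_POINTS` and `disruptiveness_score` are hypothetical.

```python
# Assumed coding (not stated in the source): almost never = 1, sometimes = 2, almost always = 3
RATING_POINTS = {"almost never": 1, "sometimes": 2, "almost always": 3}
N_ITEMS = 7  # number of items on the disruptiveness scale


def disruptiveness_score(ratings):
    """Sum the points of the 7 item ratings into a single score between 7 and 21."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    return sum(RATING_POINTS[r] for r in ratings)


lowest = disruptiveness_score(["almost never"] * 7)
highest = disruptiveness_score(["almost always"] * 7)
mixed = disruptiveness_score(["sometimes"] * 3 + ["almost never"] * 4)
print(lowest, highest, mixed)  # 7 21 10
```

Under this coding, a child rated "sometimes" on three items and "almost never" on the rest scores 10, which falls in the range where most of the observed values lie.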

The participants in the study were parents of children aged 2-5 who had not yet started school, so each observational unit represents a child. The research was conducted in Melbourne. We take the target population to be preschool children, more specifically those in developed countries. Access to different kinds of technology may vary between parts of the world, but if there is a link between behaviour and screen time or outdoor time, it should be applicable to children in different countries.

#Displays the explanatory variables against disruptiveness in four different subplots.
fig, axs = plt.subplots(2, 2)
axs[0, 0].scatter(x = behaviour['disruptiveness'], y = behaviour['screen_time_tv'])
axs[0, 0].set_title("Disruptiveness vs TV screen time", fontsize = 'small')
axs[0, 1].scatter(x = behaviour['disruptiveness'], y = behaviour['screen_time_game'])
axs[0, 1].set_title("Disruptiveness vs Gaming screen time", fontsize = 'small')
axs[1, 0].scatter(x = behaviour['disruptiveness'], y = behaviour['average_outdoor_hours'])
axs[1, 0].set_title("Disruptiveness vs Average Outdoor Hours", fontsize = 'small')
axs[1, 1].scatter(x = behaviour['disruptiveness'], y = behaviour['age'])
axs[1, 1].set_title("Disruptiveness vs Age", fontsize = 'small')
# Hide the inner tick labels so the subplots do not overlap
for ax in axs.flat:
    ax.label_outer()

Fig 1: Scatterplots of TV screen time, gaming screen time, outdoor hours and age against the response variable

### Results

# Fitting the model with all explanatory variables
m_full = sm.formula.ols(formula = 'disruptiveness ~ average_outdoor_hours + gender + age + screen_time_tv + screen_time_game ', data = behaviour)
multi_reg = m_full.fit()
print(multi_reg.summary())

After fitting a linear model to all of the explanatory variables, we obtained an adjusted R^2 of 0.004. We then used backwards selection where we examined the p-value of each variable to eliminate less significant variables. This was done using a significance level of α = 0.05. The first variable to be removed was age, with a p-value of 0.638. Gender was removed next, with a p-value of 0.570. Then we removed screen_time_tv, the p-value being 0.557. Finally, we removed screen_time_game, which had a p-value of 0.161. This left us with only average_outdoor_hours, which has a statistically significant p-value of 0.038. The new adjusted R^2 value was 0.006.
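The elimination procedure described above was carried out step by step; as an illustration only, it could be automated roughly as follows. This is a sketch, not the code used for the report: the helper `backward_eliminate` is hypothetical, it assumes every candidate is a numeric (or 0/1-coded) column so that parameter names match column names, and the demonstration uses synthetic data rather than the study dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def backward_eliminate(data, response, candidates, alpha=0.05):
    """Repeatedly drop the term with the largest p-value until all are below alpha."""
    kept = list(candidates)
    while kept:
        formula = f"{response} ~ " + " + ".join(kept)
        fit = smf.ols(formula=formula, data=data).fit()
        pvalues = fit.pvalues.drop("Intercept")
        worst = pvalues.idxmax()  # least significant remaining term
        if pvalues[worst] < alpha:
            return kept, fit  # every remaining term is significant
        kept.remove(worst)
    # No predictor survived: fall back to the intercept-only model
    return kept, smf.ols(formula=f"{response} ~ 1", data=data).fit()


# Synthetic demonstration: y depends strongly on x1 and not at all on x2
rng = np.random.default_rng(0)
demo = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
demo["y"] = 2.0 * demo["x1"] + rng.normal(size=200)

kept, fit = backward_eliminate(demo, "y", ["x1", "x2"])
print(kept)
```

With the study data, a call along the lines of `backward_eliminate(behaviour, 'disruptiveness', [...])` would need `gender` to be coded numerically first, since the sketch matches p-values to terms by column name.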

# Fitting the final model obtained through backwards selection
m_full = sm.formula.ols(formula = 'disruptiveness ~ average_outdoor_hours', data = behaviour)
multi_reg = m_full.fit()
print(multi_reg.summary())

The intercept of our model is 10.4952 and the average_outdoor_hours coefficient is -0.0832. This gives us the following linear regression model:

$$\widehat{\text{disruptiveness}} = 10.4952 - 0.0832 \times \text{average\_outdoor\_hours}$$

#Graphs the disruptiveness versus average_outdoor_hours in a scatter plot
#along with the line of regression.
plt.figure(figsize=(9,6))
sns.scatterplot(x = behaviour['average_outdoor_hours'], y = behaviour['disruptiveness'])
plt.title("Disruptiveness vs Outdoor hours", fontsize = "xx-large")
x = behaviour['average_outdoor_hours']
plt.plot(x, -0.0832*x + 10.4952, label = 'line of regression', color = 'red')
plt.legend()

Fig 2: Scatter plot showing disruptiveness scores against average outdoor hours and a line showing the linear regression model.

The result of the linear model shown in Fig. 2 is interesting: it shows a correlation between the two variables, although the correlation is not strong. The correlation is negative, meaning that, according to our findings, more hours spent outdoors are associated with a slightly lower level of disruptiveness.
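To give a sense of how small this effect is, the fitted coefficients reported above can be plugged in for a few values of outdoor time. This is a minimal worked example; the function name `predicted_disruptiveness` and the chosen hour values are illustrative.

```python
INTERCEPT = 10.4952  # fitted intercept reported above
SLOPE = -0.0832      # fitted coefficient of average_outdoor_hours


def predicted_disruptiveness(outdoor_hours):
    """Predicted ASBI disruptiveness score for a given number of daily outdoor hours."""
    return INTERCEPT + SLOPE * outdoor_hours


for hours in (0, 2, 4):
    print(hours, round(predicted_disruptiveness(hours), 4))
```

Going from 0 to 4 outdoor hours per day lowers the predicted score by only about 0.33 points on a scale that runs from 7 to 21.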

# Residuals: observed disruptiveness minus the value predicted by the fitted line
resid = behaviour['disruptiveness'] - (-0.0832*behaviour['average_outdoor_hours'] + 10.4952)
# Checking the distribution of the residuals
plt.figure(figsize=(10,6))
sns.histplot(x=resid, bins = 11) # histogram
plt.show()
plt.figure(figsize=(10, 6))
stats.probplot(resid,plot=plt) # probability plot
plt.show()

Here, we examine the reliability of the linear model by checking the distribution of its residuals. The histogram is roughly bell-shaped but skewed to the right. This is also visible in the QQ-plot, where the points deviate from the reference line mainly at the left end. Other than that, the QQ-plot does not show any extreme outliers, and most of the values are close to the line. From these plots, we judge the normality assumption to be approximately satisfied, so the linear model is reasonably reliable, although the residuals are a little skewed.
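The visual impression of right skew could be backed up with a number. The sketch below computes the sample skewness of a set of residuals with `scipy.stats.skew`; the helper `describe_residuals` is hypothetical, and the demonstration uses synthetic right-skewed values rather than the residuals from this study.

```python
import numpy as np
from scipy import stats


def describe_residuals(resid):
    """Return the mean and sample skewness of the residuals, ignoring missing values."""
    resid = np.asarray(resid, dtype=float)
    resid = resid[~np.isnan(resid)]
    return {"mean": float(resid.mean()), "skewness": float(stats.skew(resid))}


# Synthetic right-skewed residuals, for illustration only
rng = np.random.default_rng(1)
demo_resid = rng.exponential(scale=1.0, size=500) - 1.0

summary = describe_residuals(demo_resid)
print(summary)
```

A skewness near 0 indicates a symmetric distribution, while a positive value indicates a longer right tail. In the notebook, the same function could be applied to `resid` or to the `resid` attribute of the fitted statsmodels results.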

### Conclusion

In this study, we attempted to find out to what extent daily screen time and outdoor play affect the level of disruptiveness in children aged 2 to 5. We examined five explanatory variables: age, gender, screen time for television, screen time for game devices and average hours spent outside. In the end, outdoor hours had the largest effect. We removed all the other variables from the model because they did not show any statistically significant effect. In our final linear regression model, the slope is not very steep. Outdoor hours are not a very reliable predictor of disruptiveness, so there are likely other factors that come into play. Nevertheless, there is a slight negative correlation. From this study, we can observe that more hours spent outside are associated with a slightly lower level of disruptiveness, and vice versa.

Clearly, the model has its limitations. There is only a slight correlation, which means that in order to reliably predict disruptive behaviour in children, it would be necessary to examine other factors. Additionally, the dataset is limited in a few ways. Firstly, the level of disruptiveness is measured using only 7 questions, which results in scores between 7 and 21 points. Perhaps using more questions, and therefore having a wider range of scores, would give a more accurate estimate of behaviour. Secondly, the scores are somewhat subjective. Parents might assume their child to be better behaved than in reality, or they might underestimate their child's screen time. Because the data was collected from questionnaires rather than from direct measurements, it might not be entirely accurate.