<class 'pandas.core.frame.DataFrame'>
Int64Index: 1307 entries, 0 to 1318
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   name      1307 non-null   object
 1   gender    1307 non-null   object
 2   age       1305 non-null   float64
 3   class     1307 non-null   object
 4   embarked  1307 non-null   object
 5   country   1231 non-null   object
 6   ticketno  1306 non-null   float64
 7   fare      1291 non-null   float64
 8   sibsp     1307 non-null   float64
 9   parch     1307 non-null   float64
 10  survived  1307 non-null   object
dtypes: float64(5), object(6)
memory usage: 122.5+ KB
This graph shows that men between the ages of 18 and 30 have a comparatively high probability of surviving, whereas female survival chances are higher between the ages of 14 and 40.
Males between the ages of 5 and 18 have the lowest chance of survival.
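Age-band comparisons like these can also be tabulated directly rather than read off a plot. The following is a minimal sketch on toy rows, not the real passenger data. It assumes `survived` holds "yes"/"no" strings and `gender` holds "male"/"female"; the age bands and the `survived_flag` and `age_band` column names are hypothetical choices, not taken from the notebook.

```python
import pandas as pd

# Toy rows for illustration only; these are not the real passenger records.
df = pd.DataFrame({
    "gender": ["male", "male", "female", "female", "male", "female"],
    "age": [10.0, 25.0, 30.0, 16.0, 40.0, 8.0],
    "survived": ["no", "yes", "yes", "yes", "no", "no"],
})

# Assumption: survival is stored as "yes"/"no" text (the column is dtype object).
df["survived_flag"] = (df["survived"] == "yes").astype(int)

# Hypothetical age bands; right edges are inclusive, e.g. "6-18" covers (5, 18].
bins = [0, 5, 18, 30, 40, 100]
labels = ["0-5", "6-18", "19-30", "31-40", "41+"]
df["age_band"] = pd.cut(df["age"], bins=bins, labels=labels)

# Mean of a 0/1 flag is the survival rate within each gender and age band.
rate = df.groupby(["gender", "age_band"], observed=True)["survived_flag"].mean()
print(rate)
```

Rows with a missing `age` fall outside every band and are dropped from the grouping, so the rates describe only passengers with a recorded age.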
But how are gender and age distributed within each class?
This distribution by class shows that the passengers in 3rd class were younger than those in 1st and 2nd class.
It can also be seen that more children are found in 3rd class than in 1st and 2nd class. But there is not enough evidence here to test the hypotheses.
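The age comparison between classes can be backed with a small summary table. This is a sketch on toy rows under stated assumptions: the class labels "1st", "2nd", "3rd" and the under-18 cutoff for "child" are illustrative choices, and `is_child` is a hypothetical helper column.

```python
import pandas as pd

# Toy rows for illustration only; class labels are assumed, not confirmed.
df = pd.DataFrame({
    "class": ["1st", "1st", "2nd", "2nd", "3rd", "3rd", "3rd"],
    "age": [45.0, 38.0, 30.0, 12.0, 4.0, 22.0, 9.0],
})

# Hypothetical definition of a child: younger than 18.
# A missing age compares as False, so such rows are not counted as children.
df["is_child"] = df["age"] < 18

summary = df.groupby("class").agg(
    median_age=("age", "median"),
    n_children=("is_child", "sum"),
    n_passengers=("age", "size"),
)
print(summary)
```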
To determine whether people travelled alone or accompanied, I created a bar plot, which shows that most males and females travelled alone, with no spouses or other relatives.
For passengers travelling alone, the probability of surviving is higher than the probability of dying, but a family of four, for instance, is more likely to survive than a couple.
/shared-libs/python3.7/py/lib/python3.7/site-packages/seaborn/categorical.py:3717: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
warnings.warn(msg)
/shared-libs/python3.7/py/lib/python3.7/site-packages/seaborn/_decorators.py:43: FutureWarning: Pass the following variables as keyword args: x, y. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.
FutureWarning
After creating the relatives column, it is evident that the chances of survival increase for passengers with 1-3 relatives.
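The notebook does not show how the relatives column was built. A plausible construction, assumed here, is the sum of `sibsp` and `parch`. The sketch below uses toy rows, and the `alone` and `survived_flag` columns are hypothetical additions.

```python
import pandas as pd

# Toy rows for illustration only.
df = pd.DataFrame({
    "sibsp": [0.0, 1.0, 1.0, 3.0, 0.0, 0.0],
    "parch": [0.0, 0.0, 2.0, 2.0, 0.0, 1.0],
    "survived": ["no", "yes", "yes", "no", "yes", "yes"],
})

# Assumption: relatives = siblings/spouses + parents/children aboard.
df["relatives"] = (df["sibsp"] + df["parch"]).astype(int)
df["alone"] = df["relatives"] == 0

# Assumption: survival is stored as "yes"/"no" text.
df["survived_flag"] = (df["survived"] == "yes").astype(int)

# Survival rate and group size per number of relatives.
rate_by_relatives = df.groupby("relatives")["survived_flag"].agg(["mean", "size"])
print(rate_by_relatives)
```

Reporting the group size next to the rate helps flag family sizes with very few passengers, where a single survivor can swing the rate.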
Here it is evident that the survival probability for both males and females increases with the number of relatives when they have between 1 and 3, whilst the chances of survival decrease when family size exceeds this threshold. However, families with 5-6 members are an exception.
Let's do this for each class.
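The `factorplot` deprecation warning in this notebook's output says the function has been renamed to `catplot`, that its default `kind` is now `'strip'`, and that `x` and `y` should be passed as keywords. A per-class plot that follows that advice might look like the sketch below. The toy rows, the class labels, and the `relatives` and `survived_flag` columns are assumptions for illustration, not the author's code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import pandas as pd
import seaborn as sns

# Toy rows for illustration only.
df = pd.DataFrame({
    "class": ["1st", "1st", "2nd", "2nd", "3rd", "3rd", "3rd", "3rd"],
    "gender": ["male", "female", "male", "female",
               "male", "female", "male", "female"],
    "relatives": [0, 1, 0, 2, 0, 0, 4, 4],
    "survived_flag": [0, 1, 0, 1, 0, 1, 0, 0],
})

# kind="point" reproduces the old factorplot default;
# data, x, y, hue and col are all passed as keywords.
g = sns.catplot(
    data=df,
    x="relatives",
    y="survived_flag",
    hue="gender",
    col="class",
    kind="point",
)
g.set_axis_labels("relatives", "survival rate")
```

Using `col="class"` draws one panel per class in a single call, which avoids repeating the plotting code for each class.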
The above graphs show that the 3rd class (most passengers, youngest population) has the most variance and the lowest survival probability.
It is also evident that the chances of survival for single females in 3rd class are similar to those of females with 2 relatives, whereas men in 3rd class are more likely to die regardless of how many relatives they have.
Females with 4-10 relatives had an almost 0% chance of survival, apart from females with 6 relatives.
This suggests that more females in 3rd class may have died because they did not want to leave their families behind, but we must fit a logistic regression to test this. The results are considered significant if the p-value is below 0.05.
Optimization terminated successfully.
Current function value: 0.641859
Iterations 6
                         Results: Logit
===============================================================
Model:              Logit            Pseudo R-squared: -0.128
Dependent Variable: survived         AIC:              913.5892
Date:               2022-01-17 10:46 BIC:              927.2723
No. Observations:   707              Log-Likelihood:   -453.79
Df Model:           2                LL-Null:          -402.17
Df Residuals:       704              LLR p-value:      1.0000
Converged:          1.0000           Scale:            1.0000
No. Iterations:     6.0000
----------------------------------------------------------------
           Coef.   Std.Err.     z      P>|z|    [0.025   0.975]
----------------------------------------------------------------
gender     0.5090    0.1620   3.1408   0.0017   0.1914   0.8266
sibsp     -0.5999    0.1169  -5.1329   0.0000  -0.8289  -0.3708
parch     -0.2312    0.1065  -2.1716   0.0299  -0.4400  -0.0225
===============================================================
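The headline figures in this summary can be checked by hand from the values it reports. The sketch below recomputes McFadden's pseudo R-squared as 1 - LL/LL-Null, recomputes AIC as 2k - 2LL, and converts the coefficients to odds ratios. All inputs are copied from the table above, and the variable names are hypothetical.

```python
import math

# Values copied from the summary table.
ll_model = -453.79
ll_null = -402.17
n_params = 3  # gender, sibsp, parch (no intercept row appears in the table)

# McFadden's pseudo R-squared: negative when the fitted model has a lower
# log-likelihood than the intercept-only null model.
pseudo_r2 = 1 - ll_model / ll_null
print(round(pseudo_r2, 3))  # -0.128

# AIC = 2k - 2*LL; differs slightly from the table because LL is rounded.
aic = 2 * n_params - 2 * ll_model
print(round(aic, 2))  # 913.58

coefs = {"gender": 0.5090, "sibsp": -0.5999, "parch": -0.2312}
p_values = {"gender": 0.0017, "sibsp": 0.0000, "parch": 0.0299}

# exp(coef) is the multiplicative change in the odds of survival
# for a one-unit increase in the predictor.
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}
print(odds_ratios)

# Terms that pass the 0.05 threshold used in the text.
significant = [name for name, p in p_values.items() if p < 0.05]
print(significant)  # ['gender', 'sibsp', 'parch']
```

An odds ratio above 1 means the predictor raises the odds of survival and one below 1 means it lowers them, which matches the signs of the coefficients.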
The logistic regression coefficients show a positive association between being female and survival, and a negative association between survival and having siblings or spouses aboard.
All three p-values are below 0.05, so the results pass the significance test and indicate that the chances of survival decrease when a passenger travelled with "sibsp" or "parch" relatives.
The logistic regression coefficients for male passengers show a negative association between survival and having siblings or spouses aboard. Only "sibsp" passed the significance test, indicating that the probability of survival decreases when a male passenger travels with a sibling or spouse.
The logistic regression coefficients for female passengers also show a negative association between survival and having siblings or spouses aboard. Only "sibsp" passed the significance test, indicating that the chances of survival decrease when a female passenger travels with a sibling or spouse.
The logistic regression coefficients show a positive association between being female and survival, and a negative association between survival and being accompanied by a spouse or siblings.
The results passed the significance test, indicating that the chances of survival decrease when a passenger travels with "sibsp" or "parch" relatives.