Diabetes Prediction with Logistic Regression

1. Opening and exploring the dataset

# Importing modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import seaborn as sns

# Open and explore the dataset
dataset = pd.read_csv('diabetes.csv')
dataset.head()

# Show the correlation matrix as an annotated heatmap
# sns.pairplot(dataset, hue='Outcome')
# dataset.corr()
sns.heatmap(dataset.corr(), annot=True)
plt.show()
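By default, `dataset.corr()` computes pairwise Pearson correlation coefficients, and those are the values the heatmap colours. The minimal sketch below checks this against NumPy's `np.corrcoef`. It uses a small hypothetical DataFrame: the column names are borrowed from the dataset, but the values are invented and are not from `diabetes.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are invented for illustration only
df = pd.DataFrame({
    "Glucose": [85, 183, 89, 137, 116, 78],
    "BMI": [26.6, 23.3, 28.1, 43.1, 25.6, 31.0],
    "Outcome": [0, 1, 0, 1, 0, 1],
})

# DataFrame.corr() defaults to method="pearson"
pandas_corr = df.corr()

# np.corrcoef treats each row as a variable, so pass the transpose
numpy_corr = np.corrcoef(df.to_numpy().T)

print(np.allclose(pandas_corr.to_numpy(), numpy_corr))  # True
```

Every variable correlates perfectly with itself, so the diagonal of the heatmap is always 1.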

# sns.jointplot(data = dataset, x = 'Glucose', y='Age', hue = 'Outcome')

2. Extracting x and y, and splitting into train and test sets

# Extracting independent (feature) and dependent (target) variables
feature_cols = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age']
x = dataset[feature_cols]
y = dataset['Outcome']

# Getting train and test data (25% of the rows are held out for testing)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
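The sketch below shows what `test_size=0.25` and `random_state=0` do. It uses a hypothetical random stand-in for the feature matrix (100 rows, 8 columns, matching the number of feature columns above), not the real data. A quarter of the rows go to the test set, and fixing `random_state` makes the shuffle, and therefore the split, reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 rows, 8 features, random 0/1 labels
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 8))
y = rng.integers(0, 2, size=100)

# test_size=0.25 holds out a quarter of the rows
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
print(x_train.shape, x_test.shape)  # (75, 8) (25, 8)

# Re-running with the same random_state reproduces the exact same split
x_train2, x_test2, _, _ = train_test_split(x, y, test_size=0.25, random_state=0)
print(np.array_equal(x_test, x_test2))  # True
```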

3. Instantiating the logistic regression model

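# max_iter raised from the default of 100 to give the solver room to converge on the unscaled features
regressor = LogisticRegression(max_iter=1000)
regressor.fit(x_train, y_train)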
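For two classes, logistic regression models the probability of class 1 as the sigmoid of a linear score, P(Outcome = 1 | x) = 1 / (1 + exp(-(w·x + b))). After fitting, scikit-learn exposes w as `coef_` and b as `intercept_`. The sketch below uses synthetic data from `make_classification` as a stand-in for the diabetes features. It recomputes the probabilities by hand and checks them against `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data standing in for the diabetes features (8 columns)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Linear score z = X @ w + b, using the fitted weights and bias
z = X @ model.coef_.ravel() + model.intercept_[0]

# The sigmoid maps each score to P(class 1 | x)
manual_proba = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba holds the class-1 probability
print(np.allclose(manual_proba, model.predict_proba(X)[:, 1]))  # True
```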

4. Making predictions

# Predict the Outcome class (0 or 1) for each test row
y_pred = regressor.predict(x_test)
y_pred
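For a binary model, `predict` labels a row as class 1 when its linear score is positive. That is the same as a predicted class-1 probability above 0.5. The minimal sketch below checks the equivalence on synthetic stand-in data, not on the diabetes dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data with 8 features
X, y = make_classification(n_samples=200, n_features=8, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)

labels = model.predict(X)
proba_class1 = model.predict_proba(X)[:, 1]

# predict() corresponds to thresholding the class-1 probability at 0.5
thresholded = (proba_class1 > 0.5).astype(int)
print(np.array_equal(labels, thresholded))  # True
```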

5. Prediction accuracy

# Calculating the accuracy score
# metrics.accuracy_score(y_test, y_pred)  # equivalent: compares both arrays element-wise and returns the fraction that match
regressor.score(x_test, y_test)  # mean accuracy of the model on the test set
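For a classifier, `score` returns the mean accuracy, which is the same number as `metrics.accuracy_score(y_test, regressor.predict(x_test))`. The sketch below shows what that number is, using hypothetical label arrays. It compares the two arrays element by element and takes the share of matches.

```python
import numpy as np
from sklearn import metrics

# Hypothetical true and predicted labels
y_test = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 0, 1, 0, 1, 0])

# Element-wise comparison, then the share of positions that match
manual_accuracy = np.mean(y_test == y_pred)

print(manual_accuracy)                         # 0.75
print(metrics.accuracy_score(y_test, y_pred))  # 0.75
```

Six of the eight positions match, which gives 0.75.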

# Finding the confusion matrix (rows: actual class, columns: predicted class)
cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
cnf_matrix
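With labels 0 and 1, `confusion_matrix` returns [[TN, FP], [FN, TP]]. Rows are the actual class and columns are the predicted class. The sketch below uses hypothetical labels to show how precision and recall can be read off the matrix. For a screening task such as diabetes prediction, these two numbers are often more informative than accuracy alone.

```python
import numpy as np
from sklearn import metrics

# Hypothetical true and predicted labels
y_test = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 0, 1, 0])

cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
print(cnf_matrix)
# [[2 1]
#  [1 2]]

# ravel() flattens the 2x2 matrix row by row
tn, fp, fn, tp = cnf_matrix.ravel()

precision = tp / (tp + fp)  # share of predicted positives that are truly positive
recall = tp / (tp + fn)     # share of actual positives the model caught
print(precision, recall)
```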