Linear regression is a fundamental technique in machine learning and statistics used to model the relationship between a dependent variable and one or more independent variables. In this article, we will explore how to implement a simple linear regression model using Python within Deepnote, an interactive data science notebook.
Setting up
First, let's import the necessary libraries and read the data file. You can follow along by downloading the dataset from here.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Dataset
For this example, we are using a dataset containing two columns: studyTime and score. The studyTime column represents the number of hours spent studying, while the score column represents the corresponding scores achieved by students.
data = pd.read_csv("/work/percentage_study_time_scores.csv")
data
plt.scatter(data['studyTime'], data['score'], color = 'blue', marker='+')
plt.show()
Implementing linear regression
Linear regression aims to fit a line that best represents the data points. The line is defined by the equation y = mx + b, where m is the slope and b is the intercept.
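For intuition, here is what a prediction with such a line looks like in code; the slope and intercept below are made-up placeholder values, not the fitted ones we will compute shortly:

# Hypothetical slope and intercept, for illustration only
m_example, b_example = 2.0, 10.0
study_time = 5
predicted_score = m_example * study_time + b_example  # y = mx + b -> 20.0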
To find the optimal values of m and b, we use gradient descent, an iterative optimization algorithm. We first define a loss function to measure how well the line fits the data: the mean squared error (MSE), i.e. the average of the squared differences between the actual scores and the scores predicted by the line.
# For manual calculation of the loss (mean squared error)
def loss_function(m, b, points):
    x, y = points['studyTime'], points['score']
    return ((y - (m * x + b)) ** 2).mean()
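As a quick sketch of how this function might be used, we can evaluate the loss for an unfitted line with m = 0 and b = 0; the exact value you see will depend on the dataset:

# Loss of a flat line through the origin -- should shrink as we train
print(loss_function(0, 0, data))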
The gradient descent algorithm updates the values of m and b to minimize the loss function, moving both parameters a small step against the gradient of the MSE, scaled by a learning rate L:
def gradient_descent(m_now, b_now, points, L):
    n = len(points)
    # Partial derivatives of the MSE with respect to m and b
    m_gradient, b_gradient = (
        sum(-2 / n * x * (y - (m_now * x + b_now)) for x, y in zip(points['studyTime'], points['score'])),
        sum(-2 / n * (y - (m_now * x + b_now)) for x, y in zip(points['studyTime'], points['score']))
    )
    # Step against the gradient, scaled by the learning rate L
    m_new = m_now - L * m_gradient
    b_new = b_now - L * b_gradient
    return m_new, b_new
We initialize the parameters and run the gradient descent algorithm for a specified number of epochs:
m = 0
b = 0
L = 0.00001  # learning rate
epochs = 1000

for i in range(epochs):
    if i % 50 == 0:
        print(f"Epoch: {i}")
    m, b = gradient_descent(m, b, data, L)

print(m, b)
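As a rough sanity check (the exact numbers depend on the dataset and the hyperparameters), we can print the final loss and use the fitted parameters for a prediction:

# Final MSE after training and a sample prediction
print(loss_function(m, b, data))
hours = 40  # hypothetical study time
print(f"Predicted score for {hours} hours of study: {m * hours + b:.2f}")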
Plotting the regression line
After finding the optimal values of m and b, we plot the regression line along with the data points:
plt.scatter(data.studyTime, data.score, color = 'black', marker='+')
plt.plot(list(range(20, 100)), [m * x + b for x in range(20, 100)], color = 'red')
plt.show()
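If you want to double-check the fit, NumPy's polyfit computes the same line in closed form; this is an optional comparison rather than part of the gradient descent walkthrough, and the two results should roughly agree if training converged:

# Closed-form least-squares fit of a straight line, for comparison
m_np, b_np = np.polyfit(data['studyTime'], data['score'], 1)
print(m_np, b_np)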
Conclusion
This implementation demonstrates how to perform simple linear regression using gradient descent in Deepnote. By visualizing the data, defining a loss function, and iterating through gradient descent, we can find the best-fitting line that models the relationship between study time and scores.
Deepnote provides an interactive environment that makes it easy to visualize and iterate on your data analysis and machine learning projects. The full code can be found in the provided script and can be executed step-by-step to understand the underlying process of linear regression.
Happy taking over the world with AI in Deepnote! 🐍🐍🐍