
Linear regression in Deepnote

By Filip Žitný

Updated on July 9, 2024

Linear regression is a fundamental technique in machine learning and statistics used to model the relationship between a dependent variable and one or more independent variables. In this article, we will explore how to implement a simple linear regression model using Python within Deepnote, an interactive data science notebook.

Setting up

First, let's import the necessary libraries and read the data file. You can follow along by downloading the dataset from here.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Dataset

For this example, we are using a dataset containing two columns: studyTime and score. The studyTime column represents the number of hours spent studying, while the score column represents the corresponding scores achieved by students.

data = pd.read_csv("/work/percentage_study_time_scores.csv")
data

plt.scatter(data['studyTime'], data['score'], color = 'blue', marker='+')
plt.show()

Implementing linear regression

Linear regression aims to fit a line that best represents the data points. The line is defined by the equation y = mx + b, where m is the slope and b is the intercept.
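As a small illustration of the line equation, the sketch below evaluates y = mx + b for one input. The helper name `predict` and the parameter values are hypothetical, chosen only to show the arithmetic; they are not fitted to the article's dataset.

```python
def predict(x, m, b):
    """Return the value of the line y = m*x + b at x."""
    return m * x + b

# Hypothetical slope and intercept, for illustration only.
m_example, b_example = 0.5, 10.0

# Predicted score for 60 units of study time under these made-up parameters.
predicted = predict(60, m_example, b_example)
print(predicted)
```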

To find the optimal values of m and b, we use gradient descent, an iterative optimization algorithm. We define a loss function to measure how well the line fits the data. The loss function is the mean squared error (MSE):

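# For manual calculation of the loss (mean squared error)
def loss_function(m, b, points):
    n = len(points)
    return sum((y - (m * x + b)) ** 2 for x, y in zip(points['studyTime'], points['score'])) / n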
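To make the mean squared error concrete, here is a minimal sketch on a tiny hand-made sample. The function name `mse` and the toy values are assumptions for illustration and do not come from the article's dataset.

```python
# Hypothetical toy data lying exactly on the line y = 2x.
study_time = [1.0, 2.0, 3.0]
score = [2.0, 4.0, 6.0]

def mse(m, b, xs, ys):
    """Mean squared error of the line y = m*x + b over paired samples."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

perfect = mse(2.0, 0.0, study_time, score)  # line passes through every point
flat = mse(0.0, 0.0, study_time, score)     # line predicts 0 everywhere
print(perfect, flat)
```

A line that passes through every point gives a loss of zero, and the loss grows as the line moves away from the data. Gradient descent relies on this property.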

The gradient descent algorithm updates the values of m and b to minimize the loss function:

def gradient_descent(m_now, b_now, points, L):
    n = len(points)
    m_gradient, b_gradient = (
        sum(-2 / n * x * (y - (m_now * x + b_now)) for x, y in zip(points['studyTime'], points['score'])),
        sum(-2 / n * (y - (m_now * x + b_now)) for x, y in zip(points['studyTime'], points['score']))
    )

    m_new = m_now - L * m_gradient
    b_new = b_now - L * b_gradient

    return m_new, b_new
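The two sums in the function above are the partial derivatives of the MSE with respect to m and b. Written out, with n data points (x_i, y_i) and learning rate L as in the code:

```latex
\begin{aligned}
E(m, b) &= \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2 \\
\frac{\partial E}{\partial m} &= -\frac{2}{n} \sum_{i=1}^{n} x_i \bigl(y_i - (m x_i + b)\bigr) \\
\frac{\partial E}{\partial b} &= -\frac{2}{n} \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr) \\
m &\leftarrow m - L \, \frac{\partial E}{\partial m}, \qquad
b \leftarrow b - L \, \frac{\partial E}{\partial b}
\end{aligned}
```

Each step moves m and b a small distance against the gradient, which is the direction in which the loss decreases fastest.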

We initialize the parameters and run the gradient descent algorithm for a specified number of epochs:

m = 0
b = 0
L = 0.00001
epochs = 1000

for i in range(epochs):
    if i % 50 == 0:
        print(f"Epoch: {i}")
    m, b = gradient_descent(m, b, data, L)

print(m, b)
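One way to check whether a gradient descent run has converged is to compare it with the closed-form least-squares solution for a single feature. The sketch below does this on hypothetical toy data. The helper names `closed_form_fit` and `gd_fit`, and the learning rate and epoch count, are assumptions chosen for this small example rather than values from the article.

```python
def closed_form_fit(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_mean - slope * x_mean

def gd_fit(xs, ys, lr, epochs):
    """Plain gradient descent on the MSE, starting from m = b = 0."""
    n = len(xs)
    m, b = 0.0, 0.0
    for _ in range(epochs):
        m_grad = sum(-2 / n * x * (y - (m * x + b)) for x, y in zip(xs, ys))
        b_grad = sum(-2 / n * (y - (m * x + b)) for x, y in zip(xs, ys))
        m, b = m - lr * m_grad, b - lr * b_grad
    return m, b

# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

m_exact, b_exact = closed_form_fit(xs, ys)
m_gd, b_gd = gd_fit(xs, ys, lr=0.05, epochs=5000)
print(m_exact, b_exact)
print(m_gd, b_gd)
```

If the two results differ noticeably on your own data, the run probably needs more epochs or a different learning rate. The intercept in particular tends to converge slowly when the inputs are large and uncentered.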

Plotting the regression line

After finding the optimal values of m and b, we plot the regression line along with the data points:

plt.scatter(data.studyTime, data.score, color = 'black', marker='+')
plt.plot(list(range(20, 100)), [m * x + b for x in range(20, 100)], color = 'red')
plt.show()

Conclusion

This implementation demonstrates how to perform simple linear regression using gradient descent in Deepnote. By visualizing the data, defining a loss function, and iterating through gradient descent, we can find the best-fitting line that models the relationship between study time and scores.

Deepnote provides an interactive environment that makes it easy to visualize and iterate on your data analysis and machine learning projects. The full code can be found in the provided script and can be executed step-by-step to understand the underlying process of linear regression.

Have fun taking over the world with AI in Deepnote! 🐍🐍🐍

Filip Žitný

Data Scientist
