The Math Behind Linear Regression

By George Bennett

Linear regression is the most basic machine learning algorithm. It is essentially a technique for drawing “the line of best fit” through data that resembles a line but has a lot of noise in it. There are several ways to build a linear regression model, but in this post I will explain how to do it with stochastic gradient descent. You may remember from school that the formula for a straight line is “Y = M * X + B”. We are looking for the ideal “M” and “B” values.
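To make that concrete, here is a tiny sketch of the kind of noisy, roughly linear data we are trying to fit. The slope of 2, intercept of 1, and noise level are made-up values, and NumPy is just one convenient way to generate them:

```python
import numpy as np

# Made-up example data: y roughly follows the line Y = 2 * X + 1,
# but with random noise added, so no single straight line fits it exactly.
rng = np.random.default_rng(42)
X = rng.uniform(0, 5, size=200)
y = 2 * X + 1 + rng.normal(0, 1.0, size=200)
```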

The first step is to start with an “M” (slope) and a “B” (y-intercept) that are nonzero but close to zero.
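For example (the exact starting values are arbitrary; any small nonzero numbers will do):

```python
# Step 1: initialize the slope and y-intercept near, but not at, zero.
m = 0.001
b = 0.001
```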

In the second step we plug one of our X values into the formula and calculate what is called the “loss”. To calculate the loss we use a loss (or cost) function. The one used here is a squared-error cost: take the difference between the predicted “Y” and the actual “Y” associated with our “X” value, square that number, and divide by 2. (Averaged over every point this is the familiar mean squared error; the division by 2 just makes the derivatives cleaner.) That is the cost function.
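In code, that per-point cost might look something like the sketch below (predict and cost are illustrative names, not from the original post):

```python
def predict(m, b, x):
    """The straight-line model: Y = M * X + B."""
    return m * x + b

def cost(m, b, x, y):
    """Half the squared error for one (x, y) point.
    The division by 2 makes the derivatives in step 3 cleaner."""
    return (predict(m, b, x) - y) ** 2 / 2
```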

The third step is to take the partial derivative of the cost function with respect to “M”, multiply it by a learning rate (0.01 should work), and subtract that number from “M”. Do the same thing for “B”: take the partial derivative of the cost function with respect to “B”, multiply it by the learning rate, and subtract that number from the current “B”.
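For this half-squared-error cost, the partial derivative with respect to “M” works out to (predicted Y − actual Y) * X, and with respect to “B” it is simply (predicted Y − actual Y). Here is a sketch of one update step, using the 0.01 learning rate suggested above:

```python
def sgd_step(m, b, x, y, learning_rate=0.01):
    """One stochastic gradient descent update using a single (x, y) point."""
    error = (m * x + b) - y        # predicted Y minus actual Y
    grad_m = error * x             # partial derivative of the cost w.r.t. M
    grad_b = error                 # partial derivative of the cost w.r.t. B
    m = m - learning_rate * grad_m
    b = b - learning_rate * grad_b
    return m, b
```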

The fourth and final step is to repeat steps 2 and 3, cycling through the data points, until the model converges and “M” and “B” stop moving significantly.
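Putting the four steps together, a minimal end-to-end sketch might look like the following (the fake data, the 100-epoch cap, and the 1e-6 stopping tolerance are all arbitrary choices made for illustration):

```python
import numpy as np

# Made-up noisy linear data: true slope 2, true intercept 1.
rng = np.random.default_rng(42)
X = rng.uniform(0, 5, size=200)
y = 2 * X + 1 + rng.normal(0, 1.0, size=200)

# Step 1: start with a slope and intercept that are nonzero but close to zero.
m, b = 0.001, 0.001
learning_rate = 0.01

for epoch in range(100):
    old_m, old_b = m, b
    for x_i, y_i in zip(X, y):
        # Step 2: the prediction error for one point.
        error = (m * x_i + b) - y_i
        # Step 3: nudge M and B against the partial derivatives of the cost.
        m -= learning_rate * error * x_i
        b -= learning_rate * error
    # Step 4: stop once M and B are no longer moving significantly.
    if abs(m - old_m) < 1e-6 and abs(b - old_b) < 1e-6:
        break

print(f"Fitted line: Y = {m:.3f} * X + {b:.3f}")
```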
