Implementing Gradient Descent in Python

Shruti Roy
Apr 17, 2022

This article will answer a few important questions about gradient descent that are frequently asked in machine learning interviews. It barely scratches the surface, but just like any challenging journey, we all gotta start somewhere!

Without further ado, let’s dive right in:

What is Gradient Descent?

Gradient Descent is an iterative algorithm that takes steps towards a local or global optimum of a cost function. By a step, we mean an update to the parameters in the direction of the negative gradient (x_new = x_old − learning_rate × dy/dx), so that each update moves us closer to the optimal solution of the problem we are solving.

In machine learning we deal with two types of optimization problems:

  1. Convex Functions: A convex function has no separate local minima, so any minimum that gradient descent finds is the global minimum (assuming a minimum exists at all).
  2. Non Convex Functions: Non-convex functions can have multiple local optima, and gradient descent can get stuck in one of these local optima instead of reaching the global one, as the sketch after this list illustrates.
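
To see this getting-stuck behavior concretely, here is a small sketch (not from the original article; f(x) = x⁴ − 3x² + x is just an illustrative non-convex function with two minima):

```python
# Illustrative non-convex function: f(x) = x**4 - 3*x**2 + x
# It has a local minimum near x ≈ 1.13 and a global minimum near x ≈ -1.30.
def grad_f(x):
    return 4 * x**3 - 6 * x + 1  # derivative of f

def gradient_descent(x0, lr=0.01, steps=500):
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)  # move against the gradient
    return x

# Where we end up depends entirely on where we start:
print(gradient_descent(x0=2.0))   # settles in the local minimum near x ≈ 1.13
print(gradient_descent(x0=-2.0))  # reaches the global minimum near x ≈ -1.30
```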

Python implementation of Gradient Descent

Let’s take an example of a convex cost function:

y = (x-6)²

We can tell beforehand that x = 6 is the minimum of this cost function. Can our implementation find this out?
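
Here is a minimal sketch of such an implementation; the starting point, learning rate, and stopping tolerance are illustrative choices:

```python
# Gradient descent for y = (x - 6)**2.
# dy/dx = 2 * (x - 6), so each step moves x toward 6.
def gradient_descent(x0=0.0, lr=0.1, tol=1e-6, max_steps=1000):
    x = x0
    for step in range(max_steps):
        grad = 2 * (x - 6)   # derivative of the cost function
        if abs(grad) < tol:  # stop once the slope is essentially flat
            break
        x -= lr * grad       # move against the gradient
    return x, step

x_min, steps = gradient_descent()
print(f"Converged to x = {x_min:.4f} in {steps} steps")  # x ≈ 6.0
```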

What is stochastic gradient descent and when is it used?

Stochastic gradient descent (SGD) updates the model parameters using a single randomly chosen training example (or a small random sample) at each step, instead of using all of the training data.

Pros:

We use it when it becomes too expensive (memory-wise and processing-wise) to compute the loss and its gradient over the entire training set at every step.

Cons:

The path to the optimum is very noisy, since each update is based on only a small sample of the data rather than the full gradient.

Here is the stochastic gradient descent implementation:

[Figure: implementation of stochastic gradient descent on a toy dataset, with its output]
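
Since the screenshot isn't reproduced here, below is a minimal sketch of what such an implementation might look like. The toy dataset (y = 3x + 2 plus noise), the linear model, and the hyperparameters are all hypothetical stand-ins rather than the article's original code:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy dataset: y = 3x + 2 plus Gaussian noise.
X = rng.uniform(0, 10, size=100)
y = 3 * X + 2 + rng.normal(0, 1, size=100)

w, b = 0.0, 0.0  # parameters of the linear model y_hat = w*x + b
lr = 0.01
epochs = 20

for epoch in range(epochs):
    # One epoch = one pass over the data in random order,
    # updating on a single example at a time.
    for i in rng.permutation(len(X)):
        error = (w * X[i] + b) - y[i]
        # Gradients of the squared error (y_hat - y)**2 / 2
        w -= lr * error * X[i]
        b -= lr * error

print(f"w = {w:.3f}, b = {b:.3f}")  # should land near w = 3, b = 2
```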

What is batch gradient descent?

Batch gradient descent sits at the opposite end of the spectrum from SGD: it uses the entire training set to compute the gradient for each parameter update. Mini-batch gradient descent is the compromise between the two, updating the parameters using a small batch of data points (say, 16 or 32) at each step, which averages out much of SGD's noise while staying far cheaper than a full pass over the data.
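
Continuing with the same hypothetical toy dataset, a mini-batch sketch might look like this; batch_size and the other hyperparameters are again illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Same kind of hypothetical toy dataset as above: y = 3x + 2 plus noise.
X = rng.uniform(0, 10, size=100)
y = 3 * X + 2 + rng.normal(0, 1, size=100)

w, b = 0.0, 0.0
lr = 0.01
epochs = 200
batch_size = 16  # illustrative choice

for epoch in range(epochs):
    idx = rng.permutation(len(X))  # shuffle, then walk through in batches
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        error = (w * X[batch] + b) - y[batch]
        # Average the squared-error gradients over the mini-batch
        w -= lr * np.mean(error * X[batch])
        b -= lr * np.mean(error)

print(f"w = {w:.3f}, b = {b:.3f}")  # should land near w = 3, b = 2
```

Averaging the gradient over a batch reduces the variance of each update, which is why mini-batch training usually traces a smoother path to the optimum than pure SGD.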

Conclusion

It is important to understand the inner workings of models while practicing machine learning techniques. Gradient descent is one of the most important concepts, with diverse applications, which is why this question routinely comes up in MLE and Data Science interviews.

