Implementing Gradient Descent in Python


This article will answer a few important questions about gradient descent that are frequently asked in Machine Learning interviews. It barely scratches the surface, but just like any challenging journey, we all gotta start somewhere!

Without further ado, let’s dive right in:

What is Gradient Descent?

Gradient Descent is an iterative algorithm that steps toward a local or global optimum of a cost function. By a step, we mean that we update the parameters of the function in the direction of the negative gradient, moving closer to the optimal solution of the problem we are solving. The basic update rule is: parameter ← parameter − learning rate × gradient.

In machine learning, we deal with two types of optimization problems:

  1. Convex functions: A convex function has at most one optimal value, and any local minimum is also the global minimum, so gradient descent (with a suitable learning rate) heads toward the global optimum.
  2. Non-convex functions: A non-convex function can have multiple local optima, and sometimes our gradient descent procedure gets stuck in one of these local optima instead of finding the global one.

Python implementation of Gradient Descent

Let’s take an example of a convex cost function:

y = (x-6)²

We can tell beforehand that x = 6 is the minimum of this cost function. Can our gradient descent procedure find this out?
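A minimal sketch of gradient descent on this cost function (the starting point, learning rate, and iteration count here are illustrative choices, not prescriptions):

```python
# Gradient descent on y = (x - 6)^2.
# The derivative is dy/dx = 2 * (x - 6).

def gradient(x):
    return 2 * (x - 6)

x = 0.0              # starting guess
learning_rate = 0.1  # step size (illustrative choice)

for _ in range(100):
    x = x - learning_rate * gradient(x)  # step against the gradient

print(round(x, 4))  # → 6.0
```

Each update shrinks the distance to the minimum by a constant factor (here 0.8), so x converges to 6 as expected.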

What is stochastic gradient descent and when is it used?

Stochastic gradient descent (SGD) updates your model parameters using a single randomly chosen training example (or a small sample of the training data) at each step, instead of using all the training data.

Pros:

We use it when it becomes too expensive (in memory or processing time) to compute the loss and its gradient over all the training samples at once.

Cons:

The path to the optimum is very noisy, since each update is based on only a sample of the data rather than the full dataset.

Here is the stochastic gradient descent implementation:

(Embedded code: implementation of stochastic gradient descent on a toy dataset, with its output.)
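A sketch of what such an implementation might look like, fitting a line to a hypothetical toy dataset (the data, learning rate, and epoch count are all illustrative assumptions):

```python
import random

# Toy dataset: y = 3x + 2 plus a little noise (illustrative data)
random.seed(0)
X = [i / 10 for i in range(100)]
Y = [3 * x + 2 + random.gauss(0, 0.1) for x in X]

w, b = 0.0, 0.0          # model parameters
learning_rate = 0.01

for epoch in range(50):
    # Visit the samples in a fresh random order each epoch
    for i in random.sample(range(len(X)), len(X)):
        x, y = X[i], Y[i]
        error = (w * x + b) - y        # prediction error on ONE sample
        # Gradients of the squared error (1/2)*error^2 w.r.t. w and b
        w -= learning_rate * error * x
        b -= learning_rate * error

print(f"w = {w:.2f}, b = {b:.2f}")     # close to the true 3 and 2
```

Because each step uses a single sample, the parameters jitter around the optimum rather than descending smoothly, which is the noisy path mentioned above.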

What is batch gradient descent?

Batch gradient descent updates the parameters using the entire training set at each step. Mini-batch gradient descent sits between batch gradient descent and SGD: it updates the parameters using a small batch of data points at each step.

Conclusion

It is important to understand the inner workings of models while practicing Machine Learning. Gradient descent is one of the most important concepts, with diverse applications, so questions about it routinely come up in MLE and Data Science interviews.

Shruti Roy

Masters in Data Science graduate from University of San Francisco. Over 6 years of experience in Analytics/Data Science industry. Loves solving problems