Implementing Gradient Descent in Python
This article answers a few important questions about gradient descent that frequently come up in machine learning interviews. It barely scratches the surface, but like any challenging journey, we have to start somewhere!
Without further ado, let’s dive right in:
What is Gradient Descent?
Gradient Descent is an iterative optimization algorithm that moves toward a local or global optimum of a cost function. At each step, we update the parameters of the function in the direction of the negative gradient, so that the cost decreases and we approach the optimal solution of the problem we are solving.
In machine learning we deal with two types of optimization problems:
- Convex functions: a convex function has at most one optimal solution, so you either have a single global optimum or no solution at all.
- Non-convex functions: a non-convex function can have multiple local optima, and gradient descent can get stuck in one of these local minima instead of reaching the global one.
Python implementation of Gradient Descent
Let’s take an example of a convex cost function:
y = (x-6)²
We can tell beforehand that x = 6 is the minimum of this cost function. Can our implementation find this out?
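The article does not include the implementation itself, so here is a minimal sketch of gradient descent on this cost function. The function name, starting point, learning rate, and step count are all illustrative choices, not from the article; the derivative 2(x − 6) follows directly from y = (x − 6)².

```python
# Gradient descent on the convex cost function y = (x - 6)**2.
# Its derivative is dy/dx = 2 * (x - 6), so each step nudges x
# toward the minimum at x = 6.

def gradient_descent(start_x, learning_rate=0.1, n_steps=100):
    x = start_x
    for _ in range(n_steps):
        gradient = 2 * (x - 6)            # derivative of (x - 6)**2
        x = x - learning_rate * gradient  # step against the gradient
    return x

minimum = gradient_descent(start_x=0.0)
print(round(minimum, 4))  # → 6.0
```

Starting from x = 0, each update shrinks the distance to 6 by a constant factor (here 0.8), so after 100 steps the result is 6 to many decimal places, confirming our hand calculation.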
What is stochastic gradient descent and when is it used?
Stochastic gradient descent updates the model parameters using a single training sample (or a small random sample) at each step, instead of computing the gradient over the entire training set.
Pros:
We use it when calculating the loss and its gradient over all the samples becomes too expensive, whether in memory or in processing time.
Cons:
The path to the optimum is noisy, since each update is based on only a sample of the data rather than the full gradient.
Here is the stochastic gradient descent implementation:
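The implementation is missing from the article, so what follows is one possible sketch: a simple linear-regression setup (fitting y = 3x + 1) where each update uses one randomly chosen sample. The data, function name, learning rate, and step count are illustrative assumptions.

```python
import random

# Stochastic gradient descent sketch: fit w and b in y = w*x + b,
# updating on ONE randomly drawn sample per step.

def sgd(xs, ys, learning_rate=0.1, n_steps=5000, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(n_steps):
        i = rng.randrange(len(xs))       # pick one random sample
        error = w * xs[i] + b - ys[i]
        # gradients of the squared error for this single sample
        w -= learning_rate * 2 * error * xs[i]
        b -= learning_rate * 2 * error
    return w, b

# Synthetic, noise-free data from y = 3x + 1 (illustrative)
xs = [i / 10 for i in range(10)]
ys = [3 * x + 1 for x in xs]
w, b = sgd(xs, ys)  # w and b approach 3 and 1
```

Because each step sees only one point, the parameters jitter on their way to the optimum, which is exactly the noisy path described above.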
What is batch gradient descent?
Batch gradient descent sits at the opposite extreme from stochastic gradient descent: it computes the gradient over the entire training set for every parameter update. Mini-batch gradient descent is the compromise between the two, where each step updates the parameters using a small batch of data points.
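To make the mini-batch idea concrete, here is a sketch using the same kind of illustrative linear fit (y = 3x + 1): each step averages the gradients over a small random batch instead of one sample or the whole dataset. All names, the batch size, and the hyperparameters are assumptions for illustration.

```python
import random

# Mini-batch gradient descent sketch: fit w and b in y = w*x + b,
# averaging gradients over a small random batch each step.

def minibatch_gd(xs, ys, batch_size=4, learning_rate=0.1,
                 n_steps=2000, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(n_steps):
        batch = rng.sample(range(len(xs)), batch_size)  # small random batch
        grad_w = grad_b = 0.0
        for i in batch:
            error = w * xs[i] + b - ys[i]
            grad_w += 2 * error * xs[i]
            grad_b += 2 * error
        # average the batch gradients before stepping
        w -= learning_rate * grad_w / batch_size
        b -= learning_rate * grad_b / batch_size
    return w, b

# Synthetic, noise-free data from y = 3x + 1 (illustrative)
xs = [i / 10 for i in range(10)]
ys = [3 * x + 1 for x in xs]
w, b = minibatch_gd(xs, ys)  # w and b approach 3 and 1
```

Averaging over a batch smooths out the noise of single-sample updates while still keeping each step far cheaper than a full pass over the data.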
Conclusion
It is important to understand the inner workings of models while practicing machine learning techniques. Gradient descent is one of the most important concepts, with diverse applications, which is why this question routinely comes up in MLE and Data Science interviews.