Stochastic Gradient Descent (SGD) is one of the most widely used optimization algorithms in machine learning. It is a variation of Gradient Descent. In Gradient Descent, every weight update uses the entire dataset, so when the dataset is very large, each iteration becomes expensive to compute.

To reduce this cost, we make a slight change to Gradient Descent, and the resulting algorithm is called Stochastic Gradient Descent. In SGD, at each iteration, we pick a single data point at random from the dataset and update the weights using the gradient computed from that data point alone. The steps in SGD are as follows:

1. Randomly initialize the coefficients/weights for the first iteration. These can be small random values.
2. Set the number of epochs and the learning rate. These are hyperparameters, so they can be tuned using cross-validation.
3. Pick a data point at random and make a prediction for it using the current coefficients.
4. Calculate the error of that prediction.
5. Update the weights according to the formula given in image 1.
6. Go back to step 3 unless the number of epochs has been reached or the algorithm has converged.
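
The formula from image 1 is not reproduced in this text. As a reference point only, and assuming a linear model trained with the squared-error loss $\tfrac{1}{2}(\hat{y}_i - y_i)^2$ (the original formula may differ, for example by including a regularization term), the weight update for a single randomly chosen point $i$ is commonly written as follows, where $\eta$ is the learning rate, $w_0$ the intercept, and $d$ the number of features:

$$
\hat{y}_i = w_0 + \sum_{j=1}^{d} w_j x_{ij}, \qquad
w_0 \leftarrow w_0 - \eta\,(\hat{y}_i - y_i), \qquad
w_j \leftarrow w_j - \eta\,(\hat{y}_i - y_i)\,x_{ij}
$$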

### Below is the Python implementation of SGD from scratch:

Given a data point and the old coefficients, this block of code will update the weights.
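
A minimal sketch of such an update step is shown here. It is not the author's original code: the function name `update_weights`, the use of NumPy, and the choice of a linear model with squared-error loss (with `weights[0]` as the intercept) are all assumptions made for illustration.

```python
import numpy as np

def update_weights(x, y, weights, learning_rate):
    """One SGD step for a linear model with squared-error loss (assumed).

    weights[0] is treated as the intercept; weights[1:] pair with the
    features in x. Returns a new weight vector.
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Error of the current prediction for this single data point.
    error = (weights[0] + x @ weights[1:]) - y
    # Gradient of 0.5 * error**2: `error` for the intercept, `error * x` for the rest.
    gradient = np.concatenate(([error], error * x))
    return weights - learning_rate * gradient
```

Returning a new array instead of modifying `weights` in place keeps the step easy to test in isolation.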

Given some unknown data points along with the calculated coefficient, this part of the code will make predictions.
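
A prediction function along these lines might look like the sketch below. The name `predict` and the coefficient layout (intercept first, then one coefficient per feature) are assumptions for illustration, not taken from the original code.

```python
import numpy as np

def predict(X, weights):
    """Return linear-model predictions for every row of X.

    Assumes weights[0] is the intercept and weights[1:] are the
    per-feature coefficients.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    weights = np.asarray(weights, dtype=float)
    return weights[0] + X @ weights[1:]

# Two hypothetical data points with two features each.
preds = predict([[1.0, 2.0], [3.0, 4.0]], [0.5, 1.0, 2.0])
```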

This part of the code will take various parameters such as the training data, learning rate, number of epochs, and range r, and will return the learned coefficients. The learning rate, range r, and the number of epochs are hyperparameters and should be tuned using cross-validation.
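
The training routine could be sketched as follows. This is an illustration under several assumptions rather than the original implementation: the name `sgd_fit` is hypothetical, the model is linear with squared-error loss, each epoch is taken to mean as many single-point updates as there are training rows, and, since the text does not define `r`, it is assumed here to bound the uniform range `[-r, r]` from which the initial weights are drawn. The prediction and update steps are inlined so the block runs on its own.

```python
import numpy as np

def sgd_fit(X_train, y_train, learning_rate=0.01, n_epochs=50, r=0.1, seed=0):
    """Fit a linear model with plain SGD (one random point per update)."""
    X = np.asarray(X_train, dtype=float)
    y = np.asarray(y_train, dtype=float)
    rng = np.random.default_rng(seed)
    # ASSUMPTION: r bounds the random initial weights; weights[0] is the intercept.
    weights = rng.uniform(-r, r, size=X.shape[1] + 1)
    for _ in range(n_epochs):
        for _ in range(len(X)):
            i = rng.integers(len(X))  # pick one data point at random
            error = weights[0] + X[i] @ weights[1:] - y[i]
            weights[0] -= learning_rate * error
            weights[1:] -= learning_rate * error * X[i]
    return weights

# Toy, noise-free data generated from y = 1 + 2x.
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = 1.0 + 2.0 * X[:, 0]
weights = sgd_fit(X, y, learning_rate=0.1, n_epochs=200)
```

On this toy data the returned weights should land close to an intercept of 1 and a slope of 2. Real data usually needs feature scaling and a smaller learning rate.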

Finally, after calculating the optimal set of coefficients, we will make the predictions on the test dataset.

You can run the code by copy-pasting it into an IPython notebook. You need to provide X_train, X_test, the learning rate, r, and the number of epochs. If you are not able to run the code, let me know in the comment section, and I will reply as soon as possible. You can find the full working code on GitHub: https://github.com/kkaran0908/Stochastic-Gradient-Descent-From-Scratch

Simple Exercise:

- Download the dataset from Kaggle: https://www.kaggle.com/c/boston-housing
- Perform all the above steps on this dataset.
- After performing the above steps, comment in the comment section and let us know the Root Mean Squared Error of your model.
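
For the last step of the exercise, the Root Mean Squared Error can be computed with a small helper like the one below. The function name `rmse` is a hypothetical choice for illustration, and the sample values are made up to show the call.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up targets and predictions, only to show usage.
score = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```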
