Everything You Need to Know About the Machine Learning Syllabus to Become a Data Scientist

Data science and machine learning are fields where many people want to build their careers, but few know exactly what to study to become a good machine learning engineer. There are tons of machine learning algorithms out there, and in today's world you do not need to learn all of them. In this article, we discuss the machine learning algorithms you need to know to become a good machine learning engineer, describing each one very briefly.



Before diving deeper into the topic, we will first cover some basic terminology that will help us understand the syllabus better:

Classification: 

It is a technique where we are given a fixed number of classes, and for each data point we must predict which class it belongs to. For example, suppose we train a model to predict whether a given image is of a cat or a dog; that is classification.

Regression: 

It is a technique where, given a data point, we have to predict some real value corresponding to it, e.g., given the location and area of a house, predict its price (a continuous variable).
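The difference between the two settings can be sketched with scikit-learn on hypothetical house data (the numbers below are made up purely for illustration):

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical feature: house area in thousands of square feet.
X = [[0.5], [0.8], [1.2], [2.0], [2.6], [3.0]]

# Classification: predict a discrete label (0 = "small", 1 = "large").
y_class = [0, 0, 0, 1, 1, 1]
clf = LogisticRegression().fit(X, y_class)
label = clf.predict([[2.8]])[0]            # one of the fixed classes

# Regression: predict a continuous value (price in thousands).
y_price = [100, 160, 240, 400, 520, 600]   # exactly 200 * area here
reg = LinearRegression().fit(X, y_price)
price = reg.predict([[2.8]])[0]            # a real number, not a class
```

Same input, two different kinds of output: a class for classification, a number for regression.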

Supervised Algorithms:

In this set of algorithms, each data point comes with a label, e.g., for each image we know whether it shows a cat or a dog.

Unsupervised Algorithms:

In this set of algorithms, data points come without labels, e.g., given an image, we are not told whether it shows a cat or a dog.

Semi-Supervised Algorithms:

These are a special set of algorithms where a small amount of the data is labeled and the rest is unlabeled. Now that we have covered the terminology, let us explore the machine learning algorithms according to their nature.

Classical Machine Learning Algorithms:

In this section of the post, we will talk about conventional machine learning algorithms. For this class of algorithms, we first need to extract features from the raw data and then feed them to the algorithm. These algorithms have been around for decades, since the 1980s and 90s.

Naive Bayes: 

It is one of the most straightforward classical machine learning algorithms; it works on the principle of Bayes' theorem. It is mainly used for classification.
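A minimal sketch with scikit-learn's Gaussian Naive Bayes on hypothetical one-dimensional features (the numbers are invented for illustration):

```python
from sklearn.naive_bayes import GaussianNB

# Two well-separated classes along a single feature.
X = [[1.0], [1.2], [0.9], [5.0], [5.2], [4.8]]
y = [0, 0, 0, 1, 1, 1]

model = GaussianNB().fit(X, y)
pred = model.predict([[1.1], [5.1]])
probs = model.predict_proba([[1.1]])  # Bayes-rule posterior P(class | x)
```

The `predict_proba` output is exactly the posterior probability computed from Bayes' theorem under the naive independence assumption.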

K-Nearest Neighbors: 

K-NN is a machine learning algorithm that is easy to implement. It can be used for both classification and regression.
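A sketch of both uses with scikit-learn (toy numbers, chosen so the neighbourhoods are obvious):

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

# Classification: majority vote among the 3 nearest training points.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
pred = knn.predict([[1.5], [10.5]])

# Regression: average the target values of the 3 nearest points.
knn_reg = KNeighborsRegressor(n_neighbors=3).fit(
    X, [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
)
value = knn_reg.predict([[11]])[0]  # mean of 10, 11 and 12
```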

Logistic Regression: 

It is among the most widely used classical machine learning algorithms. It is a variant of linear regression adapted for classification: although its name contains "regression", it is used only for classification. It has a beautiful probabilistic interpretation.
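The probabilistic interpretation is visible directly in the scikit-learn API: the model outputs P(y = 1 | x) through a sigmoid, and thresholds it at 0.5 to classify (toy data below):

```python
from sklearn.linear_model import LogisticRegression

X = [[-2], [-1], [-0.5], [0.5], [1], [2]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression().fit(X, y)
p = model.predict_proba([[1.5]])[0][1]  # P(y = 1 | x), a genuine probability
pred = model.predict([[1.5]])[0]        # 1 whenever p > 0.5
```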

Linear Regression: 

It is a classical machine learning algorithm that is used only for regression.
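A minimal sketch: linear regression recovers the slope and intercept of data that follows y = 3x + 1 exactly (a hypothetical toy dataset):

```python
from sklearn.linear_model import LinearRegression

X = [[0], [1], [2], [3], [4]]
y = [1, 4, 7, 10, 13]  # exactly 3x + 1

model = LinearRegression().fit(X, y)
slope, intercept = model.coef_[0], model.intercept_
prediction = model.predict([[5]])[0]  # 3 * 5 + 1
```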

Decision Tree: 

The decision tree is a classical machine learning algorithm based on the core principle of simple if-else rules. It is highly interpretable.
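The interpretability is easy to see in scikit-learn, which can print the learned if-else rules as text (toy data for illustration):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1], [2], [3], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(max_depth=1).fit(X, y)
rules = export_text(tree)   # the learned if-else rules, human readable
pred = tree.predict([[2]])[0]
```

Printing `rules` shows a single split of the form "feature_0 <= threshold", i.e., literally an if-else statement.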

Random Forest: 

Random Forest is a combination of many decision trees. It is less interpretable than a single decision tree because the prediction is a vote over a whole bunch of trees. It can be used for both classification and regression.
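A sketch of the voting idea with scikit-learn (10 trees on toy data; the ensemble's prediction is the majority vote of its trees):

```python
from sklearn.ensemble import RandomForestClassifier

X = [[1], [2], [3], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

# 10 decision trees, each trained on a bootstrap sample of the data.
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
pred = forest.predict([[2], [11]])  # majority vote over the 10 trees
```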

Support Vector Machine:

It is currently among the most used classical machine learning algorithms. SVM can be used for both classification and regression; what makes it different from other algorithms is the kernel trick (you will learn about it when you study the math behind support vector machines).
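The kernel trick can be demonstrated on XOR-style data, which no linear boundary can separate but an RBF kernel handles easily (toy example with scikit-learn; the `gamma` value is an arbitrary illustrative choice):

```python
from sklearn.svm import SVC

# XOR-style data: not linearly separable in the original space.
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 0, 1, 1]

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=2.0).fit(X, y)

linear_acc = linear.score(X, y)  # a linear boundary cannot separate XOR
rbf_acc = rbf.score(X, y)        # the RBF kernel implicitly maps to a
                                 # higher-dimensional space where it can
```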

Boosting/XGboost:

In the series of classical machine learning algorithms, it is state of the art, and it is highly useful in most competitions. One of its drawbacks is that it has lots of hyperparameters. Unlike most other classical machine learning algorithms, modern boosting implementations such as XGBoost can also use GPUs to speed up training. It can be used for both regression and classification.
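XGBoost itself is a separate library; as a stand-in sketch, scikit-learn's `GradientBoostingClassifier` illustrates the same boosting idea, including the abundance of hyperparameters (the values below are arbitrary illustrative choices):

```python
from sklearn.ensemble import GradientBoostingClassifier

X = [[1], [2], [3], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

# Each new tree is fitted to correct the errors of the trees built so far.
gbm = GradientBoostingClassifier(
    n_estimators=50,    # number of boosting rounds -- one of many hyperparameters
    learning_rate=0.1,  # how strongly each new tree corrects the ensemble
    max_depth=2,        # depth of the individual trees
    random_state=0,
).fit(X, y)
pred = gbm.predict([[2], [11]])
```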

Unsupervised Algorithms:

These algorithms are mainly used for data exploration. Corresponding to each data point, we don't have any label. We must know the following algorithms to know this part of machine learning, also called data mining:
  1. K-Means++
  2. Hierarchical Clustering
  3. K-Medoids
  4. DBSCAN clustering
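As a sketch of the first item, k-means with k-means++ initialization recovers two obvious groups from unlabeled toy points (scikit-learn; the data is invented for illustration):

```python
from sklearn.cluster import KMeans

# Two obvious blobs; no labels are given to the algorithm.
X = [[1, 1], [1.2, 0.8], [0.9, 1.1],
     [8, 8], [8.2, 7.9], [7.8, 8.1]]

km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0).fit(X)
labels = km.labels_  # cluster assignments discovered from the data alone
```

The first three points end up in one cluster and the last three in the other, even though no labels were ever provided.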

Time Series Algorithms: 

These algorithms are used for prediction on data that varies with time, such as stock prices. We can learn the following algorithms to know this part of machine learning, though these are older approaches that are generally not used in production.

  1. Auto-Regressive algorithm
  2. Moving Average Algorithm
  3. Auto-Regressive Moving Average Algorithm
  4. Auto-Regressive Integrated Moving Average Algorithm
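A minimal sketch of the first two ideas on a hypothetical series (the AR(1) fit here is a bare-bones least-squares estimate written by hand, not a full statistical model):

```python
# A short synthetic series with a mild upward drift.
series = [10, 12, 11, 13, 12, 14, 13, 15]

# Moving average: forecast the next value as the mean of the last k points.
k = 3
ma_forecast = sum(series[-k:]) / k  # (14 + 13 + 15) / 3

# AR(1): fit y_t ~ a * y_{t-1} by least squares, then forecast one step.
xs, ys = series[:-1], series[1:]
a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
ar_forecast = a * series[-1]
```

ARMA combines both components, and ARIMA additionally differences the series to remove trends.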

Optimization Techniques:

In every machine learning algorithm, there is a loss function that we need to optimize. The optimal point is where the loss is as small as possible, which ideally translates into low error on the test dataset. Below is the set of algorithms used for optimization:

  1. Gradient Descent
  2. Stochastic Gradient Descent
  3. Mini Batch Stochastic Gradient Descent
  4. Adagrad (mainly used for neural networks)
  5. Adadelta
  6. RMSPROP
  7. Adam
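As a sketch of the first item, plain gradient descent on the one-dimensional loss f(w) = (w - 3)^2, whose gradient is 2(w - 3). The starting point and learning rate are arbitrary illustrative choices:

```python
# Gradient descent: repeatedly step against the gradient of the loss.
w = 0.0
learning_rate = 0.1
for _ in range(100):
    grad = 2 * (w - 3)       # gradient of (w - 3)^2
    w -= learning_rate * grad  # move downhill

# w converges to the minimiser of the loss, which is 3.
```

Stochastic and mini-batch variants use the same update rule but estimate the gradient from one sample or a small batch instead of the full dataset; Adagrad, RMSProp and Adam additionally adapt the learning rate per parameter.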

Dimensionality Reduction Algorithms:

In real life, we often have datasets with very high dimensionality. This has various drawbacks, such as the curse of dimensionality, high training and testing times, and heavy memory requirements just to fit the data into memory. These algorithms help us reduce the dimension of each data point in the dataset without losing much information. Below are some of the algorithms in this category:

  1. Principal Component Analysis
  2. t-SNE (nice data visualizations can also be produced with t-SNE)
  3. Truncated SVD
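A sketch of the first item with scikit-learn: 3-D points that actually lie on a 1-D line can be compressed to a single dimension with essentially no information loss (the data is contrived for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

# 3-D points lying exactly on a line: ripe for dimensionality reduction.
X = np.array([[i, 2 * i, 3 * i] for i in range(1, 7)], dtype=float)

pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)             # 6 x 1 instead of 6 x 3
var_kept = pca.explained_variance_ratio_[0]  # fraction of variance retained
```

On real data the retained fraction is below 1, and choosing the number of components is a trade-off between compression and information loss.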

Deep Learning Approaches:

At present, most large organizations have access to huge amounts of historical data and, at the same time, the computational power to process it. For these two reasons, in most real-life scenarios, deep learning approaches work far better than the classical machine learning algorithms discussed above. Although deep learning is a hot area of research, learning the topics below will help you with most tasks:

Convolutional Neural Network:

These are the state of the art for various computer vision tasks, such as image classification. Under this category, you can study the algorithms and pre-trained architectures mentioned below:

  1. VGG 16, VGG 19, ResNet 152, etc.
  2. RCNN, FRCNN, YOLO, etc.
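The core operation of a CNN, the convolution, can be sketched in plain NumPy (a toy hand-crafted edge-detecting filter, not a trained network):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D convolution (cross-correlation), the core CNN operation."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Elementwise product of the filter with one image patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# An image that is dark on the left and bright on the right.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])
kernel = np.array([[-1, 1], [-1, 1]])  # fires where intensity jumps left-to-right
features = conv2d(image, kernel)       # strongest response at the edge
```

In a real CNN the filter weights are learned, and many such filters are stacked into layers; architectures like VGG and ResNet are deep stacks of exactly this operation.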

Recurrent Neural Network:

In real life, we see huge amounts of sequential data, where the current point depends on some previous points. Take any English sentence: almost always, the current word depends on the previous words. RNNs work well on sequential data.
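A single vanilla RNN step can be sketched in NumPy (random, untrained weights, purely to show how the hidden state carries information forward through the sequence):

```python
import numpy as np

# One step of a vanilla RNN cell: h_t = tanh(Wx @ x_t + Wh @ h_prev + b).
rng = np.random.default_rng(0)
hidden, inputs = 4, 3
Wx = rng.normal(size=(hidden, inputs)) * 0.1  # input-to-hidden weights
Wh = rng.normal(size=(hidden, hidden)) * 0.1  # hidden-to-hidden weights
b = np.zeros(hidden)

def rnn_step(x_t, h_prev):
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

# Process a 5-step sequence; the same weights are reused at every step,
# and the state h summarises everything seen so far.
h = np.zeros(hidden)
for x_t in rng.normal(size=(5, inputs)):
    h = rnn_step(x_t, h)
```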

Long Short Term Memory: 

LSTM is an improved recurrent architecture designed to address the vanishing-gradient problem that plain RNNs suffer from on long sequences. Its gating mechanism lets the network decide what to remember and what to forget, which is why it usually outperforms a vanilla RNN on long inputs.

Gated Recurrent Unit:

It is similar to the LSTM, with slight differences (it uses fewer gates). Understanding how the GRU works will help you a lot in understanding various other algorithms as well.

Encoder-Decoder: 

In some real-world applications, the lengths of the input and output are not fixed. Take language translation: suppose we want to convert an English sentence into the corresponding Hindi sentence. Different English sentences will have Hindi translations of different lengths. Encoder-decoders are used to handle exactly such cases, and they have tons of other applications as well. Below are some other concepts you need to know to consider yourself a deep learning expert.

  1. Dropout
  2. Batch Normalization
  3. Weight Initialization Techniques (Usage and drawbacks)
  4. Activation Functions (the drawbacks of particular activation functions and when to use each one)
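As a sketch of the first item, inverted dropout can be implemented in a few lines of NumPy (hypothetical activation values, not tied to any framework):

```python
import numpy as np

def dropout(activations, p_drop, rng):
    """Inverted dropout: zero units with probability p_drop, rescale the rest."""
    mask = rng.random(activations.shape) >= p_drop
    # Dividing by the keep probability preserves the expected activation,
    # so no rescaling is needed at test time.
    return activations * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
a = np.ones(10000)                   # hypothetical layer activations
out = dropout(a, p_drop=0.5, rng=rng)

kept = np.count_nonzero(out)         # roughly half the units survive
mean_activation = out.mean()         # rescaling keeps the expectation near 1
```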

There is no fixed syllabus for deep learning. It is a massive area of research, and new topics are added to it every day, so try to keep up with the latest advancements. The best way to learn about the latest tools and techniques is to read recent research papers in the particular field you are interested in.

In this article, we discussed a machine learning and deep learning syllabus. If you are comfortable with all the techniques described above, along with the math behind them, you can consider yourself a good data scientist.

If you think you are comfortable with all these things, you can fill in this Google form; we will interview you and, based on your performance in the discussion, give you feedback on your machine learning and deep learning skills.

If we have assigned any algorithm to the wrong group, please let us know in the comment section.

You May Like:
1. How to use Linkedin for data science or machine learning jobs?
2. How to prepare data structure and algorithms for machine learning and data science profile?