Introduction to Machine Learning (Supervised, Unsupervised, Reinforcement Learning)

Depending upon the business use cases, there are different kinds of machine learning algorithms. In this post we are going to learn about three basic machine learning approaches:

  1. Supervised Learning
  2. Unsupervised Learning
  3. Reinforcement Learning

In any machine learning project, we generally follow a fixed pattern. The steps are outlined below; they will help us understand these different kinds of machine learning algorithms.

Gathering Data: The quality and quantity of the data you gather directly determine how good your predictive model can be. Some models require a continuous feed of live data. In any machine learning project, this is the most important step.

Data Visualization (Exploratory Data Analysis): With the help of exploratory data analysis, we try to get various insights about the data. It helps with feature engineering/data preparation, appropriate model selection, choosing an evaluation metric, etc.

Data Preparation: Real-world data is almost always noisy and messy, so we need to prepare it before building any machine learning model on top of it. This process is called data preprocessing. Finally, to train and test the model, we split the data into training and test sets.
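As a rough illustration of the preparation step, the sketch below drops incomplete rows, tidies string fields, and then splits what remains into training and test sets. The record layout and the helper names (clean_records, train_test_split) are assumptions made up for this example, not part of any particular library.

```python
import random

def clean_records(records):
    """Drop rows with missing values and strip stray whitespace from strings."""
    cleaned = []
    for row in records:
        if any(value is None or value == "" for value in row.values()):
            continue  # skip incomplete rows
        cleaned.append(
            {key: value.strip() if isinstance(value, str) else value
             for key, value in row.items()}
        )
    return cleaned

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical raw data with typical problems: missing values and padding.
raw = [
    {"age": 34, "city": " Pune "},
    {"age": None, "city": "Delhi"},
    {"age": 51, "city": "Mumbai"},
    {"age": 27, "city": ""},
    {"age": 45, "city": "Chennai "},
    {"age": 39, "city": "Kolkata"},
]

rows = clean_records(raw)
train_rows, test_rows = train_test_split(rows, test_fraction=0.25)
print(len(rows), len(train_rows), len(test_rows))
```

Real projects usually need far more than this (outlier handling, encoding, scaling), but the shape of the step is the same: clean first, split last.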

Choice of the ML algorithm: We choose an appropriate algorithm depending on factors such as the nature of the data (labeled or unlabeled), the type of data (numerical, audio-visual, categorical), the required measures of accuracy, and the cost of human intervention/correction. The choice of algorithm is itself a kind of hyperparameter, but a good knowledge of the maths behind an algorithm helps in choosing the right one.

Continuous Learning of the model: Incrementally improve the model’s performance by adjusting its parameters or rewards in each iteration, and evaluate the model’s accuracy.

Prediction/Result Analysis: Predict the expected results by running the model. Present the output in meaningful, human-readable forms (tables, graphs, images, etc.).

Why is data preparation important?

  • It is often said that data scientists spend about 80% of their time cleaning and manipulating data, and only about 20% actually analyzing it or building models on top of it.
  • Administratively, incorrect or inconsistent data can lead to false conclusions and misdirected investments.
  • In real-world businesses, incorrect data can be costly. Many companies rely on customer databases that record data such as contact information, addresses, and preferences.

Types of Machine Learning:

Supervised Learning:

Supervised Learning is a process of inferring a function from labeled training data. A supervised machine learning algorithm analyses the training data and produces an inferred function, which can be used for mapping new examples.

Supervised learning problems can be further grouped into two parts:

Regression: A regression problem is when the output variable is a real value, such as “dollars” or “weight” or “price of some home”.

Classification: A classification problem is when the output variable is categorical, such as “red” or “blue”, or “disease” and “no disease”.

Examples of Supervised Machine Learning Algorithms: Naive Bayes, KNN, SVM, Logistic Regression, Decision Tree, Linear Regression, Random Forest, etc.

NOTE: A vast majority of practical machine learning uses supervised learning.  

Steps in Supervised Learning:
  • We start with training data in which each data point has a corresponding continuous value or label.
  • We train and validate the model by showing it the examples available in the training data.
  • Once we have validated the model’s performance, we test it on unseen data, i.e. the test dataset.

If the model’s predictions closely match the actual outputs (which you hide from the model), you are ready to deploy your model. This check against held-out data is also called the validation stage.
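The steps above can be sketched with a very small classifier. The example below uses a 1-nearest-neighbour rule, chosen only because it fits in a few lines; the height/weight data and the label names are invented for illustration.

```python
import math

def nearest_neighbor_predict(train_points, train_labels, query):
    """Return the label of the training point closest to `query`."""
    distances = [math.dist(point, query) for point in train_points]
    return train_labels[distances.index(min(distances))]

# Labeled training data: (height_cm, weight_kg) -> label (made-up values).
train_points = [(150, 50), (155, 55), (160, 58), (180, 85), (185, 90), (190, 95)]
train_labels = ["small", "small", "small", "large", "large", "large"]

# Held-out test data that the model never saw during training.
test_points = [(152, 52), (188, 92)]
test_labels = ["small", "large"]  # hidden from the model, used only for scoring

predictions = [
    nearest_neighbor_predict(train_points, train_labels, point)
    for point in test_points
]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)
```

The important part is the separation of roles: the labels of the test points are used only to score the predictions, never to make them.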

Real-World Application of Supervised Learning:
  1. Predict flight ticket prices around Diwali to maximize the airline company’s profit.
  2. Given an image, identify whether or not it has been modified by software such as Adobe Photoshop.
  3. Identify whether a website has obscene images or not.
  4. Detect unusual behavior in an internet banking transaction.

Figure: Supervised Algorithms Working Cycle

Unsupervised Learning:

Unsupervised Learning is an ML technique for finding patterns in data in an exploratory manner. The data is not labeled, which means only the input variables (X) are given, with no corresponding output variables. Algorithms are left to themselves to discover interesting patterns in the given data set. Since the data is unlabeled, there is no easy way to evaluate the accuracy of the algorithm, which is one feature that distinguishes unsupervised learning from supervised learning and reinforcement learning.

Unsupervised Learning problems can be further grouped into two parts:

Clustering: Grouping of similar data into groups or clusters.

Examples: K-Means, K-Means++, K-Medoids, etc.
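To make clustering concrete, here is a minimal sketch of the K-Means idea: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat. The sample points, the fixed iteration count, and the choice of initial centroids are assumptions for this example; real implementations also handle initialization (for example K-Means++) and convergence checks more carefully.

```python
import math

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
        for point in points
    ]

def kmeans(points, initial_centroids, n_iterations=10):
    """Plain K-Means with a fixed number of iterations."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(n_iterations):
        labels = assign(points, centroids)
        for i in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assign(points, centroids)

# Two visually obvious groups of 2-D points (made-up data, no labels given).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
print(centroids)
print(labels)
```

Note that the algorithm is never told which group a point belongs to; the grouping emerges from the distances alone.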

Dimensionality Reduction: Compression of the data to reduce its complexity while preserving as much of its structure as possible.

Examples: Principal Component Analysis, Singular Value Decomposition, etc.
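The connection between the two examples can be shown in a few lines: one common way to compute Principal Component Analysis is to center the data and take its Singular Value Decomposition. The sketch below assumes NumPy is available and uses synthetic data in which one direction carries almost all of the variance; the function name pca is just a label for this example.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components using SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = singular_values ** 2 / (len(X) - 1)
    return X_centered @ components.T, components, explained_variance[:n_components]

# Synthetic 3-D data that mostly varies along the direction (1, 2, 0).
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2 * t + rng.normal(scale=0.1, size=200),
    rng.normal(scale=0.1, size=200),
])

X_reduced, components, variance = pca(X, n_components=1)
print(X_reduced.shape)   # 200 rows, 1 column instead of 3
print(components[0])     # roughly proportional to (1, 2, 0), up to sign
```

Three columns are compressed into one while keeping most of the variation in the data, which is the sense in which the structure is preserved.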

Steps in Unsupervised Learning:
  1. We start with training data in which the data points have no labels.
  2. We train the model on the training data. During training we explore the data without any knowledge of which variable, if any, is the output target.
  3. We simplify and group the data so that it can be categorized into distinct sets.

If the model helps to identify useful real-world patterns, your model is successful. Measuring the quality of the result is domain-specific and highly subjective.

Real-World Application of Unsupervised Learning:

  1. Recommendation systems in e-commerce sites such as Flipkart or Amazon often rely on unsupervised learning.
  2. Grouping the customers of a supermarket based on their purchasing behavior.

Figure: Workflow of Unsupervised Learning Algorithms

Reinforcement Learning:

The reinforcement learning algorithm (called the agent) continuously learns from the environment in an iterative fashion. It aims to use observations gathered from its interaction with the environment to take actions that maximize the reward or minimize the risk. In the process, the agent learns from its experiences of the environment until it explores the full range of possible states. A decision-making function makes the agent perform an action. After the action is performed, the agent receives a reward or reinforcement from the environment, and the state-action pair is stored together with information about the reward.

Steps in Reinforcement Learning:
  1. We start with an initial state; from each state there can be more than one possible next state. The input state is fed into the model and observed by the agent.
  2. Based on the input state, the decision-making function makes the agent perform an action, and the environment returns a new state. After the action is performed, the agent receives a reward or reinforcement from the environment/user, and the state-action pair is stored together with information about the reward. This process continues in iterations, and the model keeps learning from live data. The problem of choosing the right action at each step is usually formalized as a Markov Decision Process.

EXAMPLE — “I don’t know how to act in this environment. Can you find a good policy/behavior and meanwhile I’ll give you feedback.”
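The loop of state, action, reward, and stored state-action values can be sketched with tabular Q-learning, one common reinforcement learning method (the post does not name a specific algorithm, so this choice is an assumption). The environment here is a made-up corridor of five cells in which the agent is rewarded only for reaching the rightmost cell; all parameter values are illustrative.

```python
import random

N_STATES = 5        # cells 0..4 in a row; cell 4 is the goal
ACTIONS = [-1, +1]  # move left or move right

def step(state, action):
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of state-action values from interaction alone."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit the best known action, breaking ties randomly.
                action = max(ACTIONS, key=lambda a: (q[(state, a)], rng.random()))
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Store what was learned about this state-action pair.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # the learned action for each non-goal cell
```

Nobody tells the agent that moving right is correct; it discovers this from the feedback it receives, which is the situation described in the example above.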

Real-World Applications:

  1. Self-driving cars can make use of Reinforcement Learning.
  2. Game-playing systems, such as AlphaGo for Go or chess engines, are a really nice example of Reinforcement Learning.
  3. Robots are another good example of Reinforcement Learning.

Amazon Review Text Classification using Logistic Regression (Python sklearn)

Overview: Logistic Regression is one of the most commonly used classical machine learning algorithms. Although its name contains regression, it is used for classification. Plain Logistic Regression handles binary classification, but modified versions (for example one-vs-rest or multinomial Logistic Regression) can also be used for multiclass classification.

It has various advantages over other algorithms, such as:
  1. It has a really nice probabilistic interpretation, as well as a geometric interpretation.
  2. It is a parametric algorithm: to make predictions on the test data, we only need to store the weights learned during the training process.
  3. It is nothing but a linear regression function to which the Sigmoid Function has been applied. The sigmoid squashes the output into the range (0, 1), which limits the influence of outliers (or large values).
    1. Linear Regression: Y = f(x)
    2. Logistic Regression: Y = sigmoid(f(x))
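
The relationship between the two formulas can be seen in a short sketch. The weights and bias below are arbitrary numbers picked for illustration, and the function names are not taken from any library.

```python
import math

def linear(x, weights, bias):
    """Linear regression output: f(x) = w . x + b (unbounded)."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def sigmoid(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logistic(x, weights, bias):
    """Logistic regression output: sigmoid(f(x)), read as a probability."""
    return sigmoid(linear(x, weights, bias))

weights, bias = [2.0, -1.0], 0.5  # arbitrary illustrative parameters

for x in ([0.0, 0.5], [1.0, 0.0], [100.0, 0.0], [-100.0, 0.0]):
    print(x, linear(x, weights, bias), logistic(x, weights, bias))
```

However extreme the input, the linear output grows without bound while the logistic output stays between 0 and 1, which is why very large feature values cannot dominate the prediction in the same way.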

There are several assumptions when applying Logistic Regression to any dataset:
  1. The features are not multicollinear; this can be tested using a perturbation test.
  2. The dependent variable should be binary.
  3. The dataset size should be large enough.

Logistic Regression Implementation on the Text Dataset (Using Sklearn):

You can download the data from here:

First, we will clean the dataset. I have written a detailed post on text data cleaning. You can read it here:

After cleaning, we will divide the dataset into three parts, i.e., train, validation, and test sets. Using the validation set, we will try to find the optimal hyperparameters for the model. After finding the optimal hyperparameters, we will test the model on unseen data, i.e. the test set.

Now we vectorize the dataset using CountVectorizer (Bag of Words). It is one of the most straightforward methods for converting text data into numerical vector form.
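To show what a bag-of-words vector actually contains, here is a hand-rolled sketch of the idea in plain Python. It is not the CountVectorizer implementation (which also handles tokenization rules, sparse matrices, and more); the helper names and the three sample reviews are invented for this example.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map every distinct lowercase token to a column index."""
    tokens = sorted({token for doc in documents for token in doc.lower().split()})
    return {token: index for index, token in enumerate(tokens)}

def vectorize(document, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for token, n in counts.items():
        if token in vocabulary:  # words unseen at fit time are ignored
            vector[vocabulary[token]] = n
    return vector

reviews = ["great product great price", "bad product", "great value"]
vocabulary = build_vocabulary(reviews)
vectors = [vectorize(review, vocabulary) for review in reviews]

print(vocabulary)
for review, vector in zip(reviews, vectors):
    print(vector, "<-", review)
```

Each review becomes a fixed-length row of word counts, and word order is discarded, which is where the name "bag of words" comes from.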

Now we will import all the required libraries that will be useful for the analysis.

C (the inverse of the regularization strength) and penalty are hyperparameters of Logistic Regression (there are others as well). We will try to find the optimal values of these hyperparameters.

Output :

0.0001 ------> 0.5
0.001  ------>  0.7293641972138972
0.01  ------>  0.8886922437232533
0.1  ------>  0.9374969316048458
1  ------>  0.9399004712804476
10  ------>  0.9113632222156819
100  ------>  0.8794308252229597
Optimal AUC score: 0.9399004712804476
Optimal C: 1

We can see that C=1 gives the best AUC score, so we will use it for the final model.
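The original code for this search is not shown, so the following is only a sketch of how such a search over C might look with scikit-learn. It runs on synthetic data from make_classification rather than the review dataset, so its scores will not match the numbers above; the function name search_c and the split sizes are assumptions for this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the vectorized reviews (NOT the real dataset).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

def search_c(candidates):
    """Fit one model per C value and score each on the validation set."""
    scores = {}
    for c in candidates:
        clf = LogisticRegression(C=c, max_iter=1000)
        clf.fit(X_train, y_train)
        # Column 1 holds the probability of class 1 (see clf.classes_).
        positive_proba = clf.predict_proba(X_val)[:, 1]
        scores[c] = roc_auc_score(y_val, positive_proba)
    best_c = max(scores, key=scores.get)
    return best_c, scores

candidates = [0.0001, 0.001, 0.01, 0.1, 1, 10, 100]
best_c, scores = search_c(candidates)

for c, auc in scores.items():
    print(c, "------>", auc)
print("Optimal AUC score:", scores[best_c])
print("Optimal C:", best_c)
```

The validation set is used only to pick C; the test set stays untouched until the final model is evaluated.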

Our dataset has two classes, so predict_proba() gives us the probability of both categories. For each point, it returns one probability per class, in the order given by the model’s classes_ attribute. With labels 0 (negative) and 1 (positive), this is [1-p, p], where p is the probability of the point being positive and 1-p is the probability of it being negative. We assign the test point to whichever category has the higher probability.

Output :

AUC score on test data: 0.8258208984684994
AUC score on the training data: 0.8909678471639081

Exercise for You:

  1. Download the same data from Kaggle:
  2. Apply logistic regression on top of that data using a bag of words (BOW) only, as I have done in this post.
  3. Change the penalty from l1 to l2 and comment your AUC score below.
  4. If you face any difficulty in doing this analysis, please comment below and I will share the full working code.

Back Propagation Algorithm in Deep Learning

It is one of the most useful concepts in all of deep learning. Most neural networks are trained using the backpropagation algorithm. In this article, we are going to talk about the various steps in training a model using backpropagation.

Steps in Backpropagation Algorithm:

We will be given some dataset to train the model. It will be in the form (Xi, Yi), where Xi is the input and Yi is the corresponding actual (target) value.

  1. First, we initialize the weights using one of various methods, such as random_uniform, random_normal, glorot_normal, glorot_uniform, he_normal, etc.
  2. Pass each data point Xi through the network (also called forward propagation).
  3. Calculate the loss using Yi and Ypredicted.
  4. Compute all the derivatives using the chain rule; to reduce the training time, use memoization so that each derivative is computed only once.
  5. Update the weights using one of the available algorithms, such as SGD, Adagrad, Adam, Adadelta, etc.
  6. Repeat steps 2 to 5 until convergence.
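
The six steps above can be sketched for a tiny network with one hidden layer. This is an illustrative NumPy version under simple assumptions (sigmoid activations, mean squared error loss, plain gradient descent on the full batch, a fixed number of epochs instead of a convergence test); the toy task is learning the logical OR of two inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, hidden=4, lr=0.5, epochs=2000, seed=0):
    """Train a 2-layer network with manual backpropagation; return the loss history."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize the weights (small random normal values).
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):  # Step 6: repeat steps 2 to 5
        # Step 2: forward propagation.
        a1 = sigmoid(X @ W1 + b1)
        y_hat = sigmoid(a1 @ W2 + b2)
        # Step 3: loss between the targets and the predictions.
        losses.append(float(np.mean((y_hat - y) ** 2)))
        # Step 4: chain rule, layer by layer. d_z2 is computed once and
        # reused for every earlier gradient (the memoization idea).
        d_z2 = 2 * (y_hat - y) / len(X) * y_hat * (1 - y_hat)
        d_W2 = a1.T @ d_z2
        d_b2 = d_z2.sum(axis=0)
        d_z1 = (d_z2 @ W2.T) * a1 * (1 - a1)
        d_W1 = X.T @ d_z1
        d_b1 = d_z1.sum(axis=0)
        # Step 5: gradient descent update.
        W1 -= lr * d_W1
        b1 -= lr * d_b1
        W2 -= lr * d_W2
        b2 -= lr * d_b2
    return losses

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [1]], dtype=float)  # logical OR

losses = train(X, y)
print("first loss:", losses[0])
print("last loss: ", losses[-1])
```

The terms y_hat * (1 - y_hat) and a1 * (1 - a1) are the derivatives of the sigmoid, which is where the requirement of a differentiable activation function enters the computation.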

An important thing about backpropagation is that it works only when the activation function is differentiable. If the function’s derivative is cheap to compute, we can train our model faster.