Measure Distance Between Two Vectors in Machine Learning

In machine learning, we frequently need to measure the distance between two vectors. Take K-NN (k-nearest neighbors) as an example: it computes the distance from one point to the other points in order to find the k nearest ones. There are various distances we can use to compare two vectors, such as:

  1. Euclidean Distance
  2. Manhattan Distance
  3. Minkowski Distance
  4. Cosine Similarity
  5. Jaccard Distance

All of the above distances have their own advantages and disadvantages. In this post, we are going to talk about these distances one by one and see how we can calculate them using Python.

Euclidean Distance: Euclidean distance is one of the most commonly used distances during feature engineering and when training or testing machine learning algorithms. We can use the formula below to calculate it.

Euclidean Distance Formula: $d(p, q) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$, for two points $p = (x_1, y_1)$ and $q = (x_2, y_2)$.

The formula is given here for 2-dimensional points, but it generalizes to n-dimensional points by summing the squared differences over all n coordinates. Euclidean distance is also known as the L2 distance, since it is the l2 norm of the difference between the two vectors.

We can calculate the Euclidean distance between two points x and y using NumPy in Python.
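As a minimal sketch of that calculation, assuming NumPy is installed (the function name `euclidean_distance` and the sample points are illustrative):

```python
import numpy as np


def euclidean_distance(x, y):
    """Return the Euclidean (L2) distance between two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Square root of the sum of squared coordinate differences.
    return np.sqrt(np.sum((x - y) ** 2))


# Sample points; the values are made up for illustration.
x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]

print(euclidean_distance(x, y))  # 5.0

# The same value can be obtained as the l2 norm of the difference vector.
print(np.linalg.norm(np.asarray(x) - np.asarray(y)))  # 5.0
```

Both lines compute the same quantity; `np.linalg.norm` is simply the more compact spelling.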

Manhattan Distance: Manhattan distance is the sum of the absolute differences between the corresponding coordinates of the two vectors. Below is the formula for the Manhattan distance between two vectors X = [x1, y1, z1] and Y = [x2, y2, z2].

Manhattan Distance Formula: $d(X, Y) = |x_1 - x_2| + |y_1 - y_2| + |z_1 - z_2|$

We can also calculate the Manhattan distance in Python.
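A minimal sketch, again assuming NumPy; the function name `manhattan_distance` and the sample vectors are illustrative:

```python
import numpy as np


def manhattan_distance(x, y):
    """Return the Manhattan (L1) distance between two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Sum of absolute coordinate differences.
    return np.sum(np.abs(x - y))


# Sample vectors; the values are made up for illustration.
X = [1, 2, 3]
Y = [4, 0, 3]

print(manhattan_distance(X, Y))  # 5.0
```

Here the result is |1 - 4| + |2 - 0| + |3 - 3| = 3 + 2 + 0 = 5.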

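Cosine Distance: Cosine distance is closely related to cosine similarity, but the two are not the same. Cosine similarity is the cosine of the angle between two vectors, and cosine distance is 1 minus the cosine similarity. Below is the formula for two vectors A and B.

Cosine Distance Formula: $\text{similarity}(A, B) = \dfrac{A \cdot B}{\|A\| \, \|B\|}$ and $\text{distance}(A, B) = 1 - \text{similarity}(A, B)$

The cosine distance can be calculated in Python as well.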
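A minimal sketch, assuming NumPy and non-zero input vectors (the measure is undefined for a zero vector); the function names `cosine_similarity` and `cosine_distance` are illustrative:

```python
import numpy as np


def cosine_similarity(a, b):
    """Return the cosine of the angle between two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Dot product divided by the product of the two l2 norms.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


# Two orthogonal vectors: similarity is 0, so the distance is 1.
A = [1, 0]
B = [0, 1]

print(cosine_similarity(A, B))  # 0.0
print(cosine_distance(A, B))    # 1.0
```

Vectors pointing in the same direction give a distance close to 0 regardless of their lengths, because only the angle between them matters.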

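Jaccard Distance: Jaccard distance is calculated with the formula below. It can be calculated for both numerical and string data, as long as each sample is treated as a set of elements.

Jaccard Distance Formula: $d_J(A, B) = 1 - \dfrac{|A \cap B|}{|A \cup B|}$, for two sets A and B.

The Jaccard distance can also be calculated in Python.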
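A minimal sketch using only built-in Python sets; the function name `jaccard_distance`, the sample inputs, and the convention of returning 0 for two empty sets are assumptions made for illustration:

```python
def jaccard_distance(a, b):
    """Return the Jaccard distance between two collections treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty sets are considered identical.
        return 0.0
    intersection = set_a & set_b
    # 1 minus the ratio of shared elements to all distinct elements.
    return 1.0 - len(intersection) / len(union)


# Numerical data: 2 shared elements out of 4 distinct ones.
print(jaccard_distance([1, 2, 3], [2, 3, 4]))  # 0.5

# String data, compared as sets of words.
print(jaccard_distance("the cat sat".split(), "the cat ran".split()))  # 0.5
```

Because the inputs are converted to sets, duplicates and element order are ignored.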
