Introduction

Hello learners, welcome to this course on advanced C++. In this course, we will cover multithreading, multiprocessing, concurrency, and related topics in detail. Once you complete this course, you will be comfortable interviewing with any HFT firm.

Why learn advanced C++:

  1. C++ developers are among the highest-paid engineers in the industry. 
  2. You will learn how systems work internally.
  3. If you are comfortable with C++, it will be a cakewalk to grasp any other language or framework.

Pre-Requisite for the Course:

There are no specific prerequisites for the course as such. You should be comfortable with simple loops, data types, if-else statements, and so on. During the course, if you are not able to understand something, just continue and revisit it later.

Instructions for following the course:

  1. Do not copy the code, but write your own code. 
  2. Try to complete the exercise given at the end of each section.
  3. Do not go through the course just for the sake of completion.
  4. Put your entire code on GitHub, so that you can show it as a proof of concept to your future employer.

So best of luck, and let’s start.

Advanced Theoretical Problems around Memory Management, Process Synchronization, and System Programming

  1. How does the compilation of code take place internally (what are the various steps and what is the significance of each step)?
  2. What is dynamic memory allocation? Why do we need it?
  3. Why do we need heap and stack separately during dynamic memory allocation?
  4. What is the difference between malloc and calloc?
  5. What is the difference between call by value and call by reference?
  6. What is the difference between call by reference and call by address?
  7. What is a dangling pointer?
  8. How is the size of a structure calculated?
  9. What is the difference between structures and OOP? What can we do using OOP that a structure cannot handle?
  10. What is a deadlock, and how can we avoid it?
  11. What is the dining philosophers problem, and can you implement it?
  12. Why do we need a garbage collector? Why do you think there is no automatic garbage collector in C/C++?
  13. How does dynamic memory allocation take place in vectors in C++?
  14. What do you think about using C/C++ for most low-latency tasks?

ALL ABOUT LISTS:

LISTS IN PYTHON

Lists are arguably Python's most versatile and useful data type. You will find them in virtually every good Python program.
In this article, we will cover various aspects of lists: how to use them, what functions we can apply to them, and how useful they are in our programs.

Let’s start with some characteristics of lists:

  • Lists can contain elements of any type, unlike arrays.
  • Lists can be altered after their creation; hence, they are mutable.
  • Lists have a definite count and are ordered.
  • Elements of the list can be accessed by their index.
LET'S SEE HOW TO GET HANDS-ON WITH LISTS:

A) Creating a list
Creating a list is very simple: one just has to place the elements inside [] square brackets and assign them to a variable, i.e. the name of the list.

#creating a list of numbers and printing it
list = [1,2,3]
print(list)
#output: [1, 2, 3]


B) Fetching an element by its index
When we don't require the whole list, we can simply fetch a single element by typing the list name followed by the index in square brackets. In the example below, we want to fetch the element at index 2, i.e. 'two', so we write list2[2].


#creating a list of mixed data and fetching an element using its index
list2 = [1,'one','two',2]
print(list2[2])
#output: two

C) Checking the length of the list
For certain operations, such as loops, the length of the list is required. To find the length, there is a built-in function called len(); we simply pass the list whose length we want. Here, we want the length of list2, so we write len(list2).


#printing the length of a list using the len() function
list2 = [1,'one','two',2]
print(len(list2))
#output: 4

D) Adding elements to the list
While working on a project, we may want to add more elements to a list. One doesn't have to create a new
list; rather, just use some functions to add elements to the existing one.

#adding a single element to a list using the append() function
list2 = [1,'one','two',2]
list2.append(3)
print(list2)
#output: [1, 'one', 'two', 2, 3]


#adding an element at a specific index using the insert() function
list2 = [1,'one','two',2]
list2.insert(0,'lists')
print(list2)
#output: ['lists', 1, 'one', 'two', 2]

E) Adding a list to a list
Sometimes, one also needs to add a whole other list to an existing list. For that, we have the extend() function.

#adding multiple elements to a list using the extend() function
list2 = [1,'one','two',2]
list2.extend([5,'five',6])
print(list2)
#output: [1, 'one', 'two', 2, 5, 'five', 6]

F) Creating a Multi-Dimensional list
Multi-dimensional lists are lists that hold other lists. Note that it is often preferred to use dictionaries instead of
multi-dimensional lists.

#creating a multi-dimensional list
list4 = [['one','two','three'],[1,2,3]]

#printing an element of the multi-dimensional list
print(list4[0][1])
#output: two

G) Negative Indexing in lists
Negative indexing eases our way to access data from the end of the list.
While the first element is at index 0, the last element is at index -1, the second-last at index -2, and so on.

#printing the list in reverse order using a negative step
print(list2[::-1])
#output: [6, 'five', 5, 2, 'two', 'one', 1]

#negative indexing
list = [1,2,3,4,5,6,7,8,9,10]
print(list[-1])
#output: 10

H) Removing elements from the list
Rather than creating a whole new list, one can simply add or remove elements from the existing list.
The remove() function is used to remove an element by value.

#removing an element from a list using the remove() method
list = [1,2,3,4,5]
list.remove(5)
print(list)
#output: [1, 2, 3, 4]

#using a for loop to remove multiple elements
list = [1,2,3,4,5,6,7,8,9,10]
for i in range(1,5):
    list.remove(i)
print(list)
#output: [5, 6, 7, 8, 9, 10]

#removing an element from a specific position using the pop() function
list = [1,2,3,4,5,6]
list.pop(2)    #removes the element at index 2
print(list)
#output: [1, 2, 4, 5, 6]

I) Slicing operation on lists
When you only want to work with a part of the list, you can slice it. Slicing is simply selecting the
part of the list you want to work with.

#applying the slice operation on a list
list = ['GO','WASH','YOUR','HANDS','&','BE','SAFE']
sliced_list = list[1:6]              #selects the elements at indices 1 to 5
print(sliced_list)
#output: ['WASH', 'YOUR', 'HANDS', '&', 'BE']

#printing elements from a given start index to the end
list = ['GO','WASH','YOUR','HANDS','&','BE','SAFE']
sliced_list = list[1:]
print(sliced_list)
#output: ['WASH', 'YOUR', 'HANDS', '&', 'BE', 'SAFE']

#printing the whole list from beginning to end
list = ['GO','WASH','YOUR','HANDS','&','BE','SAFE']
sliced_list = list[:]
print(sliced_list)
#output: ['GO', 'WASH', 'YOUR', 'HANDS', '&', 'BE', 'SAFE']

#printing a list in reverse order
list2 = [1,'one','two',2]
print(list2)
print(list2[::-1])
#output:
#[1, 'one', 'two', 2]
#[2, 'two', 'one', 1]


These were some basic functions related to lists. If you are new to the concept
of lists, these should benefit you.

Thank You for Reading!

How will you select the model?

Model Selection

1. The central issue in all of Machine Learning is: how do we extrapolate what has been learnt from a finite amount of data to all possible inputs 'of the same kind'?
2. We build models from some training data. However, the training data is always finite.
3. On the other hand, the model is expected to have learnt 'enough' about the entire domain from which the data points can possibly come.
Let us understand some of the key concerns in selecting an appropriate model for a task.

a) Occam's Razor

A predictive model has to be as simple as possible, but no simpler. Often referred to as Occam's Razor, this is a fundamental tenet of all of machine learning.
Occam's Razor is therefore a simple rule of thumb: given two models that show similar 'performance' on the finite training or test data, we should pick the one that makes fewer assumptions about the data that is yet to be seen.

b) Over-fitting

Over-fitting is a phenomenon where a model becomes far more complex than what is warranted for the task at hand and, as a result, suffers from poor generalization.
Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the model's performance on new data.
In other words, overfitting occurs when a statistical model or machine learning algorithm captures the noise of the data; intuitively, it occurs when the model or the algorithm fits the training data too well.

c) Regularization

Regularization is the simplification done by the training algorithm to control model complexity.
In regression, it constrains/regularizes or shrinks the coefficient estimates towards zero. In other words, this technique discourages learning a more complex or flexible model, so as to avoid the risk of overfitting. A short sketch after the list below illustrates the idea.

Roles of Regularization:

  1. It significantly reduces the variance of the model without a substantial increase in the bias.
  2. It is used in the case of Overfitting.
  3. It shrinks and regularizes the coefficients for a better prediction, without losing the important properties of the data.
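
As a quick illustration, here is a minimal sketch of L2 regularization (ridge regression) with scikit-learn. The synthetic data and the alpha value are illustrative assumptions, not something from the text above; alpha controls how strongly the coefficients are shrunk towards zero.

# A minimal sketch of L2 regularization (ridge regression) with scikit-learn.
# The synthetic data and the alpha value are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))            # few samples, many features
y = X[:, 0] * 3.0 + rng.normal(scale=0.5, size=50)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)      # alpha sets the strength of the penalty

# The ridge coefficients are shrunk towards zero compared to plain least squares.
print("unregularized coefficient norm:", np.linalg.norm(plain.coef_))
print("ridge coefficient norm:        ", np.linalg.norm(ridge.coef_))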

d) Bias-Variance Trade-off

  • Error due to bias is the difference between the expected (or average) model prediction and the correct (true) value: E[Y' - Y], where Y' is the predicted value and Y is the actual value.
  • A high-bias algorithm is easy to learn but less flexible; because of this, it has lower predictive performance. A High Bias and Low Variance model is an underfitted model.
  • Error due to variance is the variability in the results of a model when the dataset is changed. Imagine running the model several times on different samples; you will get a range of predictions. Variance is how much the predictions for a given point vary between different samples of the training data.
  • High variance increases the spread of predictions, which results in less accurate predictions. A model with high variance pays a lot of attention to the training data and does not generalize well on the test data.
  • A low-bias algorithm is harder to learn but highly flexible; because of this, it has higher predictive performance. A Low Bias and High Variance model is an overfitted model.
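
To make the trade-off concrete, here is a small hedged sketch; the sine-shaped synthetic data, the polynomial degrees, and the train/test split are all illustrative assumptions. A degree-1 model underfits (high bias), while a degree-15 model fits the training points almost perfectly but generalizes poorly (high variance).

# A minimal sketch of the bias-variance trade-off using polynomial regression.
# The synthetic sine data and the chosen degrees are illustrative assumptions.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.2, size=40)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for degree in (1, 15):                      # degree 1: high bias; degree 15: high variance
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")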

e) Model Complexity


Model complexity is the number of parameters required to specify the model completely. For example, in a simple linear regression of the response attribute y on the explanatory attributes x1, x2, x3, the model y = a*x1 + b*x2 is 'simpler' than the model y = a*x1 + b*x2 + c*x3: the latter requires 3 parameters compared to the 2 required for the first model.

f) Cross Validation

The dataset is randomly partitioned into k equal-sized subsamples.
Out of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k-1 subsamples are used as training data.
This process is repeated k times, and the k results can be averaged to produce a single estimate.
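
Here is a minimal sketch of k-fold cross-validation with scikit-learn; the iris dataset, the logistic-regression model, and k = 5 are illustrative assumptions.

# A minimal sketch of k-fold cross-validation with scikit-learn.
# The dataset, the model, and k = 5 are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Each fold is held out once as validation data; the k accuracy scores are averaged.
scores = cross_val_score(model, X, y, cv=5)
print("fold accuracies:", scores)
print("mean accuracy:  ", scores.mean())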

g) Hold-Out Strategy

Hold-out is when you split up your dataset into a ‘train’ and ‘test’ set. The training set is what the model is trained on, and the test set is used to see how well that model performs on unseen data.
A common split when using the hold-out method is using 80% of data for training and the remaining 20% of the data for testing.
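
A minimal sketch of the hold-out strategy follows; the iris dataset, the model, and the 80/20 split are illustrative assumptions.

# A minimal sketch of the hold-out strategy: 80% train / 20% test.
# The dataset and model are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy on unseen (hold-out) data:", model.score(X_test, y_test))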

Thank you and Keep Learning 🙂

Metrics for Classification model

CONFUSION MATRIX:

When we get the data, after data cleaning, pre-processing, and wrangling, the first step is to feed it to a model and, of course, get the output as probabilities. For a binary classifier, the confusion matrix is a 2x2 matrix used to check the performance of the classification model.
It tells us how many 0’s are identified as 0’s and how many 1’s are identified as 1’s.

It reports the number of false positives, false negatives, true positives, and true negatives. This allows more detailed analysis than the mere proportion of correct classifications (accuracy). Accuracy alone will yield misleading results if the data set is unbalanced; that is, when the numbers of observations in different classes vary greatly.

The confusion matrix shows the ways in which your classification model
is confused when it makes predictions.


Let’s understand TP, FP, FN, TN in terms of pregnancy analogy.

True Positive:
Interpretation: You predicted it positive and it’s actually true.
You predicted that a woman is pregnant and she is actually pregnant.
True Negative:
Interpretation: You predicted negative and it’s actually true.
You predicted that a man is not pregnant and he actually is not.
False Positive: (Type 1 Error)
Interpretation: You predicted it positive and it’s actually false.
You predicted that a man is pregnant but he actually is not.
False Negative: (Type 2 Error)
Interpretation: You predicted negative and it's actually false.
You predicted that a woman is not pregnant but she actually is.
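
The same idea in code, as a small hedged sketch: the toy true/predicted label vectors below are made up for illustration, with 1 as the positive class.

# A minimal sketch of building a 2x2 confusion matrix with scikit-learn.
# The toy true/predicted labels are made up for illustration.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # 1 = positive, 0 = negative
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
#output: [[4 1]
#         [1 4]]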

Accuracy:
The best accuracy is 1.0, whereas the worst is 0.0. It can also be calculated as 1 - ERR, where ERR is the error rate. Accuracy is calculated as the total number of correct predictions (TP + TN) divided by the total number of examples in the dataset (P + N).
Precision:

Precision can be seen as a measure of exactness or quality. Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances.
Recall:
Recall, also called sensitivity, probability of detection, or true positive rate, is the ratio of correct positive predictions to the total number of positive examples.
F1-score:
The F1 score is the harmonic mean of precision and recall, taking both metrics into account: F1 = 2 * (precision * recall) / (precision + recall). Therefore, this score takes both false positives and false negatives into account.
Specificity and Sensitivity:

Sensitivity refers to a test’s ability to designate an individual with disease as positive. A highly sensitive test means that there are few false negative results, and thus fewer cases of disease are missed. The specificity of a test is its ability to designate an individual who does not have a disease as negative.
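
The sketch below computes these metrics with scikit-learn, reusing the same made-up labels as in the confusion-matrix sketch above; the values are purely illustrative.

# A minimal sketch of accuracy, precision, recall, and F1 with scikit-learn.
# The toy labels are made up for illustration.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

print("accuracy: ", accuracy_score(y_true, y_pred))    # (TP + TN) / (P + N)
print("precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("recall:   ", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("f1-score: ", f1_score(y_true, y_pred))          # harmonic mean of precision and recall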

AUC-ROC



AUC-ROC stands for "Area Under the ROC Curve." It measures a classification model across various decision thresholds (the default threshold is 0.5). The AUC-ROC curve is a performance measurement for classification problems at various threshold settings: ROC is a probability curve, and AUC represents the degree or measure of separability. It tells how capable the model is of distinguishing between classes. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s.
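
A hedged sketch of computing AUC-ROC from predicted probabilities follows; the labels and scores below are made up for illustration.

# A minimal sketch of AUC-ROC; the predicted probabilities are made up for illustration.
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.3, 0.7]   # predicted probability of class 1

# AUC close to 1.0 means the classes are well separated; 0.5 is no better than chance.
print("AUC:", roc_auc_score(y_true, y_score))

# The ROC curve itself: true positive rate vs false positive rate at each threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("FPR:", fpr)
print("TPR:", tpr)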


An Overview of Dictionaries in Python

WHY USE DICTIONARIES? HOW TO USE DICTIONARIES? WHAT ARE THE BENEFITS OF USING DICTIONARIES?

All these above questions will be answered here.

Dictionaries play an important role in Data World. 
Imagine ….
You are working in a school as a counselor and want to keep track of the number of students in each class.
You can put the strength of each class in a list; here I am taking a dataset of 5 classes.

student=[52,35,65,41,53]

And to keep track of which class contains a certain number of students, we will create another list with the class names.

classes = ['10A','12B','10B','11A','10C']

Now suppose that you want to find out how many students are there in the “11A” class.
First, you have to figure out where '11A' is in the list, so that you can use this position to get the correct student number.

So, we will use index() to get the index of '11A' in classes, like this:

ind_11A = classes.index('11A')

Now we can use this index to subset the student list, to get the students corresponding to ’11A’.

ind_11A

Out: 3

student[ind_11A]

Out: 41

Yeah, so here we built two lists and used the index to connect corresponding elements in both lists.
And it worked!
But it's a pretty tedious approach; in other words, it's not convenient.

Wouldn't it be easier if we had a way to connect each class to its student count, without using any index?

This is where DICTIONARY comes into the picture.

Let us convert the class data to a dictionary….

A dictionary starts and ends with {} (curly braces), and inside the curly braces you have a bunch of what we call key: value pairs.
Keys and values are separated by a : (colon).

In our case, the keys are class names, and values are the corresponding students.

school = {'10A': 52, '12B': 35, '10B': 65, '11A': 41, '10C': 53}

Now, if you want to find out the number of students in "11A", you can simply type school and then the string "11A" inside
square brackets.

school['11A']
Out: 41 

In simple words, you pass the key in square brackets and you get the corresponding value.

“THE KEY OPENS THE DOOR TO THE VALUE”: pretty poetic, isn’t it?

This approach is not only intuitive but also very efficient, because Python can make the lookup of these keys very fast,
even for huge dictionaries.

A point should always be remembered while using dictionaries:
the keys must be unique.

SOME COOL FUNCTIONS OF DICTIONARIES:

  • To add a new key to the existing dictionary –       school['12A'] = 45
  • To remove a key –          del(school['12A'])
  • To check if a key is in the dictionary –        '10A' in school
  • To copy a dictionary into another dictionary –       school2 = school.copy()
  • To create dictionaries containing dictionaries –       Dict = {'Dict1': {1: 'G', 2: 'F', 3: 'G'}, 'Dict2': {'Name': 'Geeks', 1: [1, 2]}}
  • To get an element from a nested dictionary –      Dict['Dict1'][1]
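
Putting these together, here is a small runnable sketch; it reuses the school dictionary from above, and the added class '12A' with 45 students is a made-up value for illustration.

# A combined sketch of the dictionary operations above.
# The '12A' class and its count of 45 are made-up values for illustration.
school = {'10A': 52, '12B': 35, '10B': 65, '11A': 41, '10C': 53}

school['12A'] = 45              # add a new key
print('10A' in school)          # check membership -> True
school2 = school.copy()         # copy into another dictionary
del school['12A']               # remove a key

# A dictionary of dictionaries, and fetching a nested value
Dict = {'Dict1': {1: 'G', 2: 'F', 3: 'G'}, 'Dict2': {'Name': 'Geeks', 1: [1, 2]}}
print(Dict['Dict1'][1])         #output: G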


So, these were some basic functions related to dictionaries. I hope now you are well versed in the benefits of using dictionaries.

Thank you for reading! 


Work From Home Machine Learning or Data Science Internship/Job

It's a part-time job. Our entire team is from IIT. Most of the team works in the machine learning or data science domain. Although we will be assisting you at every point, it will be useful to have good writing skills. As part of the job, you will write various articles related to computer science, machine learning, or data science. If you have a good command of Python and programming, it will be a huge plus. You will be paid for every article, and proper credit will be given to you. At a certain point, if we feel that you are good with some particular computer science skills, we can provide you a referral for a job.


Frequently Asked Questions:
  1. Whatever you are writing, it should be your property. Copying anything from anywhere is not allowed at all.
  2. We will pay you according to the difficulty level of your article/blog. If the article is about the latest computer science technology, you will be paid more than for an article on a very common computer science topic.
  3. We will pay you before publishing the article on our platform (via Paytm, Google Pay, PhonePe, PayPal).
  4. We may ask you to modify the article by providing you feedback.
  5. We will make sure you learn new things.

Please fill out the form, and we will contact you as soon as possible:

Evaluation Metrics for Classification (Accuracy Score, Precision, Recall, Confusion Metric, F1-Score)

The evaluation metric is one of the most critical parameters of any machine learning project. If we talk about classification, we can use various metrics to evaluate our model, such as accuracy score, precision, recall, f1-score, etc. All these metrics are chosen according to the use case; for example, in medical use cases, recall is very important. If the dataset is highly imbalanced, f1-score might be a good measure of performance, but it is difficult to interpret. The confusion matrix is also one such performance evaluation tool, and it helps in visualizing and understanding our classification results. At the end of this post, you should know about the following things:

  1. What are precision and recall?
  2. What are the various components of the confusion matrix?
  3. What is the accuracy score?
  4. What is the f1-score?
  5. When should each of these performance evaluation metrics be used?


Accuracy Score:

Accuracy score is defined as the ratio between the total number of correctly predicted points to that of the total number of data points in the dataset.

Accuracy Score = Correctly Predicted/Total Number of Data Points

Note: When the dataset is imbalanced, the accuracy score might be misleading. So, in the case of an imbalanced dataset, we use the f1-score to evaluate the model.

Confusion Matrix:

It is one of the most used performance measurement tools for classification tasks. Here we will take an example of binary classification to understand the confusion matrix in a better way.

We will understand each cell of the confusion matrix with the help of an example. Let's suppose we are trying to evaluate a cancer detection model, in which 0 and 1 represent "patient does not have cancer" and "patient has cancer", respectively.

True Negative (TN):

This cell tells that the patient did not have cancer, and the model also predicted that the patient doesn’t have cancer.

False Negative (FN):

It tells that the patient had cancer, but the model predicted that the patient doesn't have the disease. This can be very dangerous because the model is not able to detect cancer, and the patient may die. In most medical domains, false negatives should be as few as possible.

False Positive: 

It tells that the patient did not have cancer, but the model predicted that the patient has cancer. In the medical domain, this scenario is less hazardous because during further diagnosis the doctor will come to know that the patient does not have cancer, and the patient will not die.


True Positive:

This cell of the confusion matrix tells that the patient had cancer and the model detected it correctly.
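
As a small sketch of how these four cells can be counted directly in code (the toy cancer labels below are made up for illustration; 1 = has cancer, 0 = does not):

# A minimal sketch: counting TN, FP, FN, TP by hand for the cancer example.
# The toy labels are made up for illustration (1 = has cancer, 0 = does not).
y_true = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 1, 0, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
#output: TN: 5 FP: 1 FN: 1 TP: 3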

Precision: 

Precision is defined as the ratio of true positives to the total number of data points predicted as positive by our model.

Precision = TP/(TP+FP)

Precision is an important criterion when we want to reduce false positives from the model. Suppose we trained a model to predict rain; if our model says it will rain every time (meaning a very high number of false positives), such a model does not make much sense.

Recall: 

Recall is defined as the ratio of true positives to the total number of positive data points in the data.

Recall = TP/(TP+FN)

Recall is an important criterion when we want to reduce false negatives. Suppose I want to train a model to predict whether a patient has cancer or not. If the model says every time that the patient does not have cancer, it can be fatal for the patient.

F1-Score: 

As discussed in the first section, if the dataset is imbalanced, the accuracy score might be a misleading criterion to evaluate the performance of the model. In such cases, we use f1-score to assess the model. It is the harmonic mean of precision and recall.

f1-score = (2*precision*recall)/(precision+recall)
Like the accuracy score, the higher the f1-score, the better the model.
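
Here is a small hedged sketch applying these formulas directly to some made-up confusion-matrix counts (the numbers are illustrative, continuing the toy cancer example above):

# A minimal sketch applying the formulas above to made-up counts.
tp, tn, fp, fn = 3, 5, 1, 1     # illustrative counts, e.g. from the toy cancer example above

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = (2 * precision * recall) / (precision + recall)

print(f"accuracy={accuracy:.3f}, precision={precision:.3f}, recall={recall:.3f}, f1={f1:.3f}")
#output: accuracy=0.800, precision=0.750, recall=0.750, f1=0.750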

I have practically implemented all these criteria using Python on a dataset. You can see the code on this GitHub profile: YouTube Video Category Classification using Machine Learning

If there is any mistake in any section of the blog, please let me know in the comment section; I will try to improve it as soon as possible.

Get Feedback on Your Resume for Software Engineering/ Machine Learning/ Data Science Jobs

We are a group of IITians working in some of the top product-based companies such as Nutanix, Flipkart, Paytm, Amazon, etc. In this resume feedback process, we are going to review your resume and, based on what you submit, provide you with feedback. We suggest you submit the resume only in Word or Overleaf format, with the sharing option enabled, so that we can comment on or edit it. Please fill out the form below:

You May Like:

  1. Everything that you need to learn to become a great data scientist.
  2. How to use LinkedIn to get machine learning or data science jobs?