Namaste, welcome to the second module of Practical Machine Learning with TensorFlow 2.0. In this module, we will study the basics of machine learning. This is more of a refresher: we assume that all of you have a basic background in machine learning. We will also understand some machine learning concepts using the neural network playground as a visualization tool.
So, what is machine learning? Let us try to understand machine learning from a programming perspective. I am sure all of you are programmers and have written programs to perform specific tasks with a computer. So, how is programming different from machine learning? We will try to answer that question first and then slowly go into the basic terminology of machine learning and the different components of machine learning systems.
Let us take two problems. The first problem is to write a program to add two numbers a and b. Most of you will wonder, what kind of question is this? It is such a basic question; this program is probably among the earliest programs that all of us have written. Yes, you are right. So, how do we write this program? We essentially write a function f which takes two arguments a and b and returns a plus b. This is a program that all of you are familiar with. We can add two numbers very easily by writing a computer program.
Let us try to solve a slightly different problem with the same technique, and we will see whether we can solve it or whether we need some more tools in our toolkit. The second problem is this: let us say we have a bunch of handwritten digits. So, this is 8, this is 9. What I am doing is fixing an area in which you can write these digits, and now the task is: can you write a program to recognize these digits?
So, your job is to write a function that recognizes the digit given the image of the digit. Can you write a program to recognize handwritten digits, just as you did for the addition of two numbers? You can think for a couple of minutes and try to answer this question.
Now, I can imagine that some of you must have started thinking about writing rules for different kinds of numbers. Are rules really scalable? What if I write the number with a slightly different orientation, or in a very different style? The rules will probably break; they would not be able to cater to all situations. But as human beings, we are able to recognize these numbers. So, what makes us recognize them? We will come to this question in a bit. Before that, can we write down the process of recognizing these digits, just as we did in the other problem where we added two numbers? When we were given two numbers a and b, we immediately came up with a function to add them, which was simply a plus b. But as you can imagine, or as you must be experiencing right now, it is incredibly hard to come up with a stepwise process to recognize the digits.
So, how do we solve this problem? Before getting into that, I would also like you to think about the difference between these two problems. Why am I able to solve the first problem very easily, while the second problem, recognizing digits with computers, is harder? What are the key differences? In the first problem, the formula to add two numbers was known to us. Given two numbers a and b, I can simply do a plus b, and that gives me the answer. But in the second problem, where I am trying to recognize the digits, I am able to recognize them with my vision, but I am unable to come up with steps that I can code up so that the computer can also start recognizing digits.
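The first problem can be written down directly because its rule is known in advance. A minimal sketch in Python follows; the function name f comes from the lecture, while the choice of Python and the sample values 2 and 3 are assumptions made for illustration.

```python
def f(a, b):
    """Traditional programming: the rule (a plus b) is coded by hand."""
    return a + b


# Apply the hand-written rule to some input data.
result = f(2, 3)
print(result)
```

No comparably short, hand-written rule is known for the digit recognition problem, which is the contrast the lecture is drawing.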
So, we need to do something else. What is that something else? It is machine learning. Let us take a step back and try to understand why we are able to recognize these digits. We have been seeing these kinds of digits right from our childhood. When we started our formal education, we were introduced to these digits, and we have also observed many people writing them. So, somehow our brain is trained to recognize these digits even if they are written in a slightly different style or orientation. I can easily recognize that this particular number is 8 and this number is also 8, even though they are written differently. So, can we try to mimic the training that our brain received? Can we give the same training to a computer? This is the question that machine learning tries to explore.
Let me write down the key difference between the traditional programming paradigm and machine learning. This is our traditional programming world, where we have a program. We give some data as input, and we also input the rules, or rather, we code these rules in the program. We then pass data into this program, the rules get applied on the data, and we get the output. We did exactly this while adding two numbers. Similarly, when we sort numbers, we give step-by-step instructions to the computer on how to sort them.
Now, let us look at how machine learning operates. Remember the handwritten digit recognition example: we have data, but we do not have rules. So, we cannot write a traditional computer program, but we can provide lots of examples of handwritten digits along with the corresponding digit.
For example, I can say that this is the image and 8 is the digit corresponding to it, 9 is the digit corresponding to this image, 2 is the digit corresponding to this image, and this is 8. So, we have lots of examples where we have images of handwritten digits along with their actual labels, which are nothing but the numbers written in the images. So, we have data, and we also provide the intended output as input to machine learning, and machine learning comes up with rules, which we sometimes also call patterns.
You can now see the clear difference here. Let us highlight that. The rules are on the left-hand (input) side in traditional programming and on the right-hand (output) side in machine learning, while the output, which was on the right-hand side, has moved to the input side. So, a traditional program takes data and rules as input, and the rules are applied on the input data to produce the output. In machine learning, we give data and the output as input, and machine learning comes up with the rules, patterns or models that it sees in the input data. We will learn the details of this process as we progress in this lesson, but this is the key difference between the traditional programming paradigm and machine learning.
Let us write down the steps in the machine learning process here. We have data, we have labels, and we have a machine learning trainer. The trainer looks at the input data and the corresponding labels. We also referred to labels as outputs earlier, so let us call them outputs to stay consistent with the earlier representation. This gives us a model, or rules. The model is nothing but a mapping from the input to the output. Once we have this model, we can take new data and pass it to the model to get the output.
So, you can see that once you get the model, the process is exactly the same as in the programming world, because once I know the model, I know exactly the formula to map the input to the output. So, all the work that we do in machine learning training is to take the data and the desired output and use a machine learning trainer to come up with a model. Once we have the model, we can use it to get the output on new data.
Now, you can see that there are two stages in the machine learning process. The stage where we had data and got the model is called the training phase. The other phase, where we take the model and the new data and get the output, is called inference. So, there are two steps: training is nothing but coming up with the model, or the formula, given the data and the output; and once we have the model, we apply it to new data to get the output. You can see that inference, or prediction, is very similar to the traditional programming paradigm, while training is something new to you if you come from a programming background. We will try to understand both training and inference in detail as we progress in this course.
Now that you have understood how machine learning algorithms are different from the traditional programming world, and you have also understood the two broad steps in the machine learning pipeline, it is time to go through some of the terminology in a bit more detail. So, what are the key components? The first component is data. Data is an important prerequisite of machine learning. You must have heard the phrase "data is the new oil", and data is indeed very important if you want to train machine learning models. If we do not have data, we probably would not be able to train machine learning models.
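The training and inference phases described above can be sketched with a toy example: instead of coding the rule a plus b by hand, a trainer is given example inputs together with their desired outputs and has to come up with the rule. This is only an illustration under stated assumptions, not the method used later in the course. The linear model form w_a * a + w_b * b, the plain gradient-descent trainer, the function names train and predict, and the example values are all hypothetical choices made for this sketch.

```python
def train(examples, learning_rate=0.01, epochs=2000):
    """Training phase: data + desired outputs go in, a model (the rule) comes out.

    Assumption for this sketch: the model is a pair of weights (w_a, w_b)
    and the predicted output is w_a * a + w_b * b.
    """
    w_a, w_b = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_a = 0.0
        grad_b = 0.0
        for (a, b), target in examples:
            error = (w_a * a + w_b * b) - target
            # Gradient of the mean squared error with respect to each weight.
            grad_a += 2 * error * a / n
            grad_b += 2 * error * b / n
        w_a -= learning_rate * grad_a
        w_b -= learning_rate * grad_b
    return w_a, w_b


def predict(model, a, b):
    """Inference phase: apply the learned model to new data."""
    w_a, w_b = model
    return w_a * a + w_b * b


# Made-up training data: inputs (a, b) paired with the desired output a + b.
examples = [((1, 2), 3), ((2, 1), 3), ((3, 5), 8), ((4, 0), 4), ((0, 3), 3)]

model = train(examples)             # training: learn the rule from examples
prediction = predict(model, 10, 20) # inference: use the rule on unseen data
print(model, prediction)
```

After training, both weights end up very close to 1, so the learned model behaves like the hand-written addition rule, and inference is then just an ordinary function call.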
So, data is the first and most important input, or prerequisite, for a machine learning model. What are some examples of data? In the example that we saw, the handwritten images are the data. There are multiple images, and we normally use x with superscript i to represent the i-th data point. Let us say D is the data; in this data we will have lots of images x^(1), x^(2), all the way up to x^(n) if there are n images. So, these are all the images in the data. This is the example of handwritten images. If you are trying to predict the price of a house based on some of its attributes, then each house is a data point, and we will have a bunch of houses over here.
Data has two important components: one is called features and the second is called the label. Features are nothing but attributes of an item. In the case of a handwritten digit, what could the features be? If our handwritten digit image is on a 28 by 28 grid, the value of each pixel is a feature for the handwritten digit problem. In the case of housing price prediction, what could the features be? The features could be the number of bedrooms, the area in square feet, the distance from a school, and there could be many more such features. These are some example features that I am noting down for housing price prediction.
Now, the label is the thing that we are interested in predicting. In the case of handwritten digits, the label is a digit from 0 to 9, because our task is to take an image and predict one of the 10 labels. So, the label could be 0, 1, 2, all the way up to 9, and there are 10 possible labels in handwritten digit recognition. In the case of housing price prediction, these are all features, and the label is the price of the house.
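The features and labels described above can be written down as plain data structures. The following is a minimal sketch; the dictionary layout, the key names, and all the numeric values (pixel intensity, bedroom count, area, distance, price) are made up for illustration and do not come from any real dataset.

```python
# Handwritten digit: a 28-by-28 grid of pixel values, here a mostly blank image.
image = [[0 for _ in range(28)] for _ in range(28)]
image[5][10] = 255  # one hypothetical "ink" pixel

# Each pixel value is one feature, so flattening the grid gives 28 * 28 features.
pixel_features = [pixel for row in image for pixel in row]
digit_example = {"features": pixel_features, "label": 8}  # label is a digit 0..9

# House: the features are attributes of the house, the label is its price.
house_example = {
    "features": {
        "bedrooms": 3,
        "area_sqft": 1200,
        "distance_from_school_km": 1.5,
    },
    "label": 7500000.0,  # made-up price
}

print(len(digit_example["features"]), digit_example["label"])
print(house_example["features"], house_example["label"])
```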
The price of the house is the label in the housing price prediction problem, and these are all features. We denote each feature using a subscript. So, we can say that this is feature x_1, this is feature x_2, this is feature x_3, and we use a superscript inside brackets to denote the index of the data point in the data matrix. So, what we concretely have in the data is features and a label. Let us call the number of features m, since n already denotes the number of data points. For the i-th data point there are m features, x_1^(i), x_2^(i), all the way up to x_m^(i), and then there is a label, which we denote with the letter y, using the same superscript to show that y^(i) is the label for the i-th data item. So, you can think of the data as pairs of features and labels, and we have n such items in the data. This is the basic information about data.
Now, let us first focus on labels, and then we will come back to the features. Depending on the label, we get different types of machine learning problems. We just saw two types of label. In handwritten digit recognition, the labels were discrete quantities: one of the 10 digits 0 to 9. In housing price prediction, the label was a continuous quantity, because the price can be any real number.
Before even getting into the type of label, we first check whether the label is present or not. The label can either be present or absent. If the label is present, we call the corresponding machine learning algorithm or technique a supervised learning algorithm. If the label is absent, we have unsupervised learning techniques or models. What are some examples of supervised learning?
Handwritten digit recognition, where the input has the images as well as the labels, is an example of supervised learning. Housing price prediction, where we have the attributes of the house and the price of the house, is also an example of supervised learning. What are some examples of unsupervised learning? Can you think of some? One of them could be grouping students based on their attributes. I do not really know what the classes of students are: good student, bad student, average student; I do not have any of those ideas. So, the label is essentially absent, and all I want to do is group students based on some of their attributes. This is an example of unsupervised learning.
If the label is present, we have a further classification of supervised learning models. Let us use a different color. Labels can be either discrete or continuous. If the label is discrete, we call the supervised learning problem a classification problem, and if the label is a continuous number, we call it a regression problem. We have seen examples of both: the handwritten digit recognition that we are trying to solve is a classification problem, whereas housing price prediction is a regression problem.
Now that we know the different types of machine learning algorithms based on the availability or non-availability of labels, we go back and try to understand the other component of the data, which is features. Features can also be of different types. The simplest features to handle are numeric features, where we essentially have numbers. For example, we had the number of bedrooms in the housing price prediction problem, and we also had the area of the house in square feet. Number of bedrooms and area are examples of numeric features.
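The taxonomy described above (label absent means unsupervised learning; label present and discrete means classification; label present and continuous means regression) can be sketched as a small decision function. This is only an illustration: the function name problem_type is hypothetical, and treating Python floats as "continuous" and everything else as "discrete" is a simplifying assumption that would not hold for every real dataset. The sketch also assumes a non-empty list of labels.

```python
def problem_type(labels):
    """Name the kind of learning problem implied by the labels.

    Assumptions for this sketch: labels is None when no labels exist,
    otherwise a non-empty list; float labels are treated as continuous.
    """
    if labels is None:
        return "unsupervised learning"
    if all(isinstance(label, float) for label in labels):
        return "supervised learning: regression"
    return "supervised learning: classification"


# Digit labels are discrete, house prices are continuous (made-up values),
# and the student grouping example has no labels at all.
print(problem_type([8, 9, 2, 8]))
print(problem_type([7500000.0, 4200000.0]))
print(problem_type(None))
```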
We also have another type of feature, called a categorical feature. In this case, the values normally come from some finite set. For example, take the name of the city: we do not have a numeric representation of it, but we get multiple strings in the categorical attribute called name of the city. Mumbai, Chennai, Pune and Bangalore can be some examples, and city is a categorical attribute. A second example of a categorical attribute could be color: red, orange, green. These are also categorical attributes, because they are not naturally numeric and their values come from some finite set.
We have to understand that when we are doing machine learning and trying to build models, we can only input numbers. So, we need to take these categorical attributes and convert them into numbers. Let us see how we can achieve that. For simplicity, let us say there are only 3 cities in our data set: Mumbai, Delhi and Chennai. One way in which we can represent these categorical attributes is one-hot encoding.
Let us try to understand what one-hot encoding is. Instead of using city as a single feature, we say that there are 3 features: one corresponding to Mumbai, which we call feature underscore Mumbai; one corresponding to Delhi, feature underscore Delhi; and one corresponding to Chennai, feature underscore Chennai. So, instead of having a single feature for the city, we convert it into a representation with 3 features. Whenever the city is Mumbai, we switch on the feature corresponding to that city: we put 1 in feature underscore Mumbai and 0 for the other cities. Similarly, if the city is Delhi, we put 1 in the column for Delhi and everything else is 0. And if the city is Chennai, we put 1 only for Chennai. This is called one-hot encoding.
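The one-hot encoding just described can be sketched in a few lines of plain Python. The three cities come from the lecture; the function name one_hot and the list-based representation are assumptions made for this illustration, not an API from any particular library.

```python
# The finite set of categories, in a fixed order:
# position 0 -> feature_Mumbai, 1 -> feature_Delhi, 2 -> feature_Chennai.
cities = ["Mumbai", "Delhi", "Chennai"]


def one_hot(city, categories):
    """Return a list with 1 at the position of `city` and 0 everywhere else."""
    if city not in categories:
        raise ValueError(f"unknown category: {city!r}")
    return [1 if city == category else 0 for category in categories]


encoded = {city: one_hot(city, cities) for city in cities}
for city, vector in encoded.items():
    print(city, vector)
```

Each city maps to a vector with exactly one entry switched on, so the single categorical feature becomes three numeric features.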
One-hot encoding is one way of encoding categorical features. Other ways of encoding categorical features are based on hashing or embeddings. We are going to look at these advanced ways of encoding later in the course.
Now you know the basic terminology of machine learning around data, features and labels, and you also know how machine learning is different from writing traditional computer programs. We will stop here in this module, and in the next session we will continue this exploration and understand more machine learning terminology, such as model, training and a lot of other terms. I hope you had a great time understanding these concepts. Namaste.