Hello, and welcome to this video session. In this session, we will be touching upon some core concepts in machine learning, namely overfitting, underfitting, and bias and variance. Let us first see what outcomes have been planned for this session. By the end of this video, it is expected that you will be able to describe the terms overfitting, underfitting, bias, and variance in machine learning, specifically in the context of building machine learning models. We will first look at a conceptual overview of these terms, and once that is done, we will also look at a small code snippet in Google Colab, where we will simulate bias and variance and observe the effect of the degree of an ML model on both.

So what exactly is overfitting? In general machine learning terminology, overfitting is a machine learning model's tendency to match the training data so closely that the model eventually becomes incapable of making correct predictions on new data. The model is fitted so tightly to the training data, and is so heavily trained on it, that it ends up matching the noise patterns in the data as well. This affects the model's capability to predict on new or extrapolated data values; in other words, when we provide new data to this model for prediction, its performance often becomes miserable. To be precise, overfitting happens when a model gets trained on the noise in the training data to the extent that it adversely affects the performance of the model on new data.

Now let us look at what we mean by underfitting. Underfitting is generally the result of a training process that fails to capture the complexity of the training data, resulting in a model with poor predictive power. In contrast to overfitting, in underfitting the model is too simplistic. The algorithm or the model design is such that the model stays so simple that it never captures the true predictive signal present in the features of the dataset, which results in poor predictions. Underfitting generally results from training for very few epochs or on very few samples of the dataset.

Having seen overfitting and underfitting, let us talk about machine learning models in general. In machine learning there is a concept called generalization error, which is a general representation of the error in an ML model. By error, I mean the deviation of the model's outputs from the actual expected values. Realistically, we cannot expect any model to be perfect in every sense; claiming one hundred percent accuracy would be an exaggeration. So let us look at two further measures, which we call bias and variance.

Let us start with bias. There are many biases in ML, and the term bias is used with several different meanings. What we are referring to here is, precisely, the bias called prediction bias. So what do we mean by prediction bias?
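Before moving on, here is a minimal sketch (not taken from the video's Colab notebook) of what underfitting and overfitting look like in practice: we fit a degree-1 and a degree-15 polynomial to a small noisy dataset and compare training and test error. The data-generating function, the noise level, and the two degrees are arbitrary illustrative assumptions.

```python
# Minimal sketch of underfitting vs overfitting (assumptions: NumPy and scikit-learn
# are available; sin(2*pi*x) plus Gaussian noise is an arbitrary toy dataset).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 30)   # noisy training samples
X_test = np.linspace(0, 1, 100).reshape(-1, 1)
y_test = np.sin(2 * np.pi * X_test).ravel()                  # noise-free test targets

for degree in (1, 15):                                       # degree 1 underfits, degree 15 overfits
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    train_err = mean_squared_error(y, model.predict(X))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Typically the degree-1 model shows high error on both sets (it is too simple to capture the signal), while the degree-15 model shows a very low training error but a much higher test error, which is the signature of overfitting described above.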
Prediction bias, more precisely, is a value that indicates how far the mean of the predicted values differs from the mean of the actual expected values in a dataset. That is, suppose we have a set of actual expected values and a set of values predicted by the model for a specific dataset. The magnitude of the difference between the mean of the predicted values and the mean of the actual expected values is called the prediction bias. Since this difference enters the generalization error in squared form, it is generally written as bias squared, a squared difference.

Given this understanding of bias, let us also look at what exactly we mean by variance. When we talk about variance here, we are talking about the standard statistical variance, but applied to the values predicted by the model. That is, the variance is the standard statistical variance observed in the values predicted by the model, so we are interested in how much the model's predictions spread around their own mean. Generally speaking, it is the squared deviation from the mean of the predicted values for the dataset. For example, if my ML model predicts ten values, we take their mean and compute the variance of the predictions around that mean.

Roughly speaking, then, bias is how far the mean of your predicted values deviates from the mean of the actual expected values, while variance captures how much the predictions of the ML model vary. If the predictions vary too much across the various folds of the training dataset, that is a concern: no matter which training sample is passed to my ML model, its predictions have to be trustworthy and credible. If they vary too much, it indicates that training is not stable and the model's predictions cannot be trusted very much. That is exactly what we are talking about when we talk about variance.

Now, given this understanding of overfitting, underfitting, bias, and variance, let us quickly look at the relation among these four concepts. This is where we have to be very careful. In general, we use a quantity called the generalization error of a machine learning model, which is nothing but the squared bias plus the variance (the bias term enters as its squared value). We expect any model to have a very small generalization error; the least possible amount is ideally expected. So ideally, a perfectly learned machine learning model is supposed to have both very low bias and very low variance. But there is a tradeoff: pushing for lower bias tends to result in higher variance, and pushing for lower variance tends to result in higher bias. So you have to carefully trade off the values of bias and variance so that your generalization error stays within acceptable levels.
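To make these two definitions concrete, here is a minimal sketch (an illustrative assumption, not the notebook from the video) that estimates bias squared and variance of a deliberately simple model at one test point by retraining it on many freshly sampled training sets. The data-generating function, the linear model, and the chosen test point are all arbitrary choices for illustration.

```python
# Minimal sketch: estimate prediction bias^2 and variance at a single test point
# by retraining the same simple model on many independent training sets (NumPy only).
import numpy as np

rng = np.random.default_rng(1)
true_f = lambda x: np.sin(2 * np.pi * x)     # assumed true function (arbitrary choice)
x0 = 0.25                                    # test point where we evaluate the model
preds = []

for _ in range(200):                         # 200 independent noisy training sets
    X = rng.uniform(0, 1, 30)
    y = true_f(X) + rng.normal(0, 0.2, 30)
    coeffs = np.polyfit(X, y, deg=1)         # deliberately simple (high-bias) linear model
    preds.append(np.polyval(coeffs, x0))

preds = np.array(preds)
bias_sq = (preds.mean() - true_f(x0)) ** 2   # squared gap between mean prediction and truth
variance = preds.var()                       # spread of predictions across training sets
print(f"bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
print(f"bias^2 + variance = {bias_sq + variance:.4f}")  # the core of the generalization error
```

Replacing the degree-1 fit with a very high-degree fit in this sketch would typically shrink the bias term while inflating the variance term, which is exactly the tradeoff discussed above.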
This is what is famously called the bias and variance tradeoff. It is very important to note that a high bias always indicates a poorly learned or overly simple model; in other words, a very high bias indicates underfitting. A very high variance means your model's predictions vary to a very high degree, which indicates that your model is most probably overfit to a specific training set, so that its predictions differ drastically on newer training sets. So remember: high bias is always present whenever you find underfitting in your ML model, and high variance is a particular indication of overfitting in your ML model.

Now, talking about the bias and variance tradeoff, consider the plot of the error metrics against model complexity; let me hover with the mouse and explain what it shows. In rough terms, model complexity can be equated to the number of features to be learned, or the number of parameters and hyperparameters to be learned in the model. A complex model means more parameters and more features to learn, and as training gradually progresses, the effective complexity of the model increases, so this axis can also be thought of as the amount of training, but in general we represent it as model complexity. On the plot, one curve is the variance and the other is the bias. Initially the bias is high, and as the model complexity increases the bias keeps decreasing; but as the model keeps on training, there is an increasing risk of overfitting, and the variance rises. Optimally, we want our model to stop at the point we call the optimal model complexity: we want a model that is neither less complex nor more complex than that. This is what every ML engineer ideally expects, and they always try to design a model that falls in this region. To the left of this point the model is underfitting, and to the right it is overfitting, and the total error curve reflects this: if the model is badly underfit the error is high, and if the model is heavily overfit the error is also high, so ideally we expect the error to be at its minimum. This curve is famously called the bias versus variance tradeoff curve (a small code sketch of such a degree sweep is included at the very end of this session).

As a quick reflection, let us discuss what you think: you can pause at this point and try to answer this question. High variance indicates overfitting, true or false? Yes, the answer is true. Finally, this is the bibliography; some of the concepts have been taken from the Google ML glossary. Thank you. That's it for this video.
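As a companion to the Colab demonstration mentioned at the start, here is a minimal sketch of such a degree sweep, assuming scikit-learn is available. It is an illustrative approximation, not the exact notebook from the video; the dataset, noise level, and range of degrees are arbitrary assumptions.

```python
# Minimal sketch of a model-complexity sweep: training error keeps falling with
# polynomial degree, while validation error falls and then rises again (the U-shape).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, 80).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 80)   # noisy toy dataset
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=2)

for degree in range(1, 13):                                   # sweep model complexity
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    tr = mean_squared_error(y_tr, model.predict(X_tr))
    val = mean_squared_error(y_val, model.predict(X_val))
    print(f"degree={degree:2d}  train MSE={tr:.3f}  validation MSE={val:.3f}")

# The degree with the lowest validation MSE plays the role of the
# "optimal model complexity" point on the tradeoff curve discussed above.
```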