Welcome to this video session. In this session, we're going to touch on the concept of learning curves in machine learning. Let us look at the planned outcomes for this video. By the end of the session, we expect the student will be able to describe the concept of learning curves in machine learning terminology. We will see what exactly we mean by learning curves, why they are used and how they are useful, and if possible we will also walk through a small Google Colab notebook demonstration that plots the learning curves for a standard polynomial regression fit. So, generally speaking, what exactly are learning curves? They are plots of a performance score, in our case the explained variance score, tracked during the training and validation of a machine learning model. When I say explained variance, I mean the proportion of the variance in the target values that the model is able to account for. There are two curve plots: one for the training phase and one for the validation phase of the machine learning model. In most cases, learning curves are most relevant when we use cross-validation, wherein the dataset is divided into folds, the training shuffles across the various folds of the dataset, and a proper validation set is always available, so the training process is smooth. In such a process, we have a separate curve plotted for training and a separate curve plotted for validation. These learning curves are very useful in determining whether a training process is tending towards overfitting or underfitting.
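The setup described above can be sketched with scikit-learn's `learning_curve` helper, which trains a model on increasing subsets of the data under cross-validation and returns the training and validation scores. The dataset and model below are illustrative assumptions, not the exact ones used later in the video.

```python
# A minimal sketch of computing a learning curve with scikit-learn.
# The synthetic dataset and plain LinearRegression model are assumptions
# for illustration only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

# Synthetic regression data (assumed for illustration).
X, y = make_regression(n_samples=200, n_features=1, noise=10.0, random_state=0)

# learning_curve trains the model on increasing fractions of the data,
# cross-validating at each size, and returns train/validation scores.
train_sizes, train_scores, val_scores = learning_curve(
    LinearRegression(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,                              # 5-fold cross-validation
    scoring="explained_variance",      # the metric used in this video
)

train_mean = train_scores.mean(axis=1)  # one training curve point per size
val_mean = val_scores.mean(axis=1)      # one validation curve point per size
print(train_sizes)
print(train_mean)
print(val_mean)
```

Plotting `train_mean` and `val_mean` against `train_sizes` gives the two curves discussed in this session.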
This is a peculiar beauty of learning curves: most machine learning engineers use them to look at their model's tendency to overfit or underfit, and this helps them a lot in stopping training at the proper point and doing proper feature engineering, or in employing some kind of regularization, to overcome overfitting or underfitting. In plain words, a learning curve shows how a metric value changes as the training set size increases during the training and validation process. For our purpose, we will be using explained variance as the metric. Explained variance is really a performance metric; if you prefer to think in terms of an error metric, the two are inversely related, since as the error goes down, the explained variance goes up. Various measures are in use. For this video we'll be using explained variance, but you will find a lot of literature that uses other metrics as well: some people use accuracy, and some people use whatever metric is convenient in the context in which the ML model is being developed. Now, this diagram explains what we have discussed far more neatly. Let me use the mouse and hover so I can show you exactly. These are the two learning curve plots and the way we interpret them. On the X-axis we have the training samples, so this axis indicates the number of training samples used so far, and the Y-axis indicates the performance metric; in our case it is the explained variance. The blue dotted line indicates the desired performance level. Initially, when I start training with very few samples, the training curve starts high, because the model can fit a handful of samples almost perfectly, while the validation curve starts low: a model trained on a very small set generalizes poorly.
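The inverse relation between error and explained variance mentioned above can be seen directly with scikit-learn's `explained_variance_score`; the numbers below are made up purely for illustration.

```python
# A tiny illustration of the explained variance score: it is a
# performance metric (higher is better, 1.0 is perfect), and it moves
# inversely to the prediction error. The values here are made up.
import numpy as np
from sklearn.metrics import explained_variance_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Perfect predictions explain all of the variance.
perfect = explained_variance_score(y_true, y_true)

# Adding prediction error lowers the score.
noisy = explained_variance_score(y_true, y_true + np.array([0.1, -0.1, 0.2, -0.2]))

print(perfect)  # 1.0
print(noisy)    # below 1.0: more error, less explained variance
```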
So, as it appears here, the training curve starts at a high point and the validation curve at a low point, and as we increase the training samples, they tend to converge to a common point. If this point of convergence is below the desired performance, it means that I am expecting predictions at this performance level and the model's predictions fall short by this gap. This gap indicates bias. The model is converging properly, with proper training and validation, but it is still deviating from the expected performance. Such a gap is generally an indication of bias, and as I said, whenever there is high bias, it is an indication of underfitting. This is what I was talking about: the ability of the learning curve to let the ML engineer know the tendency of a machine learning model towards underfitting or overfitting. Looking at the second part of this diagram, the axes are exactly the same, but now you can see that the curves have not come close to a convergence point, and there is a gap between the training performance and the validation performance. This indicates high variance: the two curves have not yet been able to converge. This suggests two cases. Either the model requires a few more training samples, and the curves might converge at some future point, or, if the training data is already all we have and we cannot increase the training samples, it means the following: on the training data I get one performance level, but when I validate the model on held-out data I get a noticeably different one. The model's predictions vary too much from one data split to another. This indicates high variance, and as we have seen earlier, high variance indicates overfitting.
So this is a perfect indication of overfitting. And this is what we were talking about: with the help of learning curves, we can find out whether the model suffers from high bias or high variance. Looking at this, you might ask what the ideal diagram would be. This third plot is the ideal case, and it is what every ML developer expects their learning curves to look like. You can see that both training and validation converge at the desired performance level by the end of training. This indicates a good trade-off between bias and variance, which is what we were talking about, and practically this is the kind of ML model we expect. So these are learning curves. Now, I have a small Python notebook file; let me show you how learning curves and validation curves can be plotted in a Google Colab notebook. It is a small polynomial regression fit. Let me switch to Colab. Okay, I am in Google Colab and I have this notebook file open. I'll explain each block of statements to you. What I'm doing is polynomial regression using scikit-learn's PolynomialFeatures, LinearRegression, and Pipeline. I'm using NumPy, and I'm using Matplotlib to plot. I have a small generating function that produces target values with random noise drawn from the normal distribution mentioned here. Let me execute this. Okay, sorry, let me first run the imports, then execute the second cell. These are the import statements. Now I take a sample of 200 data points, make a 60/40 train/test split, and set a relative error margin of 1.0. Then I generate the X samples and compute the Y values using the function defined earlier.
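The notebook's setup can be sketched roughly as below. The exact generating function and noise parameters from the notebook are not shown in the video, so the `sin` target and noise level here are assumptions; the 200 samples, the 60/40 split, and the PolynomialFeatures-plus-LinearRegression pipeline follow the description above.

```python
# A sketch of the notebook's setup, assuming a specific generating
# function and noise level (the exact ones in the video are not shown).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def generate_y(x, noise_sigma=0.2):
    """Assumed target function: a smooth curve plus Gaussian noise."""
    return np.sin(2 * np.pi * x) + rng.normal(0.0, noise_sigma, size=x.shape)

# 200 samples with a 60/40 train/test split, as in the video.
X = rng.uniform(0.0, 1.0, size=200)
y = generate_y(X)
X_train, X_test, y_train, y_test = train_test_split(
    X.reshape(-1, 1), y, test_size=0.4, random_state=0)

# Pipeline: expand the input into polynomial features of the given
# degree, then fit ordinary least squares on the expanded features.
degree = 5
model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # R^2 on the held-out 40%
```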
Now, once I have the train/test split, I plot the samples we have generated. When I execute this, you can see how my training and test samples are distributed. Now, suppose I want to fit the model and plot the training behaviour for this dataset. Since this is a polynomial regression fit, I choose degrees ranging from 1 to 21. I hope you know that when we talk about the degree of a model, specifically the degree passed to PolynomialFeatures in scikit-learn, it denotes the degree of the polynomial we are trying to learn. A degree of two already means more parameters to be learned; the higher the degree, the more parameters to be learned, and the higher the complexity of the ML model. So what we do is vary the degree, fit the model for each degree, and plot the result. When I execute this, you can see that as the degree of the fit, which you can read as the model complexity, increases, the validation curve and the training curve deviate from each other. This is a good indication that as model complexity increases, the tendency of the ML model to overfit also increases; it is quite evident from the gap between the validation curve and the training curve. Now, if I plot the learning curves for three degrees, 1, 5, and 15, you can see how they come out. Here the X-axis shows the training samples and the Y-axis shows the explained variance, which is our performance metric this time.
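The three-degree comparison just described can be sketched as below, plotting a learning curve with explained variance for degrees 1, 5, and 15. As before, the data-generating function is an assumption, since the notebook's exact one is not shown.

```python
# A sketch of plotting learning curves for several polynomial degrees,
# using explained variance as the score. The data below is assumed,
# not the notebook's exact dataset.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.sin(2 * np.pi * X.ravel()) + rng.normal(0.0, 0.2, size=200)

fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)
for ax, degree in zip(axes, [1, 5, 15]):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # Train on increasing subset sizes, scoring with explained variance.
    sizes, train_scores, val_scores = learning_curve(
        model, X, y, cv=5, scoring="explained_variance",
        train_sizes=np.linspace(0.2, 1.0, 5))
    ax.plot(sizes, train_scores.mean(axis=1), label="training")
    ax.plot(sizes, val_scores.mean(axis=1), label="validation")
    ax.set_title(f"degree = {degree}")
    ax.set_xlabel("training samples")
ax.legend()
fig.savefig("learning_curves.png")
```

The gap between the two curves in each panel is what lets us compare the fit at each complexity, as discussed next.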
You can see that when the degree is 1, there is a large gap, and this is not what we ideally expect. When the degree is 5, the curves converge much more closely, so this looks a little better. The degree-15 plot is again slightly less preferred compared to degree 5, because the validation curve deviates upwards and away. So you can see that the more complex the model fit, the more variance you will see here. I'm not marking the desired performance level on these plots; you can see the performance metric reaching around 0.6 or 0.8, and which level is acceptable depends on the context of what exactly we are trying to fit. So this is the notebook. I hope you understood what exactly a learning curve is and how it indicates your ML model's complexity and its tendency to overfit or underfit. It is quite useful for ML engineers. As a quick reflection, let us try to answer this question: learning curves can help in knowing if we need more training samples, true or false? You can pause the video and try to recall the answer. The answer to this question is true. Yes, just by looking at the convergence point, a learning curve always gives me an indication of whether I need more training samples or not. So that's it for this video. Thank you.