Hello everyone and welcome to another episode of Code Emporium, where we're going to delve into the curse of dimensionality. In machine learning, we typically have some information that we want to feed into a model in order to make a prediction. To encode that information, we typically use a vector of individual features, and each feature corresponds to one dimension. Increasing the dimensionality lets us put more information into the model, which it can use to better infer the final prediction. So dimensionality can be pretty useful. On the flip side, though, adding too many dimensions can quickly turn into a curse. To visualize exactly what the curse of dimensionality is, let's say you run an experiment: take 1000 examples in one-dimensional space and compute the pairwise distances between all of those points. Two of these points will be the closest to each other, and another pair will be the farthest apart. Now take the difference between this minimum distance and maximum distance, relative to the minimum, and say that number comes out to something like 0.5; that gives us one point on a plot, for one dimension. Now repeat the exact same thing with two-dimensional data: find the minimum and maximum distances, take the difference, and plot it. As you keep increasing the number of dimensions of your data, you'll notice that this gap between the maximum and minimum pairwise distances, measured relative to the minimum, eventually converges to 0. In plainer English, this means that as the number of dimensions increases, the sense of relative distances and neighborhoods vanishes. If there's a point A, you cannot tell whether it is closer to a point B or a point C when the number of dimensions is extremely high.
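If you want to try the experiment described above yourself, here's a minimal sketch in Python with NumPy. The function name, point counts, and random seed are my own choices; it samples points uniformly, computes all pairwise Euclidean distances, and tracks how the gap between the farthest and closest pairs, relative to the closest, shrinks as dimensions grow.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(n_points: int, n_dims: int) -> float:
    """(max pairwise distance - min pairwise distance) / min pairwise distance."""
    pts = rng.uniform(size=(n_points, n_dims))
    # Squared pairwise distances via the identity |a-b|^2 = |a|^2 + |b|^2 - 2 a.b
    sq = (pts ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * pts @ pts.T, 0.0)
    dists = np.sqrt(d2[np.triu_indices(n_points, k=1)])  # unique pairs only
    return float((dists.max() - dists.min()) / dists.min())

for d in (1, 10, 100, 1000):
    print(d, round(relative_contrast(200, d), 3))
```

Running this, the contrast is huge in one dimension (some pair is almost on top of another) and small in 1000 dimensions, where all pairs are nearly equidistant.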
And this is the curse of dimensionality. Why exactly does this happen? Let's look at points in one-dimensional space. We have seven points lined up on the x-axis, and they're relatively close to each other; the distances aren't very large. But as soon as we introduce a second degree of freedom, the y-axis, those seven points become a little more spread out. The spread grows even more when we add a z-axis: along a third direction, the seven points move farther apart. Mathematically, we can see why this happens using distance metrics like the Manhattan distance or the Euclidean distance: every dimension you add contributes a non-negative term to the distance, so as the number of dimensions tends to infinity, the distances between points also tend to infinity. And so you lose the concept of neighborhoods and relative distances. Now that we understand what the curse of dimensionality is and how it occurs, why does this curse even matter at all? The curse of dimensionality causes overfitting, and it affects different models to varying degrees. The models most affected are k-nearest neighbors, k-means, and decision trees, and much of this is because the first two depend very heavily on distance metrics and distances between individual data points. In the case of k-nearest neighbors, we have a point that we want to classify based on its nearest neighbors. But because there is no meaningful sense of relative closeness to other points, it won't be classified reliably; k-nearest neighbors will perform poorly and give different answers depending on the training set, which leads to overfitting. Similarly, in the k-means case, we want to determine the distance of every point from its nearest cluster center, represented here with an x.
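The claim that each added dimension contributes a non-negative term, making distances grow, is easy to check numerically. Here's a small sketch (the function name, pair counts, and seed are my own choices): it draws random pairs of points with independent uniform coordinates and prints the average Euclidean distance as dimensions increase.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_pair_distance(n_dims: int, n_pairs: int = 2000) -> float:
    a = rng.uniform(size=(n_pairs, n_dims))
    b = rng.uniform(size=(n_pairs, n_dims))
    # Euclidean distance for each of the n_pairs random pairs; every extra
    # dimension adds another non-negative squared term under the root.
    return float(np.sqrt(((a - b) ** 2).sum(axis=1)).mean())

for d in (1, 10, 100, 1000):
    print(d, round(mean_pair_distance(d), 2))
```

The average distance keeps climbing (roughly like the square root of the dimension for this uniform setup), which is exactly the "points drift apart" effect described above.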
But as distances grow, there's no sense of relative distance, particularly in high dimensions. And so different sets of training data may lead to different assignments for this point, which is again overfitting, and hence poor performance for k-means. Decision trees are an interesting case because, on their own, they partition the space at every single node split. So if we have two points that are very far apart in a high-dimensional space, you can draw a boundary between them in many ways: one that looks like this, or one that goes like this, or something else entirely. Because the split can change every time the training data changes slightly, this is overfitting by definition. One way to overcome the curse of dimensionality is simply to increase the number of data points so that they fill in the space. For example, in the KNN case, once we have points filling the space, the white query point now has genuinely close neighbors, so its classification becomes much more stable, improving the performance of the k-nearest neighbors algorithm. We see a similar effect in k-means: more data points restore a sense of relative closeness between the cluster centers and a given point, which lets us assign that point more reliably, so k-means also improves. And with decision trees, more example points in the space let us draw a better boundary that is less sensitive to variation in the input training data. In all of these cases we have mitigated overfitting and thus somewhat overcome the curse of dimensionality. However, increasing the number of data points is actually very hard to do; the amount of data needed to fill the space grows exponentially with the number of dimensions. So what we typically do instead is dimensionality reduction.
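The "fill in the space" idea above can also be sketched numerically. In this illustrative example (sizes and seed are my own choices), we measure the average distance from a query point to its single nearest training point: as the training set grows, that distance shrinks, restoring a meaningful notion of "nearest neighbor".

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_nn_distance(n_train: int, n_dims: int, n_queries: int = 200) -> float:
    train = rng.uniform(size=(n_train, n_dims))
    queries = rng.uniform(size=(n_queries, n_dims))
    # Squared distances from every query to every training point.
    sq_t = (train ** 2).sum(axis=1)
    sq_q = (queries ** 2).sum(axis=1)
    d2 = np.maximum(sq_q[:, None] + sq_t[None, :] - 2.0 * queries @ train.T, 0.0)
    # Average distance to the single nearest training point.
    return float(np.sqrt(d2.min(axis=1)).mean())

for n in (100, 1000, 10000):
    print(n, round(mean_nn_distance(n, 10), 3))
```

The catch, as noted above, is that in high dimensions the shrinkage per added point is painfully slow, which is why dimensionality reduction is usually the more practical fix.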
We try to reduce the number of dimensions onto which all of these points are projected, thereby minimizing the effect of the curse itself. On a final note, I wanted to touch on the fact that the curse of dimensionality can cause overfitting, but overfitting can also arise without it. Good examples of where this happens are linear regression, decision trees, and neural networks. For a linear regressor, and for all of these, we typically mitigate overfitting through regularization techniques or through dimensionality reduction. A good example of mitigating overfitting in linear regression is adding a penalty term: ridge (L2) or lasso (L1) regression penalizes the coefficients of our prediction, shrinking and constraining the coefficient values, hence mitigating overfitting. With decision trees, a way to mitigate overfitting is pruning, because trees can get too deep. And for neural networks, one way to mitigate overfitting is dropout, where we randomly turn off selected neurons during training to help the network generalize better along different paths. So I hope the concept of overfitting and how it intersects with the curse of dimensionality is a little clearer, and also what this curse of dimensionality truly entails. There is a lot of reading material on this that I've come across, which I'll link in the description down below, so please do check that out. In any case, that's all I have for today. Thank you all so much for watching. If you agree or disagree with some of these points and want to call things out or add your ideas, notions, and comments, drop them down below; happy to take a look. I want to grow a community here, and I'm glad I'm doing it with you all. So thank you all so much for watching, and I'll see you next time.
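As a quick appendix to the ridge-penalty point above, here's a minimal sketch of how the penalty constrains the coefficients. The data, sizes, and seed are invented for illustration; it uses the closed-form ridge solution w = (XᵀX + αI)⁻¹Xᵀy and shows that the coefficient norm shrinks as the penalty α grows.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 20))        # few samples, many features
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.0, 0.5]        # only three features actually matter
y = X @ w_true + rng.normal(scale=0.1, size=50)

def ridge_fit(X: np.ndarray, y: np.ndarray, alpha: float) -> np.ndarray:
    """Closed-form ridge regression: w = (X^T X + alpha I)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

for alpha in (0.0, 1.0, 100.0):
    w = ridge_fit(X, y, alpha)
    print(alpha, round(float(np.linalg.norm(w)), 3))
```

Larger α means smaller coefficients and a simpler model; lasso works the same way in spirit but uses an L1 penalty, which has no closed form and tends to zero out coefficients entirely.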