Machine learning has changed our world in many ways. We have different methods to learn from training data for classification and regression problems. Some parametric methods, like polynomial regression and support vector machines, stand out as being very versatile. They create simple linear boundaries for simple problems or nonlinear boundaries for more complex problems. So the big question is: how are these algorithms so versatile? It's due to something called kernelization. In this video, we're going to kernelize linear regression and show how the same idea can be incorporated into other algorithms to solve complex problems. So let's get to it.

So, kernelizing linear regression. Let's start from the top. This is the output of a linear regressor: x is a vector of features, w is a vector of weights, and y is the output, a real number for regression or a label for classification. It's a good starting point, but the problem here is that it's linear, which is too simple for many applications. Let's change that. Here, Phi of x is a nonlinear basis function that adds more complex features. This is now the output of a polynomial regression. Note that polynomial regression isn't the same as nonlinear regression: Phi of x is just square terms, cube terms, and polynomial terms in general. With this, we can create complex models. But how do we find the weights for Phi of x? Simple. Normally, without the basis function and just considering the base features, we minimize the least squares cost function with respect to the weights. After vectorizing and simplifying, you get the optimal weights as (X transpose X) inverse times X transpose y. I made a detailed walkthrough of this simplification in my linear regression video, so check that out after this one.

Now, what we want is the weights for the nonlinear basis function, so replace X with Phi. Want to include regularization? Just add lambda I, where lambda is the regularization parameter and I is the identity matrix. With L2 regularization, this is now ridge regression. The point of a parametric model is to estimate the values of the parameters w, which we can do only if we know Phi, or at least the matrix Phi transpose Phi. But that can be very hard to compute when the basis expansion is large, so you can clearly see the problem here.

We're going to do this entire thing again, but this time with kernelization. So consider our ridge regression cost function, J. Taking the derivative with respect to the weight vector and equating it to zero, we solve for w. To make things easier to look at, we'll call the first part alpha. So the weights become the product of Phi transpose and alpha. Once again, consider the original cost function, vectorize it, and substitute our weights w. Then we just keep simplifying. The repeated term is Phi Phi transpose, which is a square matrix called the Gram matrix or kernel matrix, denoted by K.

Now we're getting to the juicy stuff. The kernel matrix Phi Phi transpose has elements that are the inner products of every pair of basis-expanded feature vectors. This kernel matrix is significant because of two properties. One, it's symmetric, so the matrix equals its transpose. And two, it's positive semi-definite, so for any vector z, z transpose K z is non-negative. I'll come back to why these properties are important in a bit, but for now, let's use them in our simplification of the cost function. First off, transpose the scalar term. We can do this because a scalar equals its own transpose. We minimize this cost function to get the optimal value of alpha.
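As a rough numerical sketch of the two routes described above (this isn't shown in the video), the snippet below builds an explicit polynomial basis Phi, computes the ordinary ridge weights (Phi transpose Phi + lambda I) inverse Phi transpose y, and checks that they match the kernelized route w = Phi transpose alpha with alpha = (K + lambda I) inverse y. The toy data, the cubic basis, and lambda = 0.1 are arbitrary choices for illustration.

```python
# Minimal sketch: primal ridge weights vs. the dual (kernelized) route.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1))                         # base features
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=20)  # toy targets
lam = 0.1                                            # regularization parameter

# Explicit polynomial basis Phi(x) = [1, x, x^2, x^3]
Phi = np.hstack([X**d for d in range(4)])

# Primal ridge weights: (Phi^T Phi + lambda I)^-1 Phi^T y
w_primal = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

# Dual route: alpha = (K + lambda I)^-1 y with K = Phi Phi^T, then w = Phi^T alpha
K = Phi @ Phi.T
alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), y)
w_dual = Phi.T @ alpha

print(np.allclose(w_primal, w_dual))                 # True: both routes agree
```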
Once we have the optimal alpha, we can express the weights in terms of this kernel matrix. So what's the difference between this set of weights and the one we computed before without kernels? Before, we had to compute Phi transpose Phi over the basis features. Now we need Phi Phi transpose, the kernel matrix. But wait, it looks like we still need Phi to compute this inner product as well. However, that's actually not the case. As I said before, this kernel matrix has two properties: one, it's symmetric, and two, it's positive semi-definite. According to Mercer's theorem, a symmetric positive semi-definite kernel function can be expressed as the inner product of some Phi. In other words, we can rewrite every term in the Gram matrix so that it is a function of only the base features. This is the kernel trick, and the fundamental reason why kernels are so powerful.

I'll show you an example of how this kernel trick works. Consider this polynomial nonlinear basis function. We have the kernel matrix where every term is an inner product of two samples. Expanding these terms, we can see that this inner product of nonlinear basis functions can be represented as a function of the base feature vectors, x m and x n. This is an example of a polynomial kernel function. Thus, we can compute the kernel matrix K without knowing the true nature of Phi. And in the same way, we can show that the other standard kernel functions don't require knowledge of the complex basis vector Phi either. So we are able to express the kernel matrix in terms of the base features.

That's great, but how do we make a prediction, the most important part? Well, what we really need is w transpose Phi of x, where x is the base feature vector of the test point and w is the learned weights. Expanding this, we can eliminate the term Phi times Phi of x by expressing it as a vector of kernel evaluations between each training point and the test point. Effectively, to make a prediction, we don't need to know the true nature of Phi at all. This result is significant: even with a complex basis function, we can make predictions using only the base features. This is why kernel-based methods are so versatile and are used for both simple and complex classification problems. I demonstrated kernelized linear regression in this video, but we can kernelize other algorithms as well.

And that's all I have for you for now. If you liked the video, hit that like button. If you're new here, welcome, and hit that subscribe button. Ring that bell for notifications when I upload. And if you're still looking for your daily dose of AI, click on one of the videos right here on screen for another awesome video. I'll see you in the next one. Bye.
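To make the kernel trick and the prediction step above concrete, here's another minimal sketch (again, not from the video). It first checks numerically that the degree-2 polynomial kernel (1 + a·b)^2 really is the inner product of an explicit feature map, and then fits and predicts using only kernel evaluations, never building Phi for the training set. The feature map phi, the poly_kernel helper, the toy data, and lambda = 0.1 are all illustrative assumptions.

```python
# Minimal sketch: the polynomial kernel trick, and prediction without Phi.
import numpy as np

rng = np.random.default_rng(1)

# The kernel trick: (1 + a.b)^2 equals phi(a).phi(b) for an explicit phi
def phi(x):
    """Explicit degree-2 feature map for 2-D inputs."""
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2, np.sqrt(2) * x1 * x2])

a, b = rng.normal(size=2), rng.normal(size=2)
print(np.isclose(phi(a) @ phi(b), (1.0 + a @ b) ** 2))    # True

# Prediction using only kernel evaluations
def poly_kernel(A, B, degree=2):
    """Kernel matrix with entries (1 + a_m . b_n)^degree."""
    return (1.0 + A @ B.T) ** degree

X_train = rng.normal(size=(30, 2))
y_train = X_train[:, 0]**2 - X_train[:, 1] + 0.05 * rng.normal(size=30)
lam = 0.1

K = poly_kernel(X_train, X_train)                          # Gram matrix K = Phi Phi^T
alpha = np.linalg.solve(K + lam * np.eye(len(K)), y_train)

X_test = rng.normal(size=(5, 2))
y_pred = poly_kernel(X_test, X_train) @ alpha              # y(x) = sum_m alpha_m k(x_m, x)
print(y_pred)
```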