So let's talk a little bit about linear regression. What do we have? y is w1 x1 plus w2 x2, and so on, and at the end we have a bias. We can write this as y = sum_{d=1}^{D} w_d x_d + b, or, in vector form, y = w^T x + b. Now, there's this ugly thing, the bias. It feels like it makes our equations needlessly complicated, and in fact it does. There's a very simple way of making b go away, because b would otherwise show up in just about every equation we have: we agree that we always append a 1 to x. So x becomes what x regularly is plus an extra 1, and the last weight then plays the role of the bias. We almost always do this, because it simplifies the notation so much.

Now, in linear regression we generally use mean squared error as our cost function. What is the loss, and what does it depend on? It depends on w, the weights, and on b, the bias, which we have just folded into the weights. It also depends on the data, of course, but that is implied here. So the loss is L(w) = (1/n) sum_i (y_i hat - y_i)^2, where y_i hat is the estimate and y_i is the actual value, and that in turn is (1/n) sum_i (w^T x_i - y_i)^2. Those are the terms we have.

Now, linear regression. Linear regression is, in a way, the workhorse of machine learning. In fact, people always joke that some brag about using machine learning as soon as they run a linear regression. In a way, much of machine learning is surprisingly similar to linear regression. For example, MNIST is a great data set that's often used in machine learning, and with linear methods you can get it about 90% right. So that's quite a bit.

Now, we can solve this problem in terms of matrix algebra, which we will probably have seen before: the optimal weights are w* = (X^T X)^{-1} X^T y, where X^T X is the correlation matrix of the inputs. Let's be clear, this is all in matrix notation here. This is the closed-form solution to the mean squared error linear regression problem, and you will probably have seen it many times.

So there are different ways of solving it. You can solve it by matrix inversion, but matrix inversion is relatively slow. Alternatively, you can solve it by gradient descent, and PyTorch makes this very easy. What do we need to do? We define a linear artificial neural network, we define a mean squared error cost function, and then we call the relevant training loop. So with the techniques that we have at our disposal, we can solve linear regression very efficiently.

So now, what I want you to do is, with this data set, solve it in two ways. First, solve it by matrix inversion. Second, solve it by gradient descent. And I want to ask you: are the two results the same?
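A minimal sketch of the matrix-inversion route might look as follows. The synthetic data, variable names, and sizes here are illustrative assumptions, not the course data set; the point is only to show the appended column of ones (absorbing the bias) and the closed-form formula w* = (X^T X)^{-1} X^T y.

```python
import numpy as np

# Illustrative synthetic data (assumption: in the exercise you would load the provided data set instead).
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
true_w = np.array([2.0, -1.0, 0.5])
true_b = 0.3
y = X @ true_w + true_b + 0.01 * rng.normal(size=n)

# Absorb the bias by appending a column of ones to X, so the last weight is the bias.
X_aug = np.hstack([X, np.ones((n, 1))])

# Closed-form least-squares solution by explicit matrix inversion: w* = (X^T X)^{-1} X^T y.
# (np.linalg.solve would be numerically preferable, but inversion matches the formula above.)
w_closed = np.linalg.inv(X_aug.T @ X_aug) @ (X_aug.T @ y)
print("closed-form weights (last entry is the bias):", w_closed)
```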
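And a minimal sketch of the gradient-descent route in PyTorch, reusing the same synthetic X and y as above (again an assumption). The learning rate and number of steps are arbitrary choices for illustration; with enough steps the learned weights and bias should agree with the closed-form solution up to small numerical error.

```python
import torch

# Convert the synthetic data from the previous sketch to tensors.
X_t = torch.tensor(X, dtype=torch.float32)
y_t = torch.tensor(y, dtype=torch.float32).unsqueeze(1)

# A "linear artificial neural network": one linear layer, no activation.
# Here the layer's built-in bias plays the role of the appended 1.
model = torch.nn.Linear(d, 1)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# A plain gradient-descent training loop on the mean squared error.
for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(X_t), y_t)
    loss.backward()
    optimizer.step()

print("gradient-descent weights:", model.weight.data)
print("gradient-descent bias:", model.bias.data)
```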