In this final video for 4.5, let's talk about the best approximation theorem. This is a pretty big deal, as it will eventually lead to the forthcoming least squares problem.

Suppose we have some subspace of Fⁿ, call it W, and suppose we have some vector y that lives in Fⁿ; y does not necessarily live in W. Then it turns out the orthogonal projection of y onto the subspace W, which we'll call ŷ for short, is in fact the closest vector in W to y itself. I want you to think of it in the following way. Say W is a plane in R³ and we have our point y living off the plane, which we can think of as a vector, an arrow pointing toward it. The closest point in W to y is the orthogonal projection ŷ, the foot of the perpendicular dropped from y down to the plane. That's what you want to see here: the distance between y and ŷ is less than or equal to the distance between y and w, for any vector w inside of W. This is why we call ŷ the best approximation to y in W. It's the closest vector in W to y.

The proof is pretty slick. It's a quick application of the orthogonal decomposition theorem we saw in the previous video. Take any vector w inside of W, and look at the difference y − w. We want to show that the length of this vector is at least the length of y − ŷ, where ŷ is the orthogonal projection of y onto W. First of all, we just insert ŷ into the expression: y − w = (y − ŷ) + (ŷ − w). Notice, of course, that −ŷ and +ŷ cancel, so we just get back y − w. But why is this sum important? It's actually the orthogonal decomposition guaranteed by the orthogonal decomposition theorem. Notice that ŷ belongs to W, since it's the orthogonal projection, and w is just some arbitrary element of W, so the difference ŷ − w is a linear combination of vectors in W; since W is a subspace, it's in there. So the vector ŷ − w belongs to W. On the other hand, we've already seen that y minus its orthogonal projection onto W is orthogonal to every vector in W, so y − ŷ belongs to W⊥. So this is the orthogonal decomposition of y − w; in particular, the two pieces are perpendicular to each other.

This is relevant because of the Pythagorean theorem. Given this orthogonal decomposition, the norm of y − w squared equals the norm of y − ŷ squared plus the norm of ŷ − w squared. Now these are all non-negative quantities, because our norm satisfies the positive definite condition. If we drop the non-negative term ‖ŷ − w‖² from the right-hand side, the right-hand side can only get smaller, so equality becomes an inequality: ‖y − w‖² ≥ ‖y − ŷ‖². Taking square roots, we've established the inequality given to us by the best approximation theorem.
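For reference, here is the whole argument compressed into one display. This is just the computation described above written in symbols, nothing new:

```latex
% For any w in W, decompose
%   y - w = (y - \hat{y}) + (\hat{y} - w),
% with y - \hat{y} \in W^{\perp} and \hat{y} - w \in W. Then
\|y - w\|^{2}
  = \|y - \hat{y}\|^{2} + \|\hat{y} - w\|^{2}  % Pythagorean theorem
  \ge \|y - \hat{y}\|^{2},
\qquad\text{so}\qquad
\|y - \hat{y}\| \le \|y - w\| \quad \text{for all } w \in W.
```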
Let's see an example of such a thing. Take two vectors, u₁ = (1, 2, 3) and u₂ = (−5, 4, −1). This is a spanning set for W, but it's also in fact an orthogonal spanning set, an orthogonal basis for W: if you take u₁ · u₂, you end up with −5 + 8 − 3, and that gives us zero right there.

And we showed in a previous video that if you take the vector y = (−9, 20, −1), its orthogonal projection onto W is the vector ŷ = (−13, 16, 3). Therefore, the distance between y and the plane W is just the distance between y and its orthogonal projection. Calculating the difference y − ŷ: we take −9 − (−13) and get 4; 20 − 16 is also 4; and −1 − 3 is −4. You can factor the 4 out to get (1, 1, −1), and so the length of this thing is 4 times the square root of 1 + 1 + 1. That is, we get a distance of 4 times the square root of 3.

So we've talked about the distance we can find between a vector and a subspace, but can we do that for any affine set, for any flat? Well, up to translation, every flat is just a vector space. So if we have some flat, capital H, and it contains a particular vector, little h, we can just subtract h from it so that H becomes a subspace, one that passes through the origin; let's call it W. And so if we're looking for the distance between H and y, this is the same thing as the distance between W and y − h. We just translated the flat into a flat through the origin, a.k.a. a subspace, and then we calculate the distance there. That distance is the distance between the vector y − h and its orthogonal projection onto W. But as this orthogonal projection is actually a linear transformation, the projection of y − h is ŷ − ĥ. And reordering things, we get (y − ŷ) − (h − ĥ). So this gives us a formula to compute the distance between an affine set and any point: you take y minus its orthogonal projection, and then subtract from that h minus its orthogonal projection, where h is, of course, any vector in the flat, and ŷ and ĥ are the orthogonal projections of these vectors onto the subspace W.
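To make the numbers above concrete, here is a small numerical sketch in Python with NumPy. The basis u₁, u₂ and the point y are from the example; the translation vector h = (1, 1, 1) in the second half is a made-up value, chosen only to illustrate the flat-translation trick, not something from the video:

```python
import numpy as np

# Orthogonal basis for the plane W from the example.
u1 = np.array([1.0, 2.0, 3.0])
u2 = np.array([-5.0, 4.0, -1.0])
y = np.array([-9.0, 20.0, -1.0])

# The basis is orthogonal: u1 . u2 = -5 + 8 - 3 = 0.
assert np.isclose(u1 @ u2, 0.0)

def proj_W(v):
    """Orthogonal projection onto W = span{u1, u2}, via the
    orthogonal-basis formula (v.u1/u1.u1) u1 + (v.u2/u2.u2) u2."""
    return (v @ u1) / (u1 @ u1) * u1 + (v @ u2) / (u2 @ u2) * u2

y_hat = proj_W(y)
print(y_hat)                      # [-13.  16.   3.]
print(np.linalg.norm(y - y_hat))  # 4*sqrt(3) ~ 6.9282

# Best approximation check: an arbitrary w in W is at least as far from y.
w = 2.0 * u1 - 1.0 * u2
assert np.linalg.norm(y - w) >= np.linalg.norm(y - y_hat)

# Distance to a flat H = W + h (h is the made-up translation vector):
# dist(y, H) = ||(y - h) - proj_W(y - h)|| = ||(y - y_hat) - (h - h_hat)||.
h = np.array([1.0, 1.0, 1.0])
h_hat = proj_W(h)
d1 = np.linalg.norm((y - h) - proj_W(y - h))
d2 = np.linalg.norm((y - y_hat) - (h - h_hat))
assert np.isclose(d1, d2)  # the two formulas agree, since proj_W is linear
print(d1)
```

The last assertion is exactly the reordering step from the lecture: because the projection is linear, projecting y − h and subtracting is the same as computing (y − ŷ) − (h − ĥ).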