In the process I described earlier, I glossed over some extra steps that OpenGL performs. We could in principle go straight from view space to screen space, but what actually happens in OpenGL is we go from view space to what's called clip space, then from clip space to what's called normalized device coordinates, and then out to screen space.

So what is clip space? Well, it's so called because once we have things in clip space, this is where a clipping test is performed on each vertex. Once we're in clip space, if either the X, the Y, or the Z does not lie in the range of negative W to positive W, that tells us the coordinate is outside of our frustum. It's not going to be visible, and so it should be clipped.

To get from view space into clip space, firstly, the W of clip space is simply one and the same value as the Z value from view space. That's the easy part. To get the X of clip space, we take the X of view space, multiply it by the focal length, and then divide it by half the width of the view plane. The view plane, again, is the imaginary grid of pixels that sits in our 3D world, centered on the origin but then set some distance, a focal length's distance, up the Z axis. And we imagine it to have some width and some height that effectively determine the aspect ratio of our output image. So that's how we get the clip space Xs, and for the Ys it's the same deal, except of course it's the view space Y instead of the view space X, and the height of the view plane rather than the width.

To get the Z of clip space, however, we do something very strange that's actually unrelated to what we're doing with X, Y, or W here. It's kind of a separate story, because past this point, the Z value is only really going to be valid for depth testing. It's not going to factor into the rasterization algorithm. So something quite strange happens with the Zs, which we'll explain in a minute.

But anyway, why are we scaling the Xs and Ys by the focal length, and why are we dividing by the dimensions of the view plane? Well, notice that in clip space, we're in a sense partly accounting for perspective. We have multiplied by the focal length, but we haven't yet divided by the Zs. That comes later. That's why we're saving the Z value in W: so that when we get to normalized device coordinates, we can do the division. We can divide by W, which is effectively what the Z value was.

One way to think of this is that by multiplying the Xs and Ys by the focal length, we are in a sense normalizing our scene as if it has a focal length of one. Say I have a focal length of two, and I want to change my focal length to one yet still get the same image. To go from a focal length of two to one, you're dividing by two, so to get the same resulting perspective image, you would multiply the Xs and Ys by two. To account for perspective, we multiply by the scaling factor, which is the ratio of the focal length over the Z value. So by multiplying in the focal length but not yet dividing by Z, we've in a sense redefined our scene to have a focal length of one, and yet it's all still going to look the same. So that explains why we're multiplying the Xs and Ys by the focal length.

Now, why are we dividing the Xs by half the width of the view plane, and the Ys by half the height of the view plane?
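Before answering that, it might help to see the view-space-to-clip-space step so far written out as a little C sketch. The Vec4 struct and the names focal_len, plane_w, and plane_h are just ones I'm making up for illustration, nothing from OpenGL itself, and the clip-space Z is left as a placeholder since its remapping is its own story, coming up below.

```c
/* A minimal sketch of view space to clip space for X, Y, and W, using the
 * positive-Z convention from this discussion. Vec4, focal_len, plane_w,
 * and plane_h are made-up names, not OpenGL's. */
typedef struct { float x, y, z, w; } Vec4;

Vec4 view_to_clip_xyw(Vec4 v, float focal_len, float plane_w, float plane_h) {
    Vec4 c;
    c.x = v.x * focal_len / (plane_w / 2.0f); /* scale by focal length, divide by half-width  */
    c.y = v.y * focal_len / (plane_h / 2.0f); /* same idea, with half the view plane's height */
    c.w = v.z;                                /* stash view-space Z for the divide later      */
    c.z = 0.0f;                               /* placeholder: clip Z gets its own remap below */
    return c;
}
```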
Well, consider a point that is on the view plane, say on its right edge. Let's say its X value is 500, and let's say our view plane has a width of 1,000. Half the width is 500, so if we divide our X value of 500 by 500, we get one. And let's assume our scene has a focal length of one; the point is on the view plane, so its Z, and therefore its W, is also one. So now when we test whether this clip space X is in the range of negative W to positive W, positive W in this case is positive one, and our X is one. So our point, which was on the edge of the view plane, just barely makes it. It just barely passes the clipping test, which makes sense, because if the X value were any greater, it would be outside the frustum and should be clipped.

Now imagine a point that's not on the view plane but is also on the right edge of the frustum, somewhere on that right-hand plane. When we divide that X by half the width, we're going to get an X value which should be equal to its W. And so you can see that if the X value were to exceed the W value, it would be out of the bounds of the frustum. The same logic applies for negative values on the left side of the frustum, and likewise for positive Y values at the top of the frustum and negative Y values at the bottom. So note here that our frustum is effectively defined by our focal length and the dimensions of the view plane, its width and its height. It makes sense, then, that any clipping test would have to factor those things in.

Now, as for the Z values, we do something quite strange. Let's say our near clipping plane is defined at, I'm just making up a number, 10, and our far clipping plane is defined at 1,000. Totally arbitrary numbers, not really realistic, but just go with it. What we want to do is scale and translate all the Z values such that the range that ran from 10 to 1,000 will now run from negative 10 to positive 1,000. I won't bother deriving the formula for this; it's standard linear interpolation stuff. But that is what we want to do here.

One reason we do this is that, to test whether the Z values of our coordinates should be clipped, we can now do the exact same test as we do for the Xs and Ys. Say I have a coordinate which is on my near clipping plane, so its Z, again, is 10. In clip space, the Z value is going to be negative 10 and the W is going to be positive 10, so the Z is still in the range of negative W to positive W, and it wouldn't be clipped, as we would expect. Likewise, for a point on the far clipping plane, its Z in clip space is going to be 1,000 and its W will be 1,000, and again, that means its Z is in the range of negative W to positive W.

If, though, we had a point whose Z was, say, 1,001, then its clip space Z is going to be something slightly larger than 1,001, because of the scaling effect: we've taken a range and expanded everything into a slightly larger range, so every Z value beyond the far clipping plane gets its value increased a bit, but the W is still going to be the old Z value, 1,001. And so the new Z is not going to fall in the range of negative W to positive W, and so it's going to be clipped. Likewise, if we have some coordinate whose Z value is too close to the camera, on the wrong side of the near clipping plane, say the value 5, then when we scale the values, its Z works out to about negative 15, which is well outside the range of negative W to positive W, negative 5 to positive 5 in this case. So it, too, gets clipped.
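I said I wouldn't derive the remap formula, but here's at least a sketch of what it looks like, along with the clip test itself, reusing the made-up Vec4 from the earlier sketch. The coefficients fall out of the two endpoint constraints: near has to land on negative near, and far has to land on positive far.

```c
/* A sketch of the strange clip-space Z remap: a linear map a*z + b chosen
 * so that z = near lands on -near and z = far lands on +far. Solving those
 * two constraints gives the coefficients below. */
float view_z_to_clip_z(float z, float near, float far) {
    float a = (far + near) / (far - near);       /* scale  */
    float b = -2.0f * far * near / (far - near); /* offset */
    return a * z + b;
}

/* The clip test: a vertex survives only if X, Y, and Z all lie in -W..+W. */
int passes_clip_test(Vec4 c) {
    return -c.w <= c.x && c.x <= c.w &&
           -c.w <= c.y && c.y <= c.w &&
           -c.w <= c.z && c.z <= c.w;
}
```

With near = 10 and far = 1,000, this maps 10 to negative 10, 1,000 to positive 1,000, 1,001 to roughly 1,001.02, and 5 to roughly negative 15.1, matching the examples above.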
This peculiar rescaling process of the Z values is allowing us to do the same test for the Zs as for the Xs and Ys, and I suppose that's a little bit of a savings. On the other hand, the rescaling process itself is arguably adding in more work, except that in practice what happens is we take our projection transform, which gets us from view space to clip space, and combine it with our view transform. So for every vertex in the world, we're not applying two matrix multiplications, we're just doing the one, and we only have to combine the view matrix and the projection matrix once for everything in the whole world, right? So, because of that, the rescaling of the Z values we're doing for clip space isn't really an extra cost. I think that's one rationale for this whole odd scheme.

And note that this clip test only works because we factored in the focal length. If we didn't normalize our scene to a focal length of one, this clip test wouldn't work for the Z values. So that's one more reason we factor in the focal length.

But to understand the other reason why we do this funny rescaling of the Zs, consider what happens next when we translate into normalized device coordinates. That's done simply by dividing the Xs, Ys, and Zs of clip space by the W of clip space, and then for the W of device coordinates, we just take the inverse of the clip space W. And what's happened after dividing by W is, well, we've accounted for perspective, yes. But also, all of our coordinates are now in the range of negative one to positive one. Because our clip test bounded vertices in the range of negative W to positive W, when we divide by W, the new values are going to be somewhere in the range of negative one to positive one.
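As a sketch, again on the made-up Vec4 from before, the whole step to normalized device coordinates is just this:

```c
/* A sketch of clip space to normalized device coordinates: divide X, Y,
 * and Z by W, and keep 1/W around as the new W. Anything that survived
 * the clip test lands in the range -1..+1. */
Vec4 clip_to_ndc(Vec4 c) {
    Vec4 n;
    n.x = c.x / c.w;
    n.y = c.y / c.w;
    n.z = c.z / c.w;  /* no longer linear in the original view-space Z */
    n.w = 1.0f / c.w; /* the preserved scaling factor                  */
    return n;
}
```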
What's really strange about these normalized device coordinates, again, is the Z values, because we rescaled them to a different range in clip space, and now we've divided them all by their respective Ws. Our Z values now have no linear relationship back to the original Z values. However, we've effectively preserved the scaling factors in W.

What these Zs are going to be used for, though, is depth testing. Our Z values have been remapped in a way where the values that were closer to the near clipping plane now occupy a significantly larger portion of the NDC Z range, whereas the values that were closer to the far clipping plane, further in the distance, have effectively been squeezed into a smaller proportion of the range. Within that Z range of negative one to positive one, the points that were closer to the camera are in a sense hogging more of the range.

And this turns out to be a good thing, because of the imprecision of floating point. When we fill in our triangles and draw pixels, we're going to check, for each pixel, what the Z value is of the pixel already drawn there, and we only want to draw a new pixel there if its Z value is lesser, if it's closer to the camera. Through this Z depth test, we ensure that the right pixel is drawn on top of anything behind it. But the Z values have a limited number of bits of precision, so we can end up with scenarios where the wrong pixels are sometimes drawn on top. Because we can't have infinite-precision numbers, we can't avoid this entirely, but if we can choose, we'd rather it happen for things far away than for things drawn close to the camera.

It turns out that, in practice, depth test errors on things in the foreground of a scene tend to be much more noticeable than depth test errors on things distant in the scene. So by squeezing and expanding the Z values such that the Zs closer to the near clipping plane, closer to the camera, take up more of the range, you're less likely to have depth test errors for things close to the camera. On the other hand, we've now effectively increased the likelihood of depth test precision errors for things further away. But it does tend to be the case that most of our geometric complexity in a scene is closer to the camera, so this trade-off makes sense. So that's why we do the funky business with the Z values.

The last thing to do is to transform from NDC to screen space, and this transformation is fairly simple. This is where we're going to be doing our rasterization. We just want to carry the W values over verbatim from device coordinates; those are unchanged. The Z values, for reasons I don't understand, get translated into the range from zero to one: you add one to the Z value, then divide by two, and now effectively they all lie in the range of zero to one. I don't know why this is done, but that's how it works.

As for the X and Y values, these are scaled and translated according to how we have defined our viewport, not to be confused with the view plane. In OpenGL, you define the viewport, which says, for the window we're drawing into, how large an area of that window we're drawing to: that's the width and height of the viewport. And for this rectangle we're drawing into, where is its bottom left relative to the window's bottom left? So we have an X and Y offset value as well. You take, say, your X value and add one, and now its value ranges from zero to two. Divide by two, and everything is in the range of zero to one; multiply by the width, and now everything is in the range from zero to the width; and lastly, add in the X offset. It's the same thing for the Y values, except you use the height instead of the width.
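As one more sketch on the same made-up Vec4, here's that whole NDC-to-screen-space step, where vp_x, vp_y, vp_w, and vp_h stand for the offsets and dimensions you'd hand to glViewport:

```c
/* A sketch of NDC to screen space: remap -1..+1 to the viewport rectangle
 * for X and Y, remap Z to 0..1 for the depth buffer, carry W verbatim.
 * vp_x/vp_y are the viewport's bottom-left offset within the window, and
 * vp_w/vp_h its dimensions. */
Vec4 ndc_to_screen(Vec4 n, float vp_x, float vp_y, float vp_w, float vp_h) {
    Vec4 s;
    s.x = (n.x + 1.0f) / 2.0f * vp_w + vp_x; /* -1..1 -> 0..1 -> 0..vp_w, then offset */
    s.y = (n.y + 1.0f) / 2.0f * vp_h + vp_y; /* same, with the height and Y offset    */
    s.z = (n.z + 1.0f) / 2.0f;               /* the zero-to-one depth remap           */
    s.w = n.w;                               /* unchanged                             */
    return s;
}
```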
Now, once we're in screen space, this is where rasterization is done. And even though the Xs and Ys have been scaled, and the size of our view plane grid has in a sense potentially totally changed, the relationship between the Xs and Ys and the scaling factors associated with them still holds. So we can form a planar equation between the Xs and Ys of screen space and the scaling factors, stored here as the Ws, and we can use that planar equation to find the barycentric coordinates and the interpolated attributes for each pixel, because the linear relationship between these screen space triangles and the original 3D triangles still holds. We did funny business with the Zs, but we're ignoring them now for rasterization purposes, so don't get confused by that. Despite all the scaling of the Xs and Ys, the linear relationship still holds.

Again, the good news is that you don't really have to understand all of this, because our job in OpenGL is going to be to determine, using vertex shader code, where our vertices are in clip space. From there, OpenGL and the hardware do all the business of translating to screen space and rasterizing, and when the fragment shader runs for each rasterized pixel, we get the interpolated vertex attribute values handed to us. We don't have to compute all that. I do, however, think it's nice to have some idea of what is going on in that process, even if you don't understand all the details.

Oh, one last thing here I didn't note. When we define the projection that takes us to clip space, we're effectively specifying the aspect ratio of the view plane. If our view plane and our viewport don't have the same aspect ratio, you're going to end up with a scene that is stretched or squished horizontally or vertically. So generally you're going to want your viewport to have the same aspect ratio as your view plane. You don't have to; you might want to do something strange, but matching them is usually what we want.

One more little detail, actually: I have, in a sense, lied to you about one detail of OpenGL, and that is that in view space in OpenGL, the camera is looking down the negative Z axis, not the positive Z axis. So in the formulas I showed you, there are some cases where signs need to be flipped. I just thought it was easier to show the formulas and think in terms of a positive Z axis rather than a negative one, so keep that small detail in mind.
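To make that sign flip concrete, here's the earlier view-to-clip sketch redone with the camera looking down the negative Z axis: W becomes negative Z, and the signs in the Z remap flip to match. These are the coefficients you'd find in a standard symmetric perspective projection matrix, but the function below is still just my own sketch, with the same made-up names as before.

```c
/* The earlier sketch under OpenGL's convention (camera looking down the
 * negative Z axis): W is now -Z, and the Z remap's signs flip so the
 * -W..+W clip test still works. A point at z = -near maps to Z/W = -1,
 * and a point at z = -far maps to Z/W = +1. */
Vec4 view_to_clip_gl(Vec4 v, float focal_len, float plane_w, float plane_h,
                     float near, float far) {
    Vec4 c;
    c.x = v.x * focal_len / (plane_w / 2.0f);
    c.y = v.y * focal_len / (plane_h / 2.0f);
    c.z = -(far + near) / (far - near) * v.z - 2.0f * far * near / (far - near);
    c.w = -v.z; /* view-space Z is negative in front of the camera */
    return c;
}
```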