Very broadly, what we do in 3D rendering is take a bunch of individual models, each defined in terms of its own local coordinate system, its own local space. We transform those into a common world space, with the different models positioned and rotated as we see fit. Then we account for the position of the camera by transforming all the coordinates of our world into so-called view space, or camera space. Lastly, we transform from this 3D view space into a 2D screen space, a two-dimensional image.

Things start out in their own local coordinate systems: each model is separately defined in a modeling program like Maya or Blender, and then we bring them all together into one world space. When we do that, each model instance has its own transform matrix, because the instances are transformed in different ways. So for each instance of every model, we apply the transform matrix of that instance to all the vertices of that model, and now all the vertices of all our models are in one big world space.

Now, we could just render from world space, except that, as we'll explain, the rendering algorithm assumes you have a camera sitting at the origin and looking either up the z-axis or down the z-axis; that's the usual configuration. You could have it pointing up or down some other axis, but usually it's the z-axis. So let's say our camera is at the origin and pointing up the z-axis. As the algorithm works, it translates from 3D space into an image with the camera assumed to be at that position. But what if we don't want the camera to be at that position? You could tweak the algorithm to account for different camera positions and orientations, but the simplest thing to do, and what's actually done, is that instead of moving the camera, we move everything around the camera. Say you have the camera at the origin looking up the z-axis, and you want the camera to move two units down the x-axis. You'd end up with the same image if, instead, everything in the world moved two units in the opposite direction along the x-axis. So whatever transformation you want to apply to the camera, you get the same effect if you instead apply the opposite transformation, the inverse transformation, to everything in the world, that is, to all the vertices in world space.

This is in fact what we do. We decide we want our camera translated here from the origin and rotated this way, we figure out what that matrix looks like, and then we find its inverse. That inverse camera matrix, aka the view matrix, is what we apply to every vertex in world space, and now everything is in view space: everything is in the right position relative to where we want the camera. With everything in view space, aka camera space, we apply our rendering algorithm and translate from a 3D world into 2D, and this last step is done with what's called a projection matrix.

So the rendering process is: for every vertex of an instance, we apply the so-called model matrix, and now everything is in world space; to everything in world space, we apply the view matrix, aka the camera matrix; and then we apply the projection matrix to every vertex in view space to get everything in terms of screen space.
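To make that per-vertex pipeline concrete, here is a minimal sketch in C++. The matrix layout, the helper names like transformVertex, and the tiny main are illustrative choices for this sketch, not taken from any particular engine or API:

```cpp
#include <cstdio>

// Minimal 4x4 matrix and 4-component vector, just enough to show the idea.
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };  // row-major here, for readability

// Multiply a matrix by a vector (v is treated as a column vector).
Vec4 mul(const Mat4& a, const Vec4& v) {
    float in[4] = { v.x, v.y, v.z, v.w };
    float out[4];
    for (int r = 0; r < 4; ++r) {
        out[r] = 0.0f;
        for (int c = 0; c < 4; ++c)
            out[r] += a.m[r][c] * in[c];
    }
    return { out[0], out[1], out[2], out[3] };
}

// The per-vertex pipeline described above: local -> world -> view -> projected.
// 'model' is this instance's transform, 'view' is the inverse of the camera's
// transform, and 'projection' takes view space toward screen space.
Vec4 transformVertex(const Vec4& localPos,
                     const Mat4& model,
                     const Mat4& view,
                     const Mat4& projection) {
    Vec4 worldPos = mul(model, localPos);      // local space -> world space
    Vec4 viewPos  = mul(view, worldPos);       // world space -> view (camera) space
    Vec4 projPos  = mul(projection, viewPos);  // view space  -> projected coordinates
    return projPos;
}

int main() {
    // Identity view and projection, plus a model matrix that translates by
    // (2, 0, 0), just to exercise the chain of multiplications.
    Mat4 I = { { {1,0,0,0}, {0,1,0,0}, {0,0,1,0}, {0,0,0,1} } };
    Mat4 model = I;
    model.m[0][3] = 2.0f;  // translation along x sits in the last column here
    Vec4 local = { 1, 0, 0, 1 };
    Vec4 out = transformVertex(local, model, I, I);
    std::printf("transformed vertex: (%f, %f, %f, %f)\n", out.x, out.y, out.z, out.w);
}
```

In a real renderer the projection matrix would do more work (as discussed next), but the chain of multiplications, model then view then projection, is the part the paragraph above is describing.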
Now, what does this projection matrix look like? Well, first, what I think is useful is to imagine at the origin a so-called view plane, which you can picture as just a rectangle of some width and height with its center on the origin, lying flat on the XY plane. You can think of this as the imaginary window through which we are seeing the 3D world.

If we're creating our image with what's called an orthographic projection, this is not a realistic kind of image; it's like a blueprint, where parallel lines in space do not converge in the two-dimensional image. But it is the simplest possible kind of projection from 3D into 2D. Imagine splitting the view plane into a discrete grid of pixel centers corresponding to the pixels of our screen space, and imagine a ray of vision projecting perpendicularly out from the XY plane, up the z-axis (assuming our camera looks up the z-axis). Just imagine these rays shooting out, and the question for each ray is: what in our three-dimensional geometry does it hit? What polygon surface does it collide with, if any, and at that point of collision, what is the color of that surface? That is what we want for the color of the corresponding pixel. So from the pixel you're shooting a ray of vision, it hits something, and the question is what the color is there; that's what we want for the color of the pixel. In this orthographic projection, all of these rays of vision shoot out perpendicular to the XY plane, so they all run in parallel.

To simulate what the human eye sees, or what a camera sees, we instead want a perspective projection. For a perspective projection, imagine that our view plane, instead of sitting on the origin, has been moved up the z-axis some distance from the origin. We'll call that distance the focal length. Now imagine that the origin, which we can call the focal point, is our center of vision, and that the rays of vision all run from the focal point through each of the pixel centers on the view plane, fanning out from the center in both the X dimension and the Y dimension. This basically describes how a human eye or a camera sees, in reverse of course, because in physical reality light is coming from those directions toward our eye, but here we're imagining rays of vision going in the opposite direction. Whatever they collide with, the colors at those points of collision are what we want to see for those respective pixels.

That basically describes what's called ray tracing, where for each pixel we compute that ray of vision and find its collision with our geometry: what does it collide with first? We actually go through the collision computations, and this works: it produces an image, and in fact it has many virtues in terms of producing the best quality images. But unfortunately it is very slow, so it's not how things are done in real-time rendering, at least until very recently. Very recently, GPU hardware has introduced some built-in ray tracing assistance that is beginning to make ray tracing nearly viable for real-time games. But up until very recently, real-time rendering has always used the alternative to ray tracing, which is called rasterization.
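Before moving on to rasterization, here is a minimal sketch of how those perspective rays of vision could be generated, one per pixel, with the camera at the origin looking up the z-axis. The function name, the plane dimensions, and the image size below are illustrative assumptions, not anything prescribed by the discussion above:

```cpp
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

// Generate the perspective "ray of vision" for one pixel. The camera sits at
// the origin (the focal point) looking up +z; the view plane sits at
// z = focalLength, centered on the z-axis.
Vec3 primaryRayDirection(int px, int py,
                         int imageWidth, int imageHeight,
                         float planeWidth, float planeHeight,
                         float focalLength) {
    // Pixel center in view-plane coordinates (origin at the plane's center).
    float x = ((px + 0.5f) / imageWidth  - 0.5f) * planeWidth;
    float y = ((py + 0.5f) / imageHeight - 0.5f) * planeHeight;

    // The ray runs from the focal point (0,0,0) through this pixel center.
    Vec3 dir = { x, y, focalLength };
    float len = std::sqrt(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
    return { dir.x / len, dir.y / len, dir.z / len };  // normalized direction
}

int main() {
    // In an orthographic projection every pixel would instead use the same
    // direction (0, 0, 1); here the directions fan out from the focal point.
    Vec3 d = primaryRayDirection(0, 0, 640, 480, 2.0f, 1.5f, 1.0f);
    std::printf("corner-pixel ray direction: (%f, %f, %f)\n", d.x, d.y, d.z);
}
```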
With rasterization, for each of our 3D polygons, we project backwards and figure out where its vertices land, that is, what their corresponding points on the view plane are. So for my 3D triangle, I can figure out what the corresponding 2D triangle on the view plane is, accounting for perspective, and then we have to somehow fill in the pixels and figure out, for each pixel inside that 2D triangle, what the corresponding point on the 3D triangle is.

For that first step, figuring out the view-plane vertex that corresponds to a 3D vertex, there's a simple formula: take the 3D x coordinate, multiply it by the focal length, and divide by the z coordinate of the vertex. That gives us x prime, the x on the view plane that corresponds to the 3D x coordinate. We use the same formula for y, just substituting y in place of x. This, very simply, is how we get from a 3D coordinate to the corresponding point on the view plane. If you want to understand why it works, it just comes down to the geometry of similar right triangles, as I explain in the prior video; it's actually not complicated.

What is complicated is the rasterization process, where we have to fill in the pixels on the view plane in between. So, triangle rasterization: how could we do this? Well, there's a fairly obvious solution, which unfortunately is quite expensive. We have three points in 3D space, and given three points in 3D space you can figure out the planar equation. For each pixel, we can compute the equation of the 3D line that runs from the focal point through that pixel center, and with these two equations we can solve for the x, y, and z that satisfies both. This is doable, it's just expensive: finding the line equation for every single pixel is actually the cheaper part, but finding the intersection is more expensive, at least more than what we'd ideally want to pay per pixel.

The other problem with this method is that what we really want to know about a pixel is not the corresponding 3D Cartesian coordinate; what we really want are interpolations of the vertex attributes. What are vertex attributes? Well, they can be whatever we want: we can associate with each vertex some piece of data that's going to have some meaning in how we want to render. But the most common kinds of attributes we attach to our vertices are RGB colors, UV texture coordinates, and normal vectors. Color values can be used to determine what the color of a pixel should be, and normal vectors, as we'll see considerably later when we bring in lighting calculations, often influence how the lighting is performed; generally what we want is a smooth lighting effect on our geometry that disguises the faceted look of our little triangles, but that'll come much later.
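As a small illustration of both ideas, here is a sketch of a vertex carrying the common attributes mentioned above, together with the projection formula x' = x * focalLength / z (and likewise for y). The struct layout and names are hypothetical, not from any particular API:

```cpp
#include <cstdio>

// Illustrative vertex: a 3D position plus the common attributes discussed
// above (an RGB color, a UV texture coordinate, and a normal vector).
struct Vertex {
    float x, y, z;     // position in view space
    float r, g, b;     // RGB color attribute
    float u, v;        // UV texture coordinate attribute
    float nx, ny, nz;  // normal vector attribute
};

// The perspective projection formula described above:
//   x' = x * focalLength / z,   y' = y * focalLength / z
// It maps a view-space position onto the view plane sitting at z = focalLength.
void projectToViewPlane(const Vertex& in, float focalLength,
                        float& outX, float& outY) {
    outX = in.x * focalLength / in.z;
    outY = in.y * focalLength / in.z;
}

int main() {
    Vertex vtx = { 2.0f, 1.0f, 4.0f,   1, 0, 0,   0.5f, 0.5f,   0, 0, -1 };
    float px, py;
    projectToViewPlane(vtx, 1.0f, px, py);
    std::printf("projected to (%f, %f) on the view plane\n", px, py);  // (0.5, 0.25)
}
```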
So for now, let's just consider texture coordinates. Say you want to paint a texture onto your triangles. The way this is done is that we have some two-dimensional texture image, which is considered to have its own coordinate system of UV coordinates, where u is the horizontal axis and v is the vertical axis, with the origin usually in the bottom left, as you see here in the picture. The idea is that, for a polygon, let's say a triangle, each 3D vertex of our 3D triangle has an associated attribute which is a UV coordinate, a point on the texture, and we imagine that portion of the texture painted onto the corresponding 3D triangle. These triangles don't have to be the same shape, and they don't have to be the same size; what we want is for the texture to be squashed and stretched as appropriate to fit onto the 3D surface.

Now, for the points on the surface of our 3D triangle, there is a linear mapping to the corresponding triangle on the texture. For example, when we render an edge of our 3D triangle, the color we paint at its halfway point should match the color we see on the texture at the halfway point of the texture triangle's corresponding edge. So again, there's a linear relationship between the texture triangle and the 3D triangle. But as we've already said, because of the effect of perspective, there's not a linear relationship between our 3D triangle and the view plane triangle, and so, transitively, there's not a linear relationship between our view plane triangle and the texture triangle. When, for a pixel on our view plane, we want to figure out what the corresponding UV coordinate should be, we can't just do a straight linear mapping; we have to account for perspective.
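A standard way to account for perspective here, usually called perspective-correct interpolation, is to linearly interpolate u/z, v/z, and 1/z across the screen-space triangle and then divide at each pixel. The sketch below uses my own illustrative names and assumes the pixel's barycentric weights in the 2D screen triangle are already known:

```cpp
#include <cstdio>

// Perspective-correct interpolation of a UV coordinate. Given the three
// vertices' UVs and view-space depths (z), and the pixel's barycentric
// weights computed in the *screen-space* triangle, we cannot blend the UVs
// linearly; instead we interpolate u/z, v/z, and 1/z linearly, then divide.
void interpolateUV(const float u[3], const float v[3], const float z[3],
                   const float w[3],       // barycentric weights, w0+w1+w2 = 1
                   float& outU, float& outV) {
    float invZ = 0.0f, uOverZ = 0.0f, vOverZ = 0.0f;
    for (int i = 0; i < 3; ++i) {
        invZ   += w[i] / z[i];
        uOverZ += w[i] * u[i] / z[i];
        vOverZ += w[i] * v[i] / z[i];
    }
    outU = uOverZ / invZ;   // dividing by the interpolated 1/z restores the
    outV = vOverZ / invZ;   // linear mapping on the 3D triangle's surface
}

int main() {
    // Midpoint of a screen-space edge between a near vertex (z=1, u=0) and a
    // far vertex (z=3, u=1).
    float u[3] = { 0.0f, 1.0f, 0.0f }, v[3] = { 0.0f, 0.0f, 1.0f };
    float z[3] = { 1.0f, 3.0f, 2.0f };
    float w[3] = { 0.5f, 0.5f, 0.0f };
    float pu, pv;
    interpolateUV(u, v, z, w, pu, pv);
    std::printf("perspective-correct UV: (%f, %f)\n", pu, pv);
}
```

The example in main shows the effect: halfway along that screen-space edge the recovered u is 0.25 rather than 0.5, biased toward the near vertex, which is exactly what perspective demands.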
Here we see what's called the frustum, which is like a lopsided pyramid with its top chopped off, and which effectively represents the bounding volume of what's going to end up in our image. Everything outside the frustum won't end up in the image, because either it's outside the field of view, or it's too close to the camera (on the wrong side of the near clipping plane), or it's too far from the camera (on the wrong side of the far clipping plane). The whole process of discarding objects, polygons, vertices, and pixels that aren't going to end up in the image is called culling.

Now, it's not actually super critical to cull absolutely everything outside of our frustum. The way the rendering algorithm works, vertices lying outside the frustum simply won't show up in the image. But for efficiency reasons, as early in the process as possible, you don't want to waste time processing data that doesn't actually end up on screen. Imagine a game where you have a world with many, many objects spread over a big area, and the camera is looking in a certain direction: obviously anything behind the camera you want to cull, anything outside your field of view you want to cull, and anything too far from the camera, past the far clipping plane, you want to cull too. But if we don't do a perfect job, it's not necessarily the end of the world, and there are trade-offs here where you can do a quick pass to narrow things down: for this part of the world, say, things in totally different regions of our game world obviously don't even need to be considered for rendering. But then you have cases where things are close to the frustum but not quite in, or only partly in, and there things become a trade-off. Algorithms that would perfectly cull everything that doesn't end up in the image exist, but they're expensive, and so the question is whether you're gaining more by culling more stuff than you're losing in the time spent culling.

First we want to eliminate stuff that's not in the frustum at all; things that are fully outside the frustum obviously don't need to be rendered. Then you have the cases of objects and polygons that partly overlap the frustum and partly fall outside, and those we typically want to clip: say you take a triangle that's partly sticking out of the frustum, and you divide it into two triangles that are fully within the frustum, discarding the part that doesn't fit. Then we have the trickiest problem of culling: you often have things that are entirely inside the frustum and yet entirely obscured by other things drawn in front of them. It would be nice if we could avoid all the work of rendering things that are just going to get drawn over; that's just wasted work. Getting rid of such things is called occlusion culling, and it's the trickiest and potentially most expensive part of the culling process. Lastly, when it comes time to render a polygon, we usually consider polygons to have a front face and a back face, and very often you don't want to render the back faces (sometimes you do, sometimes you don't). So for every polygon, you determine whether we're seeing its front face or its back face, and typically we cull back faces; we don't want to render them (see the sketch at the end of this section).

Now, culling is not something we're really going to get into at all. We're going to be constructing very small scenes, so beyond OpenGL's standard clipping of things outside the frustum, we're not going to do any culling whatsoever. Obviously, in a proper game engine you would want to do some culling.

Once you understand the whole transform business of how we get from 3D coordinates to a 2D image, that part is actually going to be really easy in almost any renderer, assuming you're just doing a standard perspective projection; that's the easy part. Where things vary a lot is in how you account for lighting. In ray tracing we can better account for how light actually behaves in the real world, including things like shadows and indirect light, but again, ray tracing has unfortunately been too expensive, until perhaps very recently, for real-time rendering purposes. So in the rasterization world, the way we account for lighting involves a lot of hacks. We have these fairly artificial notions, not really based in physical reality, of things like ambient lights, point lights, directional lights, spotlights, and area lights, and then for handling shadowing we have to do other hacky things: shadow mapping, ambient occlusion, and some other techniques. It's kind of a whole combination of hacks, and that's what we're going to cover in the later code examples, where we'll show you how to account for lighting.
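As one small example of the culling steps mentioned above, here is a sketch of a back-face test done in screen space after projection. It assumes the common convention that front faces wind counter-clockwise on screen (OpenGL's default), which is a choice made here for illustration:

```cpp
#include <cstdio>

struct Vec2 { float x, y; };

// Back-face culling in screen space: the signed area of the projected
// triangle tells us its winding. Positive means counter-clockwise
// (front-facing under this convention); negative means clockwise (back-facing).
bool isBackFacing(const Vec2& a, const Vec2& b, const Vec2& c) {
    float signedArea = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    return signedArea <= 0.0f;  // also culls degenerate (zero-area) triangles
}

int main() {
    Vec2 a = { 0, 0 }, b = { 1, 0 }, c = { 0, 1 };
    std::printf("counter-clockwise triangle culled? %s\n",
                isBackFacing(a, b, c) ? "yes" : "no");   // no: front-facing
    std::printf("clockwise triangle culled? %s\n",
                isBackFacing(a, c, b) ? "yes" : "no");   // yes: back-facing
}
```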