All right, can everyone hear me? The people at the back, can you hear me? Okay, great. Hi guys, my name is Omar. I used to be a student here, in computer engineering. So how many of you are from computer engineering? I want to see — a handful. How many from computer science? Everyone else, I guess. Okay, nice. So this is a somewhat hardware-software talk. I'll try to cover more of the hardware part, since you guys don't know much about it and I can get away with it. And I want to talk a bit about how I got into this. I work at Sea — I'm not sure everyone has heard of Sea, but we used to be called Garena — and we work on quite a few interesting projects. I happen to be working on a new project that we haven't released yet, related to live streaming, and I've been doing a lot of video processing work recently. That's what got me into GPU programming. Okay, so let's get into it. What is a GPU? I guess most of you know what a GPU is — you play games, so you must know what a GPU is. But if you really think about what exactly a GPU is when you work with it, or when you want to program for it, that's a different question. A GPU, if you think about it, is a piece of hardware that is optimized to render graphics — but not just that, it can actually do a lot of different things. The hardware philosophy for GPUs is that they are composed of many, many units that let them do a lot of simple but parallel operations concurrently. That's why they happen to be good for graphics, and also good for a lot of parallelizable algorithms. We're going to jump into that in quite a bit of detail later on. But first, what I want to illustrate to you guys is the graphics pipeline — how graphics are really rendered on your screen when you use a GPU, and what coding
stuff for the GPU looks like on modern devices — particularly iOS in my case, because I'll be focusing on iOS, since I'm an iOS developer. Okay. So when I say the graphics pipeline, what I mean exactly is: how do you draw stuff on the screen? And like any graphics tutorial, we're going to start off with triangles, because triangles form the fundamental basis of how you draw anything else — everything else is composed of triangles. So the whole rendering story starts off with these things called vertices. Vertices are just points in space. If you look at this triangle, it's just composed of vertices 0, 1, 2 — points in some arbitrary space. You don't really need to define what that space is. It could be your 3D model, for example: if you have a 3D model that looks pretty nice, with a few million vertices, those vertices can represent your car model or something. They're just points in that space. So what do we do with these points? We define this thing called a vertex function. A vertex function's job is to transform these vertices, which live in some arbitrary space, into normalized device space, which is a 2D space. These device coordinates are also called clip-space coordinates, and they exist in this special coordinate space. On iOS, this coordinate space varies from minus one to one on both the x and y axes — so this point here is 0.5, for example, and this is one, on both sides.
Okay, so the reason we do this is to normalize everything: whatever arbitrary space you were using, at render time, when you want to display on the screen, you know roughly what the coordinates are going to be. So for example, looking back at our triangle: if you want to render a triangle like this, with some rotation involved, the way we would do that in a GPU context is to specify a vertex function. A vertex function executes once per vertex. That means, since we have three vertices, it will be called once for this vertex, once for this vertex, once for this vertex, and that's how the GPU knows: I have these three vertices, and I know where they exist in clip space. Clear so far? Okay. So what does a vertex function look like, and how do we define one? Let's write some code now, because I'm sick of looking at triangle diagrams. In terms of code, this is some simple C/C++-style code — if you don't know C or C++, you guys need to retake your intro programming module. Basically it's just a struct. Inside the struct I have two properties. One property is the position: a floating-point vector of two components, because I have x and y. And I have a color — in this case RGBA, so four components in my color, which is why it's a vector of four components. In my use case, because I'm just rendering a boring triangle with one color per vertex in 2D space — I'm not very adventurous right now — I just have this struct. In your use case, obviously, this struct can be very different. Maybe you want to do some very intense stuff with lighting, maybe you want to have 3D
coordinates, maybe you want to pass a bunch of things — my point is, this struct can be whatever you want it to be. This is just one way to represent a vertex for the GPU; it can be anything. It's kind of arbitrary. So this is a vertex. I also need to define what my vertex function will return, and this is where things get a bit interesting. Remember I mentioned that a vertex function takes one vertex at a time and returns to you the normalized clip-space coordinates — where that vertex is on your screen. So we need some structure that contains that information, and maybe we want to pass some other information as well. One way to do that is to define another struct. Let's call this struct RasterizerData. This thing has two properties. One property is the clip-space position — the actual position of the point in clip space, the device space — and it's a float4; I'll explain why in a bit. And we also pass the color. You can ignore this comment for now, because you won't understand it yet. The main thing to note here is that my position has this special funky-looking tag around it. I have this struct, and I need to somehow tell the GPU — tell the hardware — which of these fields is the position attribute, so that the GPU knows this is the output of my vertex function that it needs to use later on. That's what this funky-looking thing does. It's part of Metal syntax, and it indicates that this property is the vertex position.
Okay, and because of this, I can actually have a much more complicated struct — as long as I mark one of those properties as the position, the vertex function will execute correctly, because the GPU will know which of those properties is the position. And this thing is a float4 because later we might need to deal with 3D space, so you might need more than just two coordinates; in my case I'll just use the first two. Okay, so now I'll show you the definition and implementation of a simple vertex shader — a simple vertex function — in Metal. Just before we go on, to clarify for those of you who don't know: Metal is iOS's — Apple's — GPU API and GPU programming language. It's a programming language based on C++, and that is why the code you saw earlier might look familiar. Most ordinary C++ code is valid Metal code. Obviously, though, you cannot use the standard library or the built-in functions you might be familiar with from C and C++, because you're running on a GPU now. If you want to do even simple operations, you have to use the Metal standard library, and the reason is this: normal standard libraries are designed for CPUs — when they compile, they compile into CPU instructions. That's not the case for Metal. With Metal you're compiling for the GPU, GPU instructions look very different, and the programming model you have is also very different, so you can't reuse that standard library, and you can't reuse a lot of existing code you might have. Okay, so coming back to this vertex function: this is what a vertex function looks like.
And let's look at this one by one. First we have this thing called vertex. It's a keyword to identify that this function is a vertex function. Then we have the output struct that we defined earlier — this is the output of this function. Like I said, it can be any data structure you want, as long as you specify where the position attribute is inside the struct. Then the name of the function, and then, if you notice, there are a few arguments. One of the important arguments is the vertex ID. Remember I mentioned that a vertex shader executes once per vertex — you need some way to identify which vertex you're at right now, and that's what the vertex ID is for. If you've used OpenGL ES or some similar equivalent on other platforms, they also have similar concepts. Then the other thing we obviously need is our input vertices, because we're taking vertices as input and converting them into output positions in this clip space. So we pass those vertices in. This is the type we defined earlier — our vertex type that contains the position and a color. I pass these vertices in, and this is a pointer, which basically means it's an array. But notice this funky thing on the side: this attribute means this thing is a buffer. Why is it there? You can probably see why we need annotations like this: the vertex ID is provided by the GPU system, so we annotate that argument as the vertex ID. Similarly, the buffer attribute tells the system where this vertex buffer comes from — it's something that's provided to the function from outside.
Because this thing is not a normal function, you can't just call it like a normal function — it's not like you have a main and you call it from there. It doesn't work like that. For Metal's GPU setup, you need to configure these functions and invoke them from the CPU side; I'll show you an example of what I mean later on. What the buffer attribute actually means is: this is the buffer that will be passed to you from the native side, the CPU side. The buffer at index zero contains the vertices; the buffer at index one contains this other argument, which is the viewport size. In my case I have a viewport size because I'm rendering on some screen, so I need to know the size of that screen in order to know what the exact coordinates will be. Because, like I mentioned, when we return from this function we return something in a normalized space, which is a fixed range — so if you want to handle things like different aspect ratios, we need to pass a size. Okay, so: anyone not clear up to this point, or anyone thinking this makes no sense? Any questions? I can stop for questions here. Okay, so far it's not that complicated yet. Good. Then the next step. After we have the vertices, you can imagine we now have our points in this space — we know what our output is going to be. What's the next step? We need to do this thing called rasterization. That means we need to find out which pixels are going to be included in our triangle, because when you actually render something on a screen, you're rendering it in terms of pixels. Look at this figure — ignore the text for now — there's a bunch of pixels here, right?
So when I render a triangle, obviously I can't render across partial pixels — I need to know which exact pixels will be part of my triangle. The nice thing about GPUs is that they handle this automatically for us. This process is called rasterization, and normally, in programmable GPUs, you are not responsible for it — the GPU handles it for you. So the next question is: how can you control what to show at each pixel? If you want to color each pixel, what do you do? Let's say you have some very sexy lighting model that you want to apply, and you want to color each pixel with some very special color. The way to do that is to implement this thing called a fragment function. A fragment function is called for each of the fragments that are rasterized. So for each pixel included in your triangle, the fragment function you program will be called, on a per-pixel basis, and that way you can specify the color you want for that pixel. For example, here is a very simple fragment function. You can see it's marked fragment. It returns a float4 — because I'm returning a color, and a color needs RGBA, that's why it's a float4. It's a fragment shader, so I call it a fragment shader, and it takes in the output from my vertex shader, with this funky-looking attribute that tells me this is indeed the output from my vertex shader. And in this case I basically just return the same color that I set when I created the vertex.
So, to recap: I go back to the definition of what I return from my vertex shader. I return a position — you all know what a position is; I can specify a position in 2D space, no problem. Now let's say I also want to specify a color, and I want to vary this color based on the position it's at — I want to interpolate this color. Here's how that works: for this output struct from the vertex shader, any field that doesn't have the special position tag around it will be interpolated across the fragments. What do I mean by that? I mean that the values of that field will vary smoothly depending on each fragment's position with respect to the vertices. So if you specify a color, it will be interpolated automatically for you — this is something the rasterizer does for us. I'll show you guys an example later on to make this clear, because I know that when I just say it, it's not exactly clear. So let me stop here, and let's look at a demo, so you can see what I mean. Okay, can you guys see this code? Is this visible to everyone? Here I have the hello-world example of GPU programming. It's called Hello Triangle. Let me just run it to show you what it does first, and then I'll go through how it does it. Give it some time to compile... okay, it's done. Here we go — yay, triangle. Okay, so when I said interpolated color, this is what I meant. In this particular case, one of my vertices is blue, one is green, one is red. That's the color I specified in my vertex data.
Okay, when I render this triangle, it automatically interpolated that color value, and it did so in a way that's uniformly distributed based on the distance to the vertices — there's a term for this kind of interpolation that I forget exactly. So this is what I meant by interpolation. And if I look at my shaders for this example, it's similar to what I showed you guys earlier. This is the vertex shader: it takes in the vertex ID, the bunch of vertices, and the viewport size; it does a simple transformation to find the position we want — in this case, to get the position, I just divide by half the viewport size — and it passes along the color, and that's it. It's a very simple two-line shader. And my fragment shader just returns the color I pass in from this thing. So if you notice, this was the output, and this is the input to my shader: the position and the color. When I create a vertex I specify position and color, and the color value is interpolated later on. That's what I meant by interpolation. Okay, so that's all for this part. Next: normally, with a GPU, you don't want to map everything pixel by pixel. If you think about it from a game point of view, or if you're rendering in 3D, you don't want to go point by point individually — that's quite wasteful. Normally you have this thing called a texture. A texture is an image that will be applied on your 3D model, or your 2D model, or whatever. A texture is used to color, so to speak, whatever you're drawing in the final output.
So how does a texture work? If you think about it, a texture is quite simple — it's quite similar to what we had. In our case, we change the definition of our vertex: last time we had a color; now let's change that to a texture coordinate instead. So here I have a float2 texture coordinate. This is the coordinate I want, for that vertex, in texture space. If I'm rendering a triangle, say this is the first vertex — then this texture coordinate refers to the point in the texture that I want to map to that vertex. The texture is just a 2D image. Texture coordinate space basically goes from zero to one in both directions, centered at 0.5 — don't ask me why it's different from clip space, I don't know why. And basically, same as before, we have some vertex output: after we're done processing the vertex, we need to pass this data to the fragment shader, and in the output, once again, we have the clip-space position marked with the position tag, and we have a texture coordinate. Just like the color before, the texture coordinate will be interpolated for each fragment. And now you can see why you want this interpolation, right? It interpolates, so you can then find which part of the texture you want for this pixel. That's the main reason this is so helpful. But when you have a texture, one of the things you need to be aware of is that the texture may not map exactly — it may not be the same size as what you want to render. You might have a texture that's too small or too big for your image. So how do you deal with this?
To handle that, we have this concept called a sampler. What we do is sample the texture: okay, at this position, run some algorithm and give me a color. There's a bunch of different algorithms; one of the simple ones is linear sampling — it just looks around and takes an average to find roughly what your color should be. And this is roughly what the code looks like: you create a texture sampler, you sample the texture, and that's it, you're done. That's one way to do texture sampling. So why do I mention this? Some of you might be wondering why I'm talking about this, and especially why we're talking about textures in this talk. The reason is: think about video. Think about your video player — what exactly is it doing? A video player gets images. When you open YouTube, for example, you receive chunks of data that encode your video. You decode — decompress — that data, and you get a series of images: at time one, this image; at time two, this image; and so on. What you do is take each image — that's a texture — and feed that texture to the GPU to render it. That's how video rendering works. Then you apply some transformation on it — say you want to rotate the image because your device is rotated or something — and that's video rendering. That's why I talked about textures. Okay, so before we jump to the demos, I'm actually going to talk a bit about GPU architectures first.
So, GPUs. Let's look at what a GPU looks like if you take the lid off. These are a few components of a GPU. GPUs, like any other integrated system, have a power system, some interfaces, memory controllers, display interface logic that talks to the display hardware, and many of them have a video processing unit — this implements the encoding and decoding algorithms in hardware, because that's a lot more efficient than doing it in software. But we're not going to focus on all of those; we're going to focus on the main core of the GPU, which is the graphics and compute array. This is the actual rendering engine of a GPU, and this is what we'll focus on. Okay, so the question I have for you is: what makes a GPU different from a CPU? That's a very simple question, and I'm going to try to answer it. So let's go a bit deeper. I have this very simple shader over here — it's a program. Let's think about this program from a CPU point of view first, and we'll slowly work out how to optimize it. This shader runs some algorithm — you can ignore what the algorithm is, I don't care. But let's look at the assembly output — the actual instructions it compiles to. From a CPU point of view, this code will produce a bunch of instructions: there'll be a sample instruction, some mathematical instructions — multiplication, addition, clamping, more multiplication happening here. It's a bunch of instructions, and if you're a CPU, you go through them one by one.
More or less one by one — obviously, if you have pipelining and so on, you have to take that into account, but basically you go one by one, for each pixel. This is a fragment shader, so you're going pixel by pixel, and you need to execute this bunch of instructions for each pixel. A 720p or 1080p frame is on the order of one to two million pixels, so you have to execute these instructions millions of times. You can see how that can be problematic on a pure CPU. Let's look a bit deeper into a CPU, then. What is a CPU, actually? A CPU has a bunch of components. It has something in charge of fetching and decoding instructions; the ALU, which does the actual execution; and some execution context — the stack, the function call stack, and so on. Normally it will have caches, and these are a big part of a CPU: L1, L2, L3 caches are very important for a CPU. And CPUs these days have a whole bunch of very fancy features: out-of-order logic, very fancy branch predictors, memory prefetching, and so on. I won't talk more about that — talk to Rajesh, he's here, he teaches a module on this stuff. So CPUs have all this very fancy machinery that makes them very good at doing complex tasks, but not that many of them at once. So here's an idea — a suggestion: why don't we slim the CPU down? Let's cut away some of those parts, so that we can fit more of them on a piece of hardware. It's a very simple idea. Remove all the fancy stuff and keep just the basics: the fetch/decode cycle, the ALU, and some execution context, so we can still run instructions. We don't care about caching.
We don't care about branch prediction. We don't care about all the other stuff. If you do that, the nice thing is that I can now have more of these: I have fewer components, so I can put more of them on a piece of hardware, and make it cheap as well. If I have two of them, I can run those two in parallel. So I can effectively have many, many cores, so to speak, and this is cheap, and it can work fine. That's one strategy, and it's something GPUs do. Here's a sample GPU with 16 of these cores. And this is fine, because the cores themselves are quite lightweight — they're not doing that much; they're quite simple. Each core can execute one fragment at a time, so I can do 16 pixels in parallel at the same time. That's much better than one or two pixels in parallel. A traditional CPU might only have two cores, four cores, eight if you're special — so this is already a bit better than some CPUs. But we can do better; we can go further than this. How? One idea comes from the fragments themselves: if you look at the fragment shader, the code is the same for a lot of fragments. So a lot of these fragments can share the instruction stream. The code is the same; the data is different.
If you think about it, the computation you're doing is actually the same for all these pixels; it's just that the values are different. So we can use this thing called SIMD processing. SIMD stands for Single Instruction, Multiple Data. What this means is that you have one instruction, but you execute it on a large chunk of data at the same time. This way you can optimize your hardware a bit more, because you have to build less hardware — there's less going on per element. So we can utilize this technique. Look back at our code: as originally compiled, these instructions operate on one floating-point variable at a time — this multiplication instruction operates on this one floating-point number. If we change that instead to a vector operation — it's still a multiplication, but the inputs are two vectors — then I'm processing eight fragments at one time. I'm doing eight pixels at the same time, in parallel, because I can multiply these two vectors and get an output vector. This is a lot faster, and it makes more efficient use of my hardware. So that's the second idea: we change each shader to SIMD instructions.
So here's an example: I have eight pixels — one, two, three, up to eight. They're fed as one vector into this shader, and each will be given a color, but they're all still part of a vector. That's the bright idea here, and this thing is quite common, actually — it's not just in GPUs. Vector processors exist in traditional CPUs as well; many modern CPUs have them. If you use an iOS device, it comes with a vector processor too, and it's quite a useful thing to have. So now, the nice thing is: if I combine the two previous approaches — 16 cores, and 8 SIMD lanes per core — I can do 16 times 8, so 128 fragments in parallel now. That's much better; I'm 128 times faster than a single scalar CPU core. Life is pretty good. And I can run 16 simultaneous instruction streams, which means I can actually run 16 different programs at the same time if I want to. So this is quite good. But we have one big problem now. Remember, earlier we got rid of all the caches, all the branch prediction, all that nice CPU stuff — and all that stuff is actually there for a reason. Because, if you don't know: fetching from memory is goddamn slow. Think about it: how many cycles does it take to do a multiplication, a bit shift, an addition? Maybe one to ten cycles — not that many; the order of magnitude is quite small. But fetching something from memory can take a few hundred cycles.
It's orders of magnitude slower. The way CPUs fix this problem is through caching. First you have registers — a register is essentially free, almost one cycle, or a few cycles, to access. Then there's L1 cache, which is in the tens of cycles, and L2 cache, around hundreds of cycles. Basically, the order of magnitude to fetch from L1, L2, L3 is much smaller than fetching from main memory. That's why CPUs have caches, and that's why, if you optimize your program for the caches, it magically runs a lot faster even though you're using the same algorithm. This is one of those funny things you see if you optimize your programs a lot. Now, the thing is, we have these things called stalls. A stall happens when you cannot run an instruction because it has some dependency, and the previous dependency is taking too long. So here's the CPU case: the CPU has L1 and L2, and CPUs can execute quite efficiently when the data is in cache, because the caches are close to the CPU — they're much faster to access; the data path they have to the CPU is much faster compared to main memory, which just cannot be as fast as the L1 or L2 cache. Now, the thing is, in GPUs we have a lot of IO operations. Think about textures: I talked about textures earlier, and textures involve memory fetches, because I need to read the texture memory and use it in my shader. These can have very high latency — a hundred to a thousand cycles. And like I mentioned, we removed all those fancy caches, so the problem now is that we will have stalls: when we're doing IO operations, we're going to get stuck.
So let's look at an example here. Let's say I have a fragment shader, and at this point it is doing some IO work. It will be stuck, so I have to wait for that to finish. So how do I fix this problem? How do I still make use of my hardware? The simple idea here is to just apply some scheduling tricks: if I'm stalling right now, I can schedule the next bunch of fragments and run those instead while I wait for this guy to finish stalling. And when that bunch stalls, I can start loading the next bunch, and so on. This way I get full utilization of my hardware without wasting any time waiting for my IO to finish. These are the kinds of scheduling tricks that GPUs use, because they don't have caching or branch prediction or all those fancy things.

So that ends the theory part of my talk. Let me go back now to the promised demos, and let me open a simple demo to show you some code for how this works with buffers and textures. I'm going to do something very simple. I have this image, and here's what I want to do: you can imagine this code is designed to be a low-level video player that renders only one frame. It's the most useless video player possible, but it's a start. This is what it does: it just renders the image you saw earlier on the screen. So why am I showing this off, if it's quite simple and stupid? The reason is that I'm doing it by programming the GPU directly, because if you think about images, this is exactly how the hardware will render an image.
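Before the demo, the latency-hiding trick above can be made concrete with a toy simulation. This is an illustrative Swift sketch of my own; the numbers and the scheduler are invented, not taken from any real GPU. Each wavefront of fragments computes, stalls on a 100-cycle memory fetch, then computes again, and the scheduler fills one wavefront's stall with another's work:

```swift
// Toy model: each wavefront does 10 cycles of compute, issues a
// 100-cycle memory fetch, then does 10 more cycles of compute.
struct Wavefront {
    var phase = 0        // 0: compute, 1: waiting on memory, 2: compute, 3: done
    var readyAt = 0      // cycle at which the outstanding fetch completes
    var remaining = 10   // compute cycles left in the current phase
}

let fetchLatency = 100

// Run `count` wavefronts with a scheduler that, on every cycle,
// picks any wavefront that is not blocked on memory.
func cyclesToFinish(wavefronts count: Int) -> Int {
    var wfs = [Wavefront](repeating: Wavefront(), count: count)
    var cycle = 0
    while wfs.contains(where: { $0.phase != 3 }) {
        if let i = wfs.indices.first(where: { w in
            wfs[w].phase == 0 || wfs[w].phase == 2 ||
            (wfs[w].phase == 1 && wfs[w].readyAt <= cycle)
        }) {
            if wfs[i].phase == 1 {        // data arrived: resume compute
                wfs[i].phase = 2
                wfs[i].remaining = 10
            }
            wfs[i].remaining -= 1
            if wfs[i].remaining == 0 {
                if wfs[i].phase == 0 {    // compute done: issue the memory fetch
                    wfs[i].phase = 1
                    wfs[i].readyAt = cycle + fetchLatency
                } else {
                    wfs[i].phase = 3      // finished
                }
            }
        }                                 // else: every wavefront is stalled this cycle
        cycle += 1
    }
    return cycle
}

let serial = 2 * cyclesToFinish(wavefronts: 1)  // one at a time: pay every stall in full
let overlapped = cyclesToFinish(wavefronts: 2)  // stalls hidden by the other wavefront
print(serial, overlapped)
```

Running the two wavefronts interleaved finishes in far fewer cycles than running them back to back, which is exactly why GPUs keep many fragment groups in flight instead of relying on caches.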
So I'll just show you guys some code for how we set this thing up, for how this whole GPU pipeline is set up, on iOS at least. The way it works on iOS is that we have this concept of a Metal device, which refers to the actual hardware we are rendering on, and what we need to specify is this thing called a render pipeline. The pipeline has this thing called a command queue, which holds the batches of instructions that I want to queue up for the GPU; every time I have new instructions, I will enqueue them on this queue. So in this example, I have some code to load a texture, and then I have a bunch of vertices, and these vertices map pixel positions to texture coordinates. This is the same structure you saw earlier: just a position and texture coordinates. In this case the geometry is just a square, just two triangles; it represents a quad. This is one triangle, this is another triangle, and these two triangles together make a square. I will pass this bunch of vertices and create what is called a buffer in Metal. And one fun thing about Metal, and iOS in general, is that memory is shared between the CPU and the GPU. Normally, with a discrete GPU, you might have heard of this problem that copying buffers from the CPU over to your video card is often very slow.
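The vertex data for that quad is easy to show on its own. Here is an illustrative Swift sketch; the struct and field names are my own stand-ins, not the demo's actual code, but the shape of the data, six vertices forming two triangles, each pairing a clip-space position with a texture coordinate, is the standard pattern:

```swift
// One vertex: a clip-space position (x, y in -1...1) plus the
// texture coordinate (u, v in 0...1) to sample at that corner.
struct Vertex: Equatable {
    var position: SIMD2<Float>
    var texCoord: SIMD2<Float>
}

// A quad is two triangles sharing a diagonal: six vertices in total.
let quad: [Vertex] = [
    // Triangle 1: bottom-left, bottom-right, top-left.
    Vertex(position: [-1, -1], texCoord: [0, 1]),
    Vertex(position: [ 1, -1], texCoord: [1, 1]),
    Vertex(position: [-1,  1], texCoord: [0, 0]),
    // Triangle 2: top-left, bottom-right, top-right.
    Vertex(position: [-1,  1], texCoord: [0, 0]),
    Vertex(position: [ 1, -1], texCoord: [1, 1]),
    Vertex(position: [ 1,  1], texCoord: [1, 0]),
]

print(quad.count)  // 6 vertices, 2 triangles, 1 full-screen quad
```

The v coordinates are flipped relative to y because texture space conventionally puts its origin at the top-left while clip space has y pointing up; a buffer with exactly this layout is what gets handed to the GPU.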
It's often an IO-intensive operation. One way that mobile devices, or at least iOS, get around it is by sharing memory: the memory space is marked as shared between the GPU and the CPU. That means the GPU has direct memory access, DMA, to that piece of memory, and so does the CPU. That's why my buffers in Metal are shared.

And remember, last time I mentioned that to invoke a vertex shader and a fragment shader, you need to configure them in some way. In this case I configure them by getting them from this library thing, which contains the compiled versions of that shader code, and then I can set up a pipeline from them.

Then there's the code that runs in each frame. In each frame I specify what the vertices are for that frame and what the texture is for that frame, and then I draw that frame and pass it to my command buffer. That's the gist of it. The point here is that the way to think about the GPU is in terms of vertex buffers, vertex shaders, and fragment shaders, and you need to pass the appropriate arguments, in this case my buffer and the texture, in each draw call. So if I were to animate this, for example, if I had something animatable, this stuff would change per draw call. I could change the vertices I'm providing, and that way I could get some animation and nice effects. And that's how animation frameworks work: they change the vertices in each frame, and that way you get animation effects.
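That last point, animation as nothing more than new vertices per draw call, can be sketched without any GPU at all. This is an illustrative Swift example of my own, not code from the talk: each "frame" recomputes the quad's corner positions, and that recomputed array is what you would copy into the vertex buffer before the draw call:

```swift
import Foundation

// Corner positions of a quad centred at the origin, flattened as x, y pairs.
let baseCorners: [Float] = [-1, -1,  1, -1,  -1, 1,  1, 1]

// "Animate" by pulsing the quad's scale with the frame's timestamp.
// A real renderer would copy the result into the vertex buffer and
// issue a new draw call with it, once per frame.
func corners(at time: Double) -> [Float] {
    let scale = Float(0.75 + 0.25 * sin(time))  // pulses between 0.5 and 1.0
    return baseCorners.map { $0 * scale }
}

let frame0 = corners(at: 0)        // sin(0) == 0, so the scale is exactly 0.75
let frameLater = corners(at: 1.0)
print(frame0[0], frameLater[0])    // the geometry differs from frame to frame
```

The GPU pipeline stays identical from frame to frame; only the argument you pass in each draw call changes, and that is the whole animation.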
So that's all I have. This was a pretty low-level talk, so are there any questions now? Feel free to ask any questions if you want to know more.