Alright, we just hit, well, not the most advanced topic, but the first of the advanced topics, and yet it is the most popular, prevalent vulnerability. So we're talking about buffer overflow vulnerabilities here. In addition, it requires incredible technical knowledge of exactly what's going on in a binary and with x86 assembly. It also has to do with function calls and how function frames work in x86, so this really brings together a lot of your knowledge on different topics. So have you heard of buffer overflows before? Everyone. Can anyone do them? This is actually one of those vulnerabilities that really separates the people who think they know what they're doing from the people who actually know what they're doing.
A buffer overflow at a high level is incredibly easy to understand: buffers in C that are stored on the stack have fixed lengths, and so you can write past that length and overwrite memory. At a high level the ideas here seem very simple, but when you get down to it and really start analyzing the code, and actually sit down and go, okay, let me exploit this clear buffer overflow vulnerability, that's where the rubber meets the road.
All right, so the key idea behind buffer overflows is that in C and C++ applications, there is no check that when you write to an array you're not going past its length. And why is that? So how do you access an array in C, what's the syntax for it, for reading from it or writing into it? Yeah, so let's actually go through this, because this is something that will hopefully help motivate us. There are fundamentally no bounds checks when you're writing to an
array in C or C++, and this is the root cause of buffer overflows and overwrites. They are very complicated because they are architecture and OS dependent. Like we saw, it depends on what's a thousand bytes beyond that buffer, right? It could segfault, it could do something horrible, or it could do nothing. Maybe we're overwriting an array that does nothing, who cares. They can be exploited both locally and remotely. If you think about it, essentially what we can do is write to memory that we shouldn't be able to. The program, let's say, can write to different buffers, and we'll see how to do this.
This can allow us to alter the data of the program and also alter the control flow of the program, which is kind of important, right? So what's the control flow of the program? It's what happens in the program: you take different branches depending on the conditions of the program. If you look at a C program you can reason about its control flow: after this happens, this function gets called, then this branch condition is checked, and depending on this condition, this will happen or this other thing will happen, and so on and so forth. It's all specified in the program source code. But if you can corrupt that control flow and get it to do what you want, now you can get it to execute maybe other functions that it wasn't supposed to execute, so you can get it to do things that the original developers did not intend.
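To make the missing bounds check concrete, here is a toy model, written in Python rather than C because a real out-of-bounds write in C is undefined behavior and will not look the same on every machine. Everything in it is made up for illustration: the flat `memory` bytearray, the 8-byte buffer, the 4-byte flag sitting right after it, and the `unchecked_copy` helper that stands in for a C copy loop that never looks at the buffer length.

```python
import struct

BUF_START, BUF_LEN = 0, 8            # hypothetical 8-byte buffer
FLAG_START = BUF_START + BUF_LEN     # hypothetical 4-byte variable stored right after it


def unchecked_copy(memory, dest, data):
    """Copy data into memory starting at dest, never checking the buffer length."""
    for i, byte in enumerate(data):
        memory[dest + i] = byte


def read_flag(memory):
    """Read the 4-byte little-endian variable that lives just past the buffer."""
    return struct.unpack_from("<I", memory, FLAG_START)[0]


def demo():
    # Flat memory: 8 bytes of buffer, 4 bytes of flag, 4 spare bytes.
    memory = bytearray(16)
    unchecked_copy(memory, BUF_START, b"A" * 8)              # fits exactly
    before = read_flag(memory)
    unchecked_copy(memory, BUF_START, b"A" * 8 + b"\x01\x00\x00\x00")  # 4 bytes too long
    after = read_flag(memory)
    return before, after


before, after = demo()
print(f"flag before: {before}, flag after: {after}")
```

The second copy never touches the flag by name, yet the flag changes, which is the "alter the data of the program" case. Python itself does check bounds on the bytearray, so the model only works because the buffer is a slice of a larger region, much like a stack buffer is a slice of the stack.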
There's actually been a lot of work on making this completely automatic. There was just a DARPA Cyber Grand Challenge that they held last summer at the DEF CON hacking conference, and what this was, basically, was a capture the flag competition with bots. This was fully automated attack and defense, so teams had to create what they called cyber reasoning systems that would take in programs, try to find vulnerabilities, generate exploits, and also generate patches to fix those vulnerabilities, and then use those exploits against everyone else's services. So it's a really cool process. My old lab at Santa Barbara got third in this, so they created a really cool cyber reasoning system that can actually automatically exploit vulnerabilities, and they got $750,000 for their effort, which is pretty cool. This is similar to, you guys know DARPA had this grand challenge for autonomous vehicles, so they put up money to try to get people to develop autonomous vehicles, and you can see where we are today: we have autonomous vehicles driving on Mill Avenue.
And it's crazy when you think about how old buffer overflow vulnerabilities are. When we talked about the Morris worm, one of the vulnerabilities that it exploited was just the classic buffer overflow vulnerability, and you think about all the time from then until now, and this is still a relevant problem in software. So even though there's been a lot of effort and research in developing defenses, there's this constant cat and mouse game of a new defense and then we find a new way around it, so we're going to look at that too. And a lot of these defenses from research actually made it into commercial tools that are deployed basically on all of your systems now.
But to fundamentally understand this, we need to take a hard look at the computer architecture to really understand and conceptualize what the x86 stack looks like and how it functions. So what is a stack in general, as a data structure? Last in, first out, right? Stacks have two operations: we push things on, we pop things off. So for an x86 binary, for an application, what does it use the stack for? Yeah, you can think of it as a kind of scratch memory. Remember we saw in that diagram of the ELF process that the stack starts at the top and grows down, while the heap starts at the bottom and grows up. So throughout the program's execution, the stack can and will change. The concept of the stack is used in all types of architectures, so this is an incredibly important concept to memorize and understand.
We will always, always, always in this class draw stacks starting from the top and growing down. So they start at high memory addresses and go down. Of course you could draw them the other way, but that's wrong and terrible, so we'll do it this way, so we'll always be 100% on the same page. And so functions, as they execute, can push things onto the stack to save things, which will move the stack down, and they can pop things off the stack. Different assembly languages may or may not support this. On x86, there are explicit assembly instructions to do this. Hopefully you've seen some of this in part three of the assignment.
So the important register, where we have to understand exactly what its functionality is, is ESP, the extended stack pointer. This is a register that points to what we can think of as the bottom of the stack, like where is the current bottom-most part of the stack. So push EAX, right? What are the exact semantics of push EAX?
Because we have to be able to read the assembly code to understand exactly what's going on. So push EAX decrements the stack pointer by how much? Four bytes. Why? Yes, because EAX is 32 bits, it's four bytes, so it has to move the stack down four, and we're going to store the value there. So push moves the stack pointer down four, copies the value that's inside EAX, and puts it onto the stack. That's it, that's all it does. So what's the opposite of a push? A pop. So what's that going to do then? If you know the exact semantics of push, you need to tell me what precisely are the exact semantics of pop EBP. Say it again? Copy one value, more specific. Which location? Yes, so look up the address that's inside ESP, copy the value that's located there into EBP, and then increment the stack pointer by four, which moves the stack pointer up. So it's really important, right? With a push we first decrement the stack pointer and then copy the value. So when we want to go backwards, we have to do it literally in exactly the opposite order: take the value from the stack, put it in the register, and then increase the stack pointer by four. So that should be exactly what the slide says.
Yep, so we can go through an incredibly simple example of this. So we have our stack at a high memory location. It's going to grow down to lower memory locations. Let's say the stack is here. How do I know that that is a fact? EIP? ESP. EIP doesn't have anything to do with it. I don't care what's in EIP. When I'm talking about the stack, all I care about is what is in ESP. So that's why when we draw this arrow here, it's going to point to exactly the value that's inside ESP. So for instance, if it points to 0x10000, then the value that's inside ESP must be 0x10000. It has to be. This arrow is just helpful notation for us.
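The push and pop semantics just described can be written down as a tiny model. This is a sketch, not an emulator: the `Machine` class, its register dictionary, and the memory size are all invented here, and it ignores special cases such as pushing or popping ESP itself.

```python
import struct


class Machine:
    """Toy 32-bit machine: a few registers plus a flat little-endian memory."""

    def __init__(self, esp=0x10000):
        # A little more than 0x10000 bytes so that address 0x10000 itself is valid.
        self.mem = bytearray(0x10010)
        self.reg = {"eax": 0, "ebx": 0, "ebp": 0, "esp": esp}

    def push(self, src):
        # 1. Move the stack pointer down four bytes.
        self.reg["esp"] -= 4
        # 2. Copy the register's 32-bit value to the address now in ESP.
        struct.pack_into("<I", self.mem, self.reg["esp"], self.reg[src])

    def pop(self, dst):
        # 1. Copy the four bytes at the address in ESP into the register.
        self.reg[dst] = struct.unpack_from("<I", self.mem, self.reg["esp"])[0]
        # 2. Move the stack pointer up four bytes.
        self.reg["esp"] += 4


m = Machine()
m.reg["eax"] = 0x1234
m.push("eax")
print(hex(m.reg["esp"]))   # ESP after the push
m.pop("ebp")
print(hex(m.reg["ebp"]), hex(m.reg["esp"]))
```

Note that the two steps of `pop` are the two steps of `push` undone in reverse order, which is the point made above.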
Yes. This is just a pointer. So on the right are memory addresses, and we'll put the values inside these blocks. For now, it doesn't matter. There's nothing in here. So is ESP pointing to the block below it or the block above it? ESP has the value 0x10000 based on what we have here, and you can see the arrow is slightly off, a few pixels above it. So if I just told you that ESP points to 0x10000, what can you tell me about the rest of the memory in this program? What is above that point? Yes. So if the stack grows down, anything above ESP is saved program memory. It's being used by the program somehow. It's storing some values. What can we say about everything below it? Garbage. We don't care. It's not empty. It's memory. It's just ones and zeros. These are memory addresses with bits there. But fundamentally for us, it's garbage. To the program, it has no meaning, because the stack can grow down and overwrite those values as much as it wants.
Could it run into the heap? The OS will stop you from going all the way to the heap. You'll get a segfault before you get there. So we can just think about it like we have a bunch of memory. Questions on this? Yeah. So are you taking zero, the bottom-most address, as the end of the allocated memory for the stack? No, literally all the way from all f's to all zeros. I mean, it's not that the stack could actually grow down all the way there, but fundamentally, conceptually, it could. There's nothing to stop it. Whether the stack will do that depends on where it starts. We don't know that. You can change where the stack starts, but yes. Starting at 0x10000, we don't have a lot of memory left. It's just a nice number. Right now we are considering one process? Yes, we're looking at a process's memory space. So only one process.
Remember, because of virtual memory, each process sees everything as if it owns it, right? It doesn't see or care about any other programs. Okay, so now we can go over some simple instructions: push eax, pop ebx. So we have eax, what's the value inside eax? Close. All right, yeah, that's right, 0xA. What's in ebx? We'll say zero. What's inside esp? 0x10000. It has to be 0x10000, otherwise my diagram is wrong. If it were any other value, that blue pointer would be somewhere else, right? Because the important thing that matters is what value is inside esp.
Then we get to a push eax. What is this going to do? Exactly. What's the first thing that happens when it executes this instruction? Decrement, by how much? Four bytes. So it's first going to change the stack pointer to be 0xfffc, which is four bytes down. So now the value inside esp changed, which means our pointer needs to move four bytes down. So each line is at an offset of four. Then we're going to copy the value inside eax. Which is going to be what? More specific. How many bytes is it going to copy there? What exactly are those bytes? In which direction? We're talking about endianness, right? So it'll actually be 0a 00 00 00. The very first byte, at 0xfffc, will be 0a, 0xfffd will be zero, and 0xfffe and 0xffff will also be zero. Because x86 is little endian, the byte at the lowest address is interpreted as the least significant and the byte at the highest address as the most significant. Yes, that's endianness. Oh, I should do that: change these slides and put all the bytes on here. That actually would be very cool.
Is that it? Done. Was there anything else in that instruction? Was that part of the semantics on the last slide? Did we get rid of the value that's inside eax? No, the semantics don't say to do that, so we would never do that. So you just leave it in there. It just copies. Anything else?
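The byte-by-byte picture is easy to check with Python's `struct` module, where the `"<I"` format means a 32-bit unsigned value in little-endian order. The helper name `bytes_on_stack` is made up for this sketch; the value 0xA and the address 0xfffc are the ones from the example.

```python
import struct


def bytes_on_stack(value, esp):
    """Return {address: byte} for a 32-bit value stored little-endian at esp."""
    raw = struct.pack("<I", value)   # least significant byte comes first
    return {esp + i: byte for i, byte in enumerate(raw)}


layout = bytes_on_stack(0xA, 0xFFFC)
for address in sorted(layout):
    print(f"0x{address:04x}: 0x{layout[address]:02x}")

# Reading the same four bytes back as one little-endian value recovers 0xA.
restored = struct.unpack("<I", bytes(layout[a] for a in sorted(layout)))[0]
print(hex(restored))
```

The lowest address holds `0a` and the three addresses above it hold zero, matching the walk-through.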
Is this the state of the program right before we execute the next instruction, the pop instruction? Yes. So now we can do pop ebx. After we execute this, what exactly happens? What's the first thing that happens? We look up the memory address that's inside ESP, 0xfffc, we dereference it, which points us here, and then we say, okay, copy those four bytes into where? EBX. So EBX will change to 0xA, and then what happens? Increment the stack pointer by four. Do we do anything to change what was on the stack? No. We completely leave it alone, right? The semantics don't say that we do that. I mean, you could change that, but why would you do extra work if you don't have to? I'm sure as students you understand that concept.
Okay, is everybody 100% solid on the stack, stack masters? Push, push, push, pop, pop, pop. Yes. If you push just AX or even AH, is it still going to try and do four bytes, or is it going to go down to the lower amounts, two bytes, one byte? I do not know. You'd have to test. It would probably work. It would probably do two bytes, but I don't know 100%. It may do four, and then if you tried to pop AX, it would take the four bytes, truncate to the last two, and copy those two bytes into AX. It could probably do it either way semantically, but what it actually does, I don't know. Good question. Yeah. Can you pop past the beginning of the stack? Is there anything about the semantics that says you can't? No. They're just pointers. The instructions just move the pointer up or down. You could even write assembly code by hand that sets ESP to any arbitrary value, and then you can pop or push or whatever you want to do. It's not going to be good. Everything would break, but you could do it. You could write a crazy program and do that. Maybe that's what I'll do for part three next year. You guys okay? Any other stack questions?
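One consequence worth seeing in action: since pop only copies and moves ESP, the old bytes stay in memory below the stack pointer until something pushes over them. A small sketch using the example's values (EAX = 0xA, ESP = 0x10000); the `regs` dictionary, the `mem` bytearray, and the helpers are made up for illustration, and the 0xDEADBEEF value is arbitrary.

```python
import struct

mem = bytearray(0x10010)                       # toy flat memory
regs = {"eax": 0xA, "ebx": 0, "esp": 0x10000}  # starting state from the example


def push(src):
    regs["esp"] -= 4
    struct.pack_into("<I", mem, regs["esp"], regs[src])


def pop(dst):
    regs[dst] = struct.unpack_from("<I", mem, regs["esp"])[0]
    regs["esp"] += 4


def peek(address):
    """Read four bytes of memory without touching any register."""
    return struct.unpack_from("<I", mem, address)[0]


push("eax")
pop("ebx")
stale = peek(0xFFFC)       # pop copied the value out, it did not erase it
eax_after = regs["eax"]    # push copied the value in, it did not clear eax

regs["eax"] = 0xDEADBEEF
push("eax")                # the next push reuses the same slot
overwritten = peek(0xFFFC)
print(hex(stale), hex(eax_after), hex(overwritten))
```

This is why memory below ESP is treated as garbage: it may still hold old values, but the next push is free to overwrite them.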
Stack: push, push, pop, pop, pop. Okay. Now, we talked about control flow, right? So when a function calls another function... yes, a lot of you took my 340 class, so this should be burned into your brain. If not, it's going to be even more burnt. So function frames. What are they used for and why do we need them? They keep track of each executing function's variables. What else? Return values. Maybe, it depends actually on the language. Some languages will put them on the stack, some will use registers to return. How does one function call another function? Calls, jumps, stacks, frames. Are none of you right or are all of you right? What exactly happens?
Okay. Function frames. So we already saw: we put parameters on the stack, we put frame pointers on the stack, we put the return address, we put local variables, and, we didn't see it, but we also use the stack for temporary variables. So if you have a really complicated mathematical expression that uses more temporary variables than there are registers, they will spill over onto the stack. One thing: in x86, the return value is stored in the EAX register. So that's the only thing we didn't cover here, how to get the value back from a function. It's stored in EAX. Hopefully now, if you look at part three, it will be a lot more clear. You can see that the main program calls that check password function, and it maybe takes the value in EAX and looks at it and does something with it.
And so what we did is we essentially reverse engineered and came up with the calling convention of x86. The calling convention states that every caller of a function must do this, and at the start of every function it must do this. It defines who is responsible for what. And the important thing is that this calling convention, what we just looked at, is only particular to x86.
It's also only particular to x86 on Linux, and it's only specific to x86 on Linux using cdecl. System calls, which you can think of as function calls, use a different calling convention. So there's actually a different way to call into the kernel than there is to call another function. So just to review, in order, this is something I want you to burn into your brain. You should be able, for any function, to write out the calling convention and the exact order that everything happens in. This is going to be like the most powerful thing you can do.
So the caller, the one who's trying to call the function, pushes the arguments onto the stack in order from right to left. So when we read the stack top to bottom they will be right to left. If we go bottom up they will be left to right. You then push the address of the instruction after the call, with which x86 instruction? The call instruction. The callee has to push the previous frame pointer onto the stack, create space on the stack for the local variables, ensure that the stack is consistent when it returns, ensure that the base pointer is properly restored, and put the return value in EAX. We literally just did all of this except for the EAX part.
So I'll show you a super simple example of this C program. We have a main function with a local variable a. We set a equal to the result of callee, and we return a. And callee looks like this. You should be able to write the assembly for this program. With just what we did right now, you should be able to write 100% of everything exactly as it should be. Obviously the placement of, you know, this int a on the stack, well, there's no fixed order. But you should actually be able to write this 100%.
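The ordering rules above determine where each item sits relative to EBP once the callee has set up its frame. A minimal sketch of that layout, assuming 4-byte slots and no padding, which a real compiler does not promise; the function name `cdecl_frame_layout` and the labels are made up, and the order of the locals is the compiler's choice.

```python
def cdecl_frame_layout(params, local_names):
    """Return (offset_from_ebp, label) pairs, highest address first.

    Assumes 4-byte slots and no padding (an illustration only).
    """
    layout = []
    # The caller pushes arguments right to left, so the rightmost parameter
    # ends up at the highest address and the leftmost at ebp+8.
    for index in reversed(range(len(params))):
        layout.append((8 + 4 * index, f"parameter {params[index]}"))
    layout.append((4, "return address (pushed by call)"))
    layout.append((0, "saved ebp (pushed by the callee)"))
    # Locals live below the saved ebp; their order here is an assumption.
    for index, name in enumerate(local_names):
        layout.append((-4 * (index + 1), f"local {name}"))
    return layout


for offset, label in cdecl_frame_layout(["a", "b"], ["result"]):
    print(f"[ebp{offset:+#x}] {label}")
```

Reading the printed list top to bottom gives the parameters right to left, then the return address, then the saved frame pointer, then the locals, which is the stack drawn from high addresses to low.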
If we look at the assembly for one time that I compiled it... the other thing that's tricky is compilers change over time. So they may change how they lay things out. They may have more optimizations. But I compiled this. It did a push EBP. It moved the stack pointer into the base pointer, which is doing what? Setting up our base pointer after we saved the old one. It then subtracts hex 18 from the stack pointer. That seems surprising. Yeah. It then moves hex 28 into ESP plus 4. So what's hex 28? 40, which is which parameter? The rightmost parameter. So we moved the stack down 0x18, and we're putting this four bytes above the bottom of the stack. You should also be able to draw the stack at any single program point here. If I say at this line, what does the stack look like? You should be able to draw that. We then move 10 where? Directly at the bottom of the stack. So what's above that? 40. So there's 10 and then 40.
What's the next instruction going to be? The call. Because we've set up the stack, right? Remember, before calling this function, we need to first push the arguments on the stack from right to left. So did we do any pushes here? No, there are no push instructions here. But the stack is in the correct order, right? Going from the bottom up, the first thing on the stack is 10. The thing right above that is 40. So what did the compiler do by subtracting 0x18 from the stack pointer? Yeah. So it not only allocated the space for local variables, it also allocated extra space because it knew we were going to use two slots on the stack to put the parameters to this callee function. So rather than just giving us enough for our local variables, it gave us extra that we could use for calling this function. So then the call comes back, then what do we do? Where is a? EBP minus 4. Because at EBP is the saved base pointer, but EBP minus 4 is a free memory location. So we can say that the compiler chose that. So we move EAX into EBP minus 4.
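The offsets in that listing are easier to compare once they are all expressed relative to EBP. After the prologue ESP equals EBP minus 0x18, so an ESP-relative slot converts with one subtraction. A small sketch of the arithmetic; the helper name is made up, and the listing does not say why the compiler reserved more bytes than this code uses.

```python
FRAME_SIZE = 0x18   # from the "subtract hex 18 from the stack pointer" step


def esp_relative_to_ebp(esp_offset, frame_size=FRAME_SIZE):
    """Translate [esp + esp_offset] into an ebp-relative offset, given esp = ebp - frame_size."""
    return esp_offset - frame_size


slots = {
    "first argument (10) at [esp]": esp_relative_to_ebp(0),
    "second argument (40) at [esp+4]": esp_relative_to_ebp(4),
    "local a at [ebp-4]": -4,
}
for label, offset in slots.items():
    print(f"{label}: ebp{offset:+d}")

# 24 bytes reserved: 4 for the local, 8 for the two outgoing arguments.
# The remainder is simply not used by this code.
unused = FRAME_SIZE - 4 - 8
print(f"bytes reserved: {FRAME_SIZE}, unused by this code: {unused}")
```

So the two outgoing arguments sit at the very bottom of the 24-byte area and the local sits at the very top, just under the saved base pointer.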
So the address of a is not specific enough. You've got to say exactly where a is located, which is at EBP minus 4. And then what do we do? What's the next statement that executes in the C code? Return a. So how do we return a? We have to put it into EAX. So how do we do that? You're a dumb compiler, so you then move EBP minus 4 into EAX. So now you've put that return value there. Now what do you need to do? We need to return. Oh, that's it. You just compiled this program, right? It's exactly how we did it by hand.
So what we've been looking at here, this is the prologue of this function. Every function has a prologue and an epilogue. You can think of the prologue and epilogue as bookkeeping that needs to happen but isn't tied to the actual functionality of the code. Everything else in between is the actual code of the function that gets executed. You can see the callee function is very much like this. So when we see mov EBP plus C, what is this? What does this correspond to in the C code? Which variable? b. Yeah, so EBP plus 8 is a, that's the first one, and EBP plus C is b. They're parameters, right? It's going up the stack. So it's moving b into EAX and a into EDX. This load effective address is adding them together. We'd usually go through this to see exactly what it does. It adds one to EAX, and then it pops EBP and it returns.
So I think actually we've done enough here. I didn't finish this on Wednesday, but I'm going to skip the rest of these example stack slides. I think today we actually did really well creating a base for that. So we don't need to step through all these tedious examples, but you should go through them to make sure you understand how it works.
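If you want to step through it without the slides, here is a rough Python model of the two functions as described above. Several things are assumptions: the instruction listing is reconstructed from the spoken description, the second register in callee is taken to be EDX, main's epilogue is assumed to be a `leave`-style restore, and the return address `RETURN_TO_MAIN` is a made-up number. The `Machine` class is a toy, not an emulator.

```python
import struct


class Machine:
    """Toy 32-bit machine with a flat little-endian memory."""

    def __init__(self):
        self.mem = bytearray(0x10010)
        self.r = {"eax": 0, "edx": 0, "ebp": 0, "esp": 0x10000}

    def load(self, address):
        return struct.unpack_from("<I", self.mem, address)[0]

    def store(self, address, value):
        struct.pack_into("<I", self.mem, address, value & 0xFFFFFFFF)

    def push(self, value):
        self.r["esp"] -= 4
        self.store(self.r["esp"], value)

    def pop(self):
        value = self.load(self.r["esp"])
        self.r["esp"] += 4
        return value


RETURN_TO_MAIN = 0x08048400   # hypothetical address of the instruction after the call


def callee(m):
    r = m.r
    m.push(r["ebp"])                                # push ebp
    r["ebp"] = r["esp"]                             # mov ebp, esp
    r["eax"] = m.load(r["ebp"] + 0xC)               # mov eax, [ebp+0xc]  (b)
    r["edx"] = m.load(r["ebp"] + 0x8)               # mov edx, [ebp+0x8]  (a)
    r["eax"] = (r["edx"] + r["eax"]) & 0xFFFFFFFF   # lea eax, [edx+eax]
    r["eax"] = (r["eax"] + 1) & 0xFFFFFFFF          # add eax, 1
    r["ebp"] = m.pop()                              # pop ebp
    return m.pop()                                  # ret: pop the return address


def main(m):
    r = m.r
    m.push(r["ebp"])                    # push ebp
    r["ebp"] = r["esp"]                 # mov ebp, esp
    r["esp"] -= 0x18                    # sub esp, 0x18
    m.store(r["esp"] + 4, 0x28)         # mov [esp+4], 0x28  (40, rightmost argument)
    m.store(r["esp"], 0xA)              # mov [esp], 0xa     (10, leftmost argument)
    m.push(RETURN_TO_MAIN)              # call callee: push the return address...
    returned_to = callee(m)             # ...then jump to callee
    m.store(r["ebp"] - 4, r["eax"])     # mov [ebp-4], eax   (a = result)
    r["eax"] = m.load(r["ebp"] - 4)     # mov eax, [ebp-4]   (return a)
    r["esp"] = r["ebp"]                 # assumed epilogue: mov esp, ebp
    r["ebp"] = m.pop()                  #                   pop ebp
    return returned_to


machine = Machine()
came_back_to = main(machine)
print(machine.r["eax"], hex(came_back_to), hex(machine.r["esp"]))
```

With the arguments 10 and 40, callee computes 10 + 40 + 1, so EAX holds 51 when main finishes, and ESP is back where it started, which is what "the stack is consistent when it returns" means in practice.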