All right, first and foremost, I know y'all have waited for this moment for a long time. Here is the distribution of grades for the midterm exam. We just finished grading today, so I'll email you all your grades sometime after class today. If you want to see your actual exam, come to office hours to visit me or a TA. We'll be happy to go through your exam and show you what you got right and what you didn't. Questions? Cool.

All right, I don't really like this orientation. I don't know whether this is good or bad; usually it's supposed to be more of a curve. I'm not really sure what to think about it, but I think it means you studied, so that's good. Or some of you did, at least. You can let me know your thoughts later.

Today we are going to cover overflows, so now we get to memory corruption vulnerabilities. We're going to crank through this. I think we have 130 slides or so to cover. That number is mainly just to scare you. A lot of it is animation; there's not actually 130 or 140 pages of text.

Okay, so the important thing to think about is what all this stems from. All of the vulnerabilities we've talked about stem from, essentially, programmers making mistakes, and what we've looked at is C code. So how are arrays done in C? Yeah, they're a fixed size of memory. How do you specify the size of the memory? Yeah, by hard-coding it as you write it: in brackets you write the size, how many integers or characters or whatever the data type is, and that says you want an array that big. And what prevents you from, let's say, creating an array of 50 characters and writing a hundred characters into that array? Nothing. So what do you have to do if you're doing this correctly? You have to know the size. You have to pass the size along with the array itself, constantly, right?
You have to have both the pointer to the memory and the size, so that you never go over the allocated memory. Even using something like dynamic allocation on the heap with malloc, you've got to specify exactly how much memory you need, and then you need to pass that information down so that everyone else that operates on that array knows exactly how many elements are in it.

Why don't you need to do this for something like Java? It keeps track for you. Yeah, the language itself says: you tell me what's an array, I will keep track of the size, and every time you read or write to that array I will check the size to make sure you're not going out of bounds. What's the benefit of that? Yeah, the programmer doesn't have to worry about passing the bounds around. What's the downside? Speed is the main one, right? Because now every array access needs to check whether it's in bounds or not.

Essentially, and this is what we'll see, this lack of built-in bounds checking on arrays causes one of the most common mistakes in C and C++ applications, and we'll see how this can be used for vulnerabilities.
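As a rough sketch of what "pass the size along with the array" looks like in practice, here is a small C helper. The name checked_copy and its return convention are made up for this illustration; they are not from the lecture or from any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the lecture): copy n bytes into dst only if
   they fit. dst_size has to travel alongside the pointer because C does not
   record an array's length anywhere. */
int checked_copy(char *dst, size_t dst_size, const char *src, size_t n)
{
    if (n > dst_size) {
        return -1;      /* would overflow the buffer: refuse to write */
    }
    memcpy(dst, src, n);
    return 0;           /* copied n bytes, all inside the buffer */
}
```

With a 50-character array, asking this helper to copy 100 bytes is rejected, whereas a bare memcpy with the same arguments would happily write past the end. The check is exactly the per-access cost that a language like Java pays on your behalf.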
So, overflows. What we're talking about here are buffer overflows. Buffer, array: same general concept, you're creating some memory. By overflowing that buffer, where essentially the attacker is able to write more data than can fit into that buffer, they can do all kinds of cool stuff. We'll go over how you can modify the control flow of the application; you can also modify the data of the application.

Exactly how you do this matters. This is one of those types of vulnerabilities where your exploit needs to be very precise. It depends on the exact architecture of the system, the operating system version, all these types of things. There are tools that will try to automate this process; we won't get into them here. A lot has been done to try to prevent the exploitation of these vulnerabilities, but attackers have gotten a lot better over the years, so every time a new mitigation strategy is deployed, attackers find a new way around it.

Okay, but it all starts with the stack. So what is a stack, generally? Yeah, in the back. Yeah, last in, first out. You push things onto the stack and then you pop them off, and they come off in reverse order. You can think of it as pushing items one, two, three, four, five, six; then six comes off first when you pop, then five, then four, and so on.

As for how the stack is used, I don't know if operating systems is the correct term, but almost all architectures use the stack as some form of scratch memory for functions. We'll get into some of the details, but think about it at a high level: how can you have a function call itself? You write a recursive function. How can it call itself? Where does the memory for the local variables go?
Right, you need a new piece of memory for every invocation of a function. So if you have a function call itself a hundred times, there needs to be some space in the memory of your program for those hundred different calls, each of which has different local variables. This is all done using the stack.

All the stacks we'll be looking at start at high memory and grow down. So when we say push, pushing things onto the stack moves the stack down towards lower memory, and popping things off moves the stack up towards higher memory. It's essentially an arbitrary distinction, right? You're just talking about which way the memory moves. You could flip it and have the stack grow from the bottom to the top, from low memory to high memory. You could draw it left to right, although that's definitely wrong. But as long as we're consistent, we'll always know exactly what we're talking about.

As a function executes, it can feel free to store values onto the stack and pop values off the stack, and some assembly languages, specifically x86, which we've been looking at, actively support this. Specifically, the register ESP holds the address of the top of the stack. So what does that mean is inside the register ESP? No, not the most recent thing put on the stack, but very close. It's a pointer. So what does pointer mean? Yeah, the memory address of the place where the last thing was put on the stack. Yes. And that's exactly why I want to make this distinction: when we say it points to the top of the stack, that means it literally holds the memory address. So if the top of the stack is at 0x80424AB, that hex value will be inside the ESP register.

Cool. We have basically two operations on the stack: pushing things on and popping things off. A push instruction, push EAX, says decrement the stack pointer. So we're pushing, which means we're decrementing.
We're going down towards lower memory addresses. Decrement the stack pointer, then put the value that's inside EAX onto the stack at the location pointed to by the stack pointer. So we move it down, put EAX there, and we're done. Pop is literally just the reverse. Pop EAX says: take whatever ESP points to at that memory address, put that value inside EAX, and then increment the stack pointer. How much do we have to increment or decrement the stack pointer? By the size of the value. Yeah. If we're storing whatever's in EAX onto the stack, we need to store 32 bits there, four bytes, so we need to increment and decrement the stack pointer by four.

Cool. Let's look at a very simple example. Here again we have the stack drawn from high addresses to low addresses. Let's say the stack currently points here, and let's say this memory location is at 10,000 in hex. Does it matter exactly where the stack is? No, it doesn't really matter for these purposes. But if I'm saying that the stack points to 10,000 hex, then what must be true about the registers? Yes, exactly: the value inside ESP must be 10,000 hex. Those things are linked.

Okay, cool. Another important thing about the stack: what can I say about memory that's above this stack pointer? It is part of the stack, right? Do we know what those values are? No, we don't necessarily know; it may depend on runtime. It could be any crazy thing. But why do we know that those are in use as part of the stack? Because the stack grows down, which means anything above the stack pointer must be in use by somebody, or could be in use by somebody. What about the stuff below the stack pointer? Garbage.
It could be anything. There could be any cruft, whatever was here on the stack before, but fundamentally the stuff here is not in use. So if you think about it in terms of memory usage, we're using everything up here, and everything down here is free to use and take. Yeah? Would there be other important program stuff below it that the stack could grow into? Sure, but we'll ignore that. I guess technically yes, but the way it works is you'll hit a page fault or a segfault before you reach anything else, so you probably won't. And I wouldn't say add to the stack; it's push and pop. But the important thing is that everything below is garbage memory. We don't know what it is. It could be literally anything.

So let's say we have an incredibly simple program: push eax, pop ebx. From our intuition, what should be the result after these two instructions execute? What's the purpose, the semantics, of these instructions? What should happen at the end? You think at the end there would be no result? There's no net change to esp: we push and then pop, so the stack pointer should be at the same location afterwards. What about the registers? It's going to take the value from eax and put it onto the stack, and then what does the next instruction do? It pops that value into ebx. So after all this executes, how does eax change? It doesn't. How does ebx change? Yeah, it now has the value of eax. So essentially we're copying from eax into ebx using the stack.

But let's look at how this works. The only important registers here are eax, which let's say has the hex value a; ebx, which has the hex value zero; and esp. We already said, by this diagram, that if the stack pointer is pointing to 10,000 hex, esp must hold the value 10,000 hex. Good? Cool. All right, so we are currently about to execute this instruction.
We don't have eip, the instruction pointer, on here, but we can say that it points there. So after this executes, what must happen? A push eax first decrements esp by four, so it points here, and then copies whatever is in eax to that memory location. So the esp register decrements by four to ffc. Sorry, fffc. The stack pointer moved down, eax was copied there, and now we've done everything for push eax. We've taken the value inside the eax register and put it on the stack, and the stack has moved down one slot, so one more item is in use on the stack. Good? Questions? Yes. Yes, that's why you should remember which way it moves. It's moving down: pushing moves the stack pointer down and popping moves it up. Exactly, yes. Now this copy of eax is in use by the stack, and everything below it is garbage.

Cool. So now when we do pop ebx, we fetch whatever is at memory address fffc and copy it into ebx, because we're popping into ebx. Then, because we're popping, we add four. It's literally the exact opposite sequence of pushing: push was decrement, then copy the value; pop is copy the value, then increment. So pop ebx copies whatever's here into ebx, which is now 0xa, and adds four to esp, so now esp points here. The value that's below 10,000 is now garbage, right? We know it has the value 0xa, but we don't care, because it's essentially garbage at this point.

Did the execution we just walked through follow our understanding of what would happen to the registers? Yeah, we essentially copied from eax into ebx. Why didn't the stack pointer change as a result of these two instructions? Yeah, we have equal numbers of pushes and pops, right?
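The push eax, pop ebx walkthrough can be modeled in a few lines of C. This is only a toy sketch: the Machine struct, the function names, and the MEM_BASE and MEM_SIZE window are all invented for illustration, and the values are stored in the host's byte order rather than modeling x86 memory exactly.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of three registers and a small window of stack memory.
   MEM_BASE and MEM_SIZE are arbitrary choices for this sketch. */
#define MEM_BASE 0xFF00u
#define MEM_SIZE 0x200u

typedef struct {
    uint32_t eax, ebx, esp;
    uint8_t  mem[MEM_SIZE];   /* addresses MEM_BASE .. MEM_BASE + MEM_SIZE - 1 */
} Machine;

/* push: decrement esp by 4, then store the value where esp now points */
void push32(Machine *m, uint32_t value)
{
    m->esp -= 4;
    memcpy(&m->mem[m->esp - MEM_BASE], &value, sizeof value);
}

/* pop: load the value esp points to, then increment esp by 4 */
uint32_t pop32(Machine *m)
{
    uint32_t value;
    memcpy(&value, &m->mem[m->esp - MEM_BASE], sizeof value);
    m->esp += 4;
    return value;
}

/* The two-instruction program from the walkthrough: push eax; pop ebx */
void run_push_pop(Machine *m)
{
    push32(m, m->eax);
    m->ebx = pop32(m);
}
```

Starting from eax = 0xa, ebx = 0, and esp = 0x10000, the push leaves esp at 0xfffc, and after the pop ebx holds 0xa while esp is back at 0x10000. Note that pop32 does not erase anything: the old 0xa is still sitting at 0xfffc, which is the "garbage below the stack pointer" from the walkthrough.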
Because of that, the stack pointer will be at the exact same location after that's all done. Questions on this? Yeah. Sure. Well, why don't you walk through it? So pop happens. What's the first thing that has to happen? Yeah, and where do you put it? The ebx register, because that's the argument to the pop instruction. So you essentially dereference fffc: you say whatever's there, copy that into ebx. Exactly. And then you increment esp by four to finish the pop, moving the stack up one slot. Any other questions? This is an important base concept that we're going to build on. Cool.

All right, so now that we have that: we mentioned that we need some way for functions to use the stack to store their local variables. These are actually slides I stole from when I used to teach 340, because the function frame is a core concept both for compilers and for understanding buffer overflows. So we have functions that want to use the stack to allocate space for their local variables. Can we use the stack pointer for this? Let's say a function executes and there's a local integer variable a. How do we map that to a location on the stack? Yeah, maybe we could push it onto the stack, but then whenever we reference it we need to figure out where it is relative to the stack pointer, how much offset from the stack pointer we need. Yeah? Yeah, so one option would be, let's say my function is executing.
I know I have two local variables, so I move the stack pointer down by eight to allocate enough space for my two variables on the stack, and then I can use offsets from ESP to reference those variables. But this gets a little tricky, because as the function executes, the function itself can push values onto the stack for various reasons. So it's a bit of a pain to do it this way.

The other notion is: how about each function sets up what they call a frame pointer, or base pointer, and every local variable is an offset from that base pointer? On x86 there's a nice register for this called the base pointer, ebp. It holds the base pointer of the currently executing function. This will make much more sense with an example.

Let's say we write some code. We have a main function with int a, int b, and float c. How many local variables do we have? Three. How much space do we need for those local variables? How big is an int? Four bytes, so eight for the two ints. And how much is a float? Four as well. I believe a long is eight on some systems, and a long long is more; I don't know exactly. We'd have to look to make sure, but the compiler will tell us.

Then we have some code. We have the declarations, and then we just have some code that's doing stuff. We have our computation: set a equal to 10, set b equal to 100, set c equal to 10.45, a is equal to a plus b, return zero. This is what we give our compiler as input. It seems like a pretty simple function, right? Do some computation and then return from the function. So what has to happen?
The compiler needs to say: okay, for this main function I have three local variables, so I know I need space on the stack to store at least 12 bytes. Then I need to figure out where on my stack the offsets for my local variables a, b, and c are. For instance, it'll just say: at ebp plus some offset will be local variable a; at ebp plus some other offset, which obviously should be different from a's and not overlap, will be b; and c will be at ebp plus a third offset. At a conceptual level, all of this then becomes very easy.

So using this, how would you compile a is equal to 10? a is not stored in a register. This is the important thing, right? The compiler will use registers to make things faster, but these are all local variables, meaning they need to be stored in memory on the stack. What you were saying was close, but where specifically on the stack? Where is the offset for a? Yeah, so move 10 into ebp plus a's offset. ebp holds a memory address. We'll figure out how that gets set up later, but assuming it points at the correct place on the stack for this function, we'll copy 10 into ebp plus a's offset. Yeah?

Sorry, probably a silly question. I thought you only had to store local variables on the stack if you made a function call to something else, to save your place for when you come back from the other function call. I thought you would use the normal memory assigned to the program while you're still in the same function.

What do you mean by normal assigned memory?

I thought that whenever you ran a program you got an assigned section of memory for the program that you requested.

Yes, so that would be global variables. Global variables get compiled to essentially a fixed location in memory.
So they can be referenced by anyone. Technically, the compiler could look at this and ask: does the return value of this function depend on any of this computation? No. So it could get rid of all this and replace it with a function that just returns zero. It could also realize, oh, I don't actually need to store anything on the stack, I can optimize these into registers. But when you compile specifically with no optimizations, it won't do any of those steps. That's why looking at compiler output is difficult: it changes all the time.

So ESP can change throughout the execution of the function. What makes the base pointer really nice is that every place I see a, I can replace it with ebp plus the offset for a: here, here, and here. If we used the stack pointer, we'd have to keep track of whether we had stored any temporary values onto the stack, and adjust for that whenever we referenced a later. x86-64, I think, can optimize away the use of the base pointer and just use the stack pointer, but that makes it more confusing. Yeah? Well, I didn't say what the offset was. It's just some offset right now. Theoretically it doesn't have to be negative, depending on how you set it up. We're going to get there.

So you can compile these basically one to one. You can say the memory at ebp plus a's offset is set to 10, and we just compile line by line: memory at ebp plus b's offset is set to 100; memory at ebp plus c's offset is set to 10.45; memory at ebp plus a's offset is set to memory at ebp plus a's offset plus memory at ebp plus b's offset. Nothing super complicated here.

Looking at one compilation of this, let's say the compiler decided that a was at ebp minus 0xc, b was at ebp minus eight, and c was at ebp minus four. Let's go with that.
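For reference, the example function being compiled can be reconstructed from the description roughly as follows. This is a sketch, not the slide itself: it is named lecture_main here (an assumption) so that it can coexist with a real main, and the offsets in the comments are the ones quoted from the lecturer's compilation, which another compiler or version may not reproduce.

```c
/* Reconstruction of the example function described in the lecture.
   The name lecture_main and the (void) casts are additions for this sketch. */
int lecture_main(void)
{
    int a;      /* lecturer's build: stored at ebp - 0xc */
    int b;      /* lecturer's build: stored at ebp - 0x8 */
    float c;    /* lecturer's build: stored at ebp - 0x4 */

    a = 10;         /* mem[ebp - 0xc] = 10              */
    b = 100;        /* mem[ebp - 0x8] = 100             */
    c = 10.45f;     /* mem[ebp - 0x4] = bits of 10.45   */
    a = a + b;      /* mem[ebp - 0xc] += mem[ebp - 0x8] */

    (void)a;        /* results are unused, as in the lecture; */
    (void)c;        /* the casts only silence warnings        */
    return 0;
}
```

Because nothing here affects the return value, an optimizing compiler is free to delete all four assignments, which is why the lecture's assembly comes from an unoptimized build.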
So then, looking at the assembly instructions, we can map these back to the statements in the function. This is from one time when I compiled this a while back; if you do it on your own, it will be slightly different depending on the compiler version.

We first have two instructions that we'll mostly ignore for now. Essentially, the first moves the stack pointer into the base pointer, so the base pointer is where the stack currently is. Then this subtract of hex 10 from esp moves the stack down by 16 bytes, creating room on the stack for our local variables.

Then, in all these cases: here we have our first statement with a. a was at ebp minus 0xc, so we're moving hex a, which is 10, into ebp minus 0xc. We're then moving hex 64 into ebp minus eight. Hex 64 is 100. I'm fairly confident; I haven't memorized that, but it follows from this example. And then we're moving hex 41273333 into eax. What is that value? Yeah, it's not 110. It's 10.45 in IEEE floating point format. But it's not stored in memory yet, so there's another instruction to move eax into ebp minus four. Yeah?

In most cases, yes. When you're looking at assembly code, ebp is used to define the function frame. Technically code can do whatever it wants, but the convention on what we're going to be looking at, x86 32-bit Linux systems, is that they will use it that way. Yeah? Yes, we're going to go through a visualization of this and walk through exactly these statements, so that should help clear it up.

Then we need to do the computation. So now we've done up to here, right?
At the end of these instructions, ebp minus 0xc will hold the value 10, ebp minus eight will hold 100, and ebp minus four will hold the IEEE floating point encoding of 10.45. Yeah?

Yeah, so if I remember correctly, in the x86 32-bit instruction format, the literal value you're moving here can't be a full 32 bits. This is a full 32-bit number, and I don't think you can express moving it directly there, so you need an extra step. I don't know; the default answer is usually that the compiler did it that way. It's sometimes difficult to say why. Yeah, we'll see that.

Again, I think it's efficiency. This is 16, so you'd need four pushes to move the stack pointer to where you wanted it to be. And local variables could be large: when you have a buffer as a local variable, that could be 512 bytes. It could be a huge value. So this subtraction moves the stack pointer down in an efficient way, without pushing a bunch of times. Yeah? Yeah, we're going to get to that; we'll see that in a second.

Okay, let's finally look at the last bits of computation. Move ebp minus 8 into eax: we need to get the values from memory into registers to perform computation. Here we have ebp minus 8, which, looking at this handy chart, is b, and we're moving that into eax. Then we add eax, which is b, to ebp minus 0xc and store the result in ebp minus 0xc. So we've done a is equal to a plus b, because we put b in the eax register and a is at ebp minus 0xc. And then there's other stuff, but we'll ignore that for now.

So let's look at what's going on here. Here's the code, and here's our handy dandy stack. Let's say again the stack is pointing at 10,000. The nice thing here is that we only need eax, esp, and ebp, so there are only three registers we care about. Now, which of these registers do we absolutely know the value of? Yeah, esp. Why esp? Right, because of this pointer here.
I just said it's at 10,000, right? We don't know what value is stored there, but we can say that's where it points. And before this code executes, we don't know what's inside eax, we don't know what's inside ebp, we don't know what's above on the stack, and we don't know what's below the stack. Somebody just called this function.

All right, so what's this instruction at the top going to do? Copy esp into ebp. So ebp will now point to 10,000 hex. And then what about this line? Yeah, it's going to move the stack pointer. Did this affect the base pointer? No, right? They both held the same memory address, but you're only operating on the esp register. So we move down four of these slots; we subtract 16. Now I have two pointers. What are these two pointers? ebp and esp, and I know them because they're in these registers here. This is a visualization of where they point.

So looking at this, what did this subtraction do for us? It gave us space to work with that is in use by the stack and that nobody will mess with. Because now the stack pointer points down here, everything above the stack pointer is in use by the program and nobody else will mess with it.

Yeah? Say the program below it pops off all this stuff and is done with it. Do you mean programs or functions? A totally different program. Different programs are in different memory spaces; they can't mess with each other's memory. So they each have their own stack? Yes. Every process, every program, has its own stack. When you get into threading later, that gets slightly more complicated, but essentially all this code has access to its own stack. Cool.

All right, so then we just walk through these steps.
So now, ebp minus 0xc: we know exactly where that is. This is ebp, ebp minus four, ebp minus eight, ebp minus 0xc. So we know that this move of hex a into ebp minus 0xc will put 10 here on the stack. Essentially our local variable a just got set to the value 10. Then moving hex 64 into ebp minus eight puts hex 64 here on the stack, at ebp minus eight. And then these two operations, moving that floating point value into eax and then moving eax into ebp minus four, put it there on the stack. So you can see on the stack we have the variables a, b, and c laid out.

Why did the compiler decide to put them in this order? I don't know. Don't be fooled by patterns; weird stuff happens. Again, it's the compiler. I think it's actually an optimization for certain architectures. On some architectures your stack pointer has to be, I think, four-byte aligned or something like that, and in those cases it'll always allocate to that alignment. There's some crazy gcc forum thread about this. Compiler stuff is the real answer. Yeah? No, I don't know. Sometimes, maybe, depending on the compiler. It's difficult to get into, really. The real point, and this is why looking at the compiled binary output is so important, is that this is what actually happens. The C code with a, b, and c is essentially a fiction compared to what is actually going on. But yes, I'm sure compilers do all kinds of weird stuff.

Yeah? Yes, their own memory space. Each program thinks it has exclusive access to all of memory, in this case a 32-bit space from zero to 2 to the 32. You have to explicitly enable memory sharing between programs.

This one? So what's this value, 41273333? What must it be? Yeah, 10.45 in IEEE floating point, right?
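Rather than taking the constant on faith, the bit pattern of 10.45 can be checked directly. A minimal sketch, assuming float is a 32-bit IEEE 754 single-precision value (true on x86); the helper name float_bits is made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float. memcpy is used instead of a
   pointer cast so the code stays within defined behavior. Assumes float
   is a 32-bit IEEE 754 single, which holds on x86. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Calling float_bits(10.45f) yields 0x41273333: sign bit 0, biased exponent 130 (so 2 to the 3), and a fraction of 0x273333, which is 10.45 divided by 8 with the leading 1 dropped.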
I don't actually know that offhand; it's just deduction. We can see 10, we can see 100, and this is the only other value in the program. So the compiler decided the best way to move this value into ebp minus four is through eax. Of those two instructions, the first copies the value into eax and the second copies it from eax onto the stack.

Yeah? Yes, some of those registers have conventions attached. eax, ebx, ecx, edx are general purpose registers. Some of them also have important roles that we'll talk about later: return values are put into eax, for example. Next question? Say it again?

I believe, based on the x86 instruction encoding, you can't do it directly. This value is 32 bits, right? It takes 32 bits to describe it. I think moving a 32-bit immediate value into a memory location would be too long of an instruction; you can only move a 32-bit immediate into a register. Whereas these other instructions are smaller: it only needs to encode that one byte and say move this into ebp minus 0xc. Yeah, it'd be more clear like that, but that's what the compiler did.

Cool. Okay, now we perform our computation. We move ebp minus eight into eax, so now eax has the value hex 64, which is what variable in our C code? b, right. So we move b into eax, then add eax to whatever's at ebp minus 0xc and store the result into ebp minus 0xc. ebp minus 0xc is a, here on the stack. So we're adding hex 64 to hex a, and we get hex 6e, which is 110. And then we're done. Good? Questions? Yeah.

So an add instruction means add this value and this value, and the result is stored in the last operand. There's an implicit destination in there. I believe that's consistent with most other instructions. Okay, we're going to ramp up.
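The whole frame walkthrough can be replayed in a small simulation. As before, this is a toy sketch: the FrameSim struct, its helper functions, and the memory window are invented for illustration, and only the sequence of register and memory updates follows the instructions described above.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_MEM_BASE 0xFF00u   /* arbitrary window of simulated memory */
#define FRAME_MEM_SIZE 0x200u

typedef struct {
    uint32_t eax, esp, ebp;
    uint8_t  mem[FRAME_MEM_SIZE];
} FrameSim;

/* Store a 32-bit value at a simulated address */
void frame_store(FrameSim *s, uint32_t addr, uint32_t v)
{
    memcpy(&s->mem[addr - FRAME_MEM_BASE], &v, sizeof v);
}

/* Load a 32-bit value from a simulated address */
uint32_t frame_load(const FrameSim *s, uint32_t addr)
{
    uint32_t v;
    memcpy(&v, &s->mem[addr - FRAME_MEM_BASE], sizeof v);
    return v;
}

/* Replay the instructions from the walkthrough, one line per instruction */
void run_example(FrameSim *s)
{
    s->ebp = s->esp;                        /* copy esp into ebp          */
    s->esp -= 0x10;                         /* reserve 16 bytes of locals */
    frame_store(s, s->ebp - 0xc, 0xa);      /* a = 10                     */
    frame_store(s, s->ebp - 0x8, 0x64);     /* b = 100                    */
    s->eax = 0x41273333;                    /* bit pattern of 10.45       */
    frame_store(s, s->ebp - 0x4, s->eax);   /* c = 10.45                  */
    s->eax = frame_load(s, s->ebp - 0x8);   /* eax = b                    */
    frame_store(s, s->ebp - 0xc,            /* a = a + eax                */
                frame_load(s, s->ebp - 0xc) + s->eax);
}
```

With esp starting at 0x10000, ebp ends at 0x10000, esp at 0xfff0, and the slot at ebp minus 0xc holds 0x6e, which is 110. Every local is addressed through ebp, so the addresses would stay valid even if the function pushed temporaries and moved esp further down.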
Okay. So we saw how this allows us to allocate space in memory for a function's local variables. But when we want to call another function, what do we need? Think about it: you're writing a function that calls a function. What information do you need to call a function? Yeah, the address of that function, where the function is located. What else? Where you came from: you need to know how to get back after that function has executed. What else? Well, you want your local variables to remain the same after you call a function, right? If that function changed your local variables, that would be bad. What else? Yeah, the parameters that you're passing to the function. What else? Yeah, you need the return value from the function. Oftentimes you call a function because you want to see what it returns. Awesome.

So we need return values and parameters, and essentially we want to call a function and have everything be exactly where we left it. We don't want a function to come in and move stuff around. What we need is some kind of calling convention, because we need to understand who does what. When I'm calling a function, do I have to save my base pointer, or do I expect the function that I call to save my base pointer for me? And where does the function put the return value? It's easy when we're programming: we say a equals foo(). But as we saw, it's all assembly, so where does that return value go when the function returns? Who deals with the return address, where to go back to when this function is done executing?
So we need some kind of convention. This is one of those things that varies based on the processor architecture, the OS, the compiler, and the type of call. Right now we're going to focus on Linux x86 32-bit, but even there, there are different calling conventions depending on whether you're calling a function or making a system call to the operating system. We looked at those briefly when we looked at system calls from assembly, but we can ignore that.

We will be looking at cdecl. This is the calling convention we will focus on. Before calling a function, the caller first pushes all the arguments to the function onto the stack, in right-to-left order. So if you're calling a function with arguments a, b, c, then before calling the function you first push c onto the stack, then b, then a. Then the caller has to put the address of the instruction after the call, the return address, on the stack. We'll get into this more in a second. And that's all the caller needs to do.

The callee is responsible for everything else. The callee is responsible for saving the previous frame pointer, so it needs to save the frame pointer before it uses it. It then needs to create space on the stack for local variables, which is what we saw. And it ensures that the stack is consistent when it returns, so it doesn't return with a weird, different stack. Finally, the return value needs to be put into the eax register. So this is another instance of a register having a special meaning in the context of calling a function.
This also means that when we call a function, we can't expect whatever was in eax to remain the same. There are other rules here about which registers a function can use freely and which it has to save before using them, but we can ignore that for now. It doesn't really matter for what we're talking about here.
Okay, we should probably look at an example of this, because it will make a lot more sense. We have a very simple function main with a local variable a. It's going to call a function called callee, passing in 10 and 40, set a to whatever the return value is, and return a. callee is a simple function that takes two integer parameters, a and b, and returns an integer: it just returns a + b + 1. Pretty simple program and function.
So now we're going to look at the exact assembly output of these functions, how they actually got compiled. Here's the function main. Now, is main special as a function? Yes, in the sense of what? Yeah, it's where the program starts. But is it special in any other respect? No. So it needs to go through these steps as if it was called by somebody. It needs to store the previous frame pointer on the stack, it needs to create space on the stack for its local variables, it needs to ensure that the stack is consistent on its return, and it needs to put the return value in the eax register. This is an important thing. I used to have some crazy code in 340, maybe it was on an exam or something, where main would call itself multiple times as the program executes, just to reiterate the fact that main is not a special function at all.
So, the first thing it needs to do: it needs to save the previous base pointer, because it got called from somewhere. We don't know where. It needs to store that function's base pointer. Then it needs to create space for its stack.
So first it's going to set up its base pointer by moving the stack pointer into the base pointer. This is exactly the same as what we saw before. Then it's going to subtract 0x18 from the stack pointer. Why 0x18, which is 24 in decimal? How much local space does this function need? How many local variables are there? One. What's the size of that variable? Four bytes. So why does it need 24 bytes? Yeah, so it's optimizing space for the arguments: instead of actually pushing the arguments on, we'll see that it preallocates space.
All right, so let's look at the next one. We first move 0x28 into esp + 4. Plus is going to be which direction on the stack from the stack pointer? Down is negative, so plus four is up, into memory that is already allocated. So we're going to move 0x28. What's 0x28? 40. It must be 40; it's not 10. 10 is 0xa, so it must be 40. So we're moving that into esp + 4, and moving 0xa to where esp points. Now we've actually set up the stack with all the parameters that are required for this call. We've effectively pushed onto the stack, from right to left, 40 and then 10. And then we're going to call this function. That's all we need to do to call a function.
Then what do we do with the return value? Where do we get it from? eax, yeah. So we can just move whatever's in eax into ebp - 4. What's ebp - 4 here? a, yeah. It's just where the compiler decided to put the local variable a. And you'll see the compiler does some stupid things, because it's doing this just to be correct; it's not trying to optimize. So it does things like move ebp - 4 back into eax. Why is this silly? Yeah, it was already there. It literally moved it from eax into ebp - 4 and then from ebp - 4 into eax. And then we have two instructions that we'll see are important: we're going to do the leave instruction and then a return instruction.
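For reference, here is a minimal sketch of the program being walked through, reconstructed from the spoken description rather than copied from the slide. The lecture's main is renamed caller here (an assumption made only so the snippet can sit next to another main), and the assembly in the comments paraphrases the instructions named in the walkthrough; it is not verbatim compiler output.

```c
/* callee: takes two ints and returns a + b + 1.
 * Under cdecl it finds a at [ebp+8] and b at [ebp+0xc],
 * and it leaves its result in eax. */
int callee(int a, int b)
{
    return a + b + 1;
}

/* Stand-in for the lecture's main (renamed for illustration).
 * Instructions described in the walkthrough, in order:
 *   push ebp                  ; save the caller's base pointer
 *   mov  ebp, esp             ; set up our own base pointer
 *   sub  esp, 0x18            ; reserve 24 bytes: local a plus outgoing arguments
 *   mov  dword [esp+4], 0x28  ; second argument, 40
 *   mov  dword [esp], 0xa     ; first argument, 10
 *   call callee               ; pushes the return address, then jumps
 *   mov  [ebp-4], eax         ; a = return value
 *   mov  eax, [ebp-4]         ; return a (redundant, unoptimized)
 *   leave
 *   ret
 */
int caller(void)
{
    int a;
    a = callee(10, 40);
    return a;
}
```

Calling caller() should give 51, which is 0x33, matching the 0x32 plus one that the walkthrough reaches in eax.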
So yeah, that is just where the compiler decided to put the local variable a. a here is a local variable, and the compiler just decides it's at ebp - 4.
Okay, important things. Looking at this function, what does it actually do? Looking at the actual meat of the function, what does it do? Technically yes, but what does this function actually do? No, just looking at main in isolation, not at the whole thing. It calls another function, passing it the parameters 10 and 40, and then returns that value. So really, the code that matters is essentially in here: this is setting up 40 and 10, calling the function, and then handling the return value.
So it's important to understand that every function has this prologue that sets up, as we saw, what a function that gets called has to do: it has to store the caller's base pointer, and it has to set up local space for its variables. And then in the epilogue, when it leaves, it needs to go back to where it came from, and it needs to make sure the stack is consistent. We'll see exactly what those do. You can essentially think of it as leave and return; leave is essentially the exact opposite of these three instructions.
callee is much simpler. All it does is take its parameters a and b, add them, add one, and return. Does it have any local variables? Nope. So callee has to store its caller's base pointer, so it pushes the base pointer onto the stack. It moves the current stack pointer into the base pointer. It then moves ebp + 0xc into eax. It moves ebp + 8 into edx. It then adds them. It's a weird one: this load effective address takes edx plus eax, plus one, and moves it into eax. I don't know why the compiler decided to do that; it just did. Oh, no, sorry. The plus one doesn't happen here. The plus one happens here. That's right.
Add one to eax, pop ebp, and then return. So again, callee itself has a prologue and an epilogue. It's slightly different just because it doesn't have a subtraction step; it doesn't have any local variables here. So what are this ebp + 0xc and ebp + 8 here? Yeah. So we saw that everything negative from the base pointer is local variables. Everything positive from the base pointer is... there we go, those are the parameters that get passed in. We'll see exactly why that works. We can think of it as the opposite of here: here main is moving them onto the stack, and the function picks them up at offsets from its base pointer. Yeah? Because callee doesn't have this subtraction here, the compiler decided that it would be simpler to just do those two instructions.
Okay, let's step through this code. We have our two functions, main and callee, and now we're going to get even more crazy: we're going to give each of these assembly instructions an address. It's important to remember that these are just bytes in memory at certain memory addresses, and the CPU is very stupid. It essentially starts executing at 0x08048385, does that, does the next thing, and just keeps going until it needs to move or change or do whatever.
So we have our nice handy-dandy stack, and now we have our registers. We're using slightly more registers than before, but actually not that many more. We use edx somewhere, and eip will be the instruction pointer, so this will be the pointer to the next instruction that needs to execute. And we'll say the stack is up top. This is actual output from debugging this when I ran it, the memory locations and everything, so fd2d4 will be the stack pointer. Cool. Do we know what's in the base pointer? Or any other registers, do we know? We know one, or we should know one. Let's see.
Yeah, we know one: the instruction pointer. The instruction pointer is 0x08048385, which means that's the next instruction to be executed. Now, is there anything special about what's going on? It's all just the semantics of these instructions: how each one operates on memory, how the stack moves, how the registers change. So we're just going through them and demonstrating what happens here based on what happens there.
So again, why do we do push ebp? Yeah, somebody called us. It's that function's base pointer, and we need to save it so that when we return to them we can restore it. We don't know what it is; it has some value. Let's say it's fd2c0. So when we do push ebp, the stack pointer moves down to fd2d0, and we move fd2c0, the previous base pointer, onto the stack. Good. So hopefully, if everything works right, by the time main leaves and returns to wherever it came from, the base pointer will be fd2c0. Good.
Now we need to set up our function frame, which we looked at. We move the stack pointer into the base pointer. The stack pointer points to fd2d0, so the stack pointer and the base pointer now point to the same thing. Then we're going to allocate space for our local variables: we subtract 0x18 from the stack pointer, moving the stack down here. The base pointer doesn't change, because we need the base pointer for offsets to our local variables. Good.
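The three prologue steps above can be replayed on a toy model of the stack. This is only a sketch: the Machine struct, the helper names, and the full 32-bit addresses are assumptions for illustration (the lecture reads out only the low digits, such as fd2d4, so leading zeros stand in for the rest).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical full addresses: the lecture gives only the low digits. */
#define STACK_TOP 0x000fd300u   /* one past the highest simulated word */
#define NWORDS    32u

typedef struct {
    uint32_t esp, ebp;
    uint32_t mem[NWORDS];       /* mem[0] is the word at STACK_TOP - 4 */
} Machine;

/* Translate a simulated stack address into its slot in mem[]. */
uint32_t *slot(Machine *m, uint32_t addr)
{
    uint32_t index = (STACK_TOP - addr) / 4u - 1u;
    assert(addr % 4u == 0u && addr < STACK_TOP && index < NWORDS);
    return &m->mem[index];
}

/* push: the stack grows down, so decrement esp first, then store. */
void push(Machine *m, uint32_t value)
{
    m->esp -= 4u;
    *slot(m, m->esp) = value;
}

/* Prologue: push ebp; mov ebp, esp; sub esp, local_bytes */
void prologue(Machine *m, uint32_t local_bytes)
{
    push(m, m->ebp);        /* save the caller's base pointer */
    m->ebp = m->esp;        /* our frame starts here */
    m->esp -= local_bytes;  /* reserve space; ebp stays put */
}

/* Register state on entry to main, as in the walkthrough. */
Machine at_main_entry(void)
{
    Machine m = { .esp = 0x000fd2d4u, .ebp = 0x000fd2c0u, .mem = {0} };
    return m;
}
```

Running prologue(&m, 0x18) on the state from at_main_entry() should leave ebp at the fd2d0 address and esp at the fd2b8 address, with the old base pointer saved in the word that ebp points to.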
Yeah, we get fd2d0: by pushing ebp we subtracted four from the stack pointer. So at this point esp has fd2d0, because it moved down one slot. We can step backwards in this, go forwards and backwards through time. No? Okay.
All right, next up we're going to get ready to call this function. We move the hex value 0x28, which is 40, into esp + 4. esp is at fd2b8, and four above that is this memory region here, so we move that there. We then move 0xa into where esp points. And now we've essentially pushed 40 and then 10 onto the stack, right before we call this function. So we've done our share for calling a function: we had to push all the parameters to the function onto the stack in right-to-left order, so that the function can find them. Questions on this? Okay.
Now we want to call this function. And again, we don't really know it's a function; all we know is some memory address, 0x08048394: start executing at 0x08048394. But we need to be able to go back. So specifically, in this example, what address do we want to be executed after this call has returned? 0x080483bf, specifically this instruction right here. How is the function that we call supposed to know that that's where it needs to go? The base pointer is fd2d0. Yeah, fundamentally. So think about this: remember, functions can be called from anywhere, so a function doesn't necessarily know all the places it can be called from. How many people know, and I'm not going to go into it in detail, the story of Hansel and Gretel? They leave the house and they leave a trail of bread crumbs as they go into the forest, so to find their way back home they follow those bread crumbs. So we need something similar.
We need a bread crumb that lets us know how to go back to the function that called us. And that is exactly what this call instruction does. The semantics of the call instruction do two things. One: it says, essentially, change eip to be 0x08048394, because that's the next instruction to execute. What it also does is push 0x080483bf onto the stack, specifically this value because it's the address of the next instruction after the call instruction. So what the stack looks like after this is that we've stored 0x080483bf onto the stack. This is our bread crumb, so that the callee function knows where to go after it's executed.
Yeah, do we lose the original instruction pointer that called main? Yeah, eip is now 0x08048394. If this was just a jump to 0x08048394, we'd have no way of knowing where we came from. That's why we have to build in this mechanism of bread crumbs for how to go backwards, and we do that by saving the instruction pointer onto the stack. We haven't looked at that, but technically if we worked this backwards we would see, right above fd2d4, or actually right at that memory address, the return address for main.
Cool, now callee is executing. It has no idea who called it or what the environment is, just like when we looked at main: we had no idea who called it, and so the first thing main needed to do was store the base pointer of whoever called it. So callee is going to do push ebp, and then it's going to move the stack pointer into the base pointer, creating its own base pointer. So now we've basically gotten rid of everything from main. We don't know where main is, but we've stored enough information on the stack to be able to go back to main and set the entire environment up so that main can continue executing. Good. Questions?
The call instruction? So the call instruction does two things: it sets eip to this value, or wherever it's supposed to go, and it pushes the address of the next instruction after it onto the stack. So it's like a two-parter.
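As a rough sketch of that two-parter, the toy model below implements call as "push the return address, then change eip". The Cpu struct, the helper names, the padded stack addresses, and the call instruction's own address (0x080483ba, derived here by assuming a 5-byte near call that ends at 0x080483bf) are assumptions for illustration, not values read from the slide.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_TOP 0x000fd300u   /* hypothetical; lecture gives only low digits */
#define NWORDS    32u

typedef struct {
    uint32_t esp;
    uint32_t eip;               /* modeled as the address of the current instruction */
    uint32_t mem[NWORDS];       /* mem[0] is the word at STACK_TOP - 4 */
} Cpu;

/* Translate a simulated stack address into its slot in mem[]. */
uint32_t *word_at(Cpu *c, uint32_t addr)
{
    uint32_t index = (STACK_TOP - addr) / 4u - 1u;
    assert(addr % 4u == 0u && addr < STACK_TOP && index < NWORDS);
    return &c->mem[index];
}

/* call target: leave the bread crumb (address of the instruction after
 * the call) on the stack, then start executing at target. */
void call(Cpu *c, uint32_t target, uint32_t call_length)
{
    uint32_t return_address = c->eip + call_length;
    c->esp -= 4u;
    *word_at(c, c->esp) = return_address;
    c->eip = target;
}

/* State just before main's call: arguments 10 and 40 already in place. */
Cpu before_call(void)
{
    Cpu c = { .esp = 0x000fd2b8u, .eip = 0x080483bau, .mem = {0} };
    *word_at(&c, c.esp + 4u) = 0x28u;   /* second argument, 40 */
    *word_at(&c, c.esp)      = 0xau;    /* first argument, 10 */
    return c;
}
```

After call(&c, 0x08048394, 5) the top of the simulated stack holds 0x080483bf and eip holds 0x08048394, with the two arguments sitting just above the return address.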
So now callee needs to do its computation. It's done the prologue; it's set up its function frame. Now we can answer this question: what is ebp + 0xc? Let's walk through this. What is at ebp? The saved base pointer. Yeah, the base pointer of whoever called us, so we'll call it the saved base pointer. And then what's at ebp + 4? Yeah, the saved instruction pointer, from whoever called us. So: saved base pointer, saved instruction pointer. And then what's at ebp + 8? The first parameter. The leftmost parameter, the first parameter to the function, is at ebp + 8. And then what's at ebp + 12? The second parameter.
So when we look at this code and we see ebp + 0xc moved into eax, it's moving the second parameter, which was b, into eax. And because of the calling convention, this function callee knows that the first argument will always be at ebp + 8 and the second argument will always be at ebp + 12. It can guarantee that because it knows that anybody who wants to call it has to push their return address onto the stack, and callee itself stores the saved base pointer onto the stack. So this has to be true based on the calling convention. Okay, questions?
Cool, so let's walk through this computation. It was just: add the two parameters together and then add one. Oh, and before we do that, looking at the function frames now, we can essentially see that this function frame belongs to callee and that function frame belongs to main. Going forward: we move ebp + 0xc, which is here, into eax, so we move 0x28 into eax. We move ebp + 8 into edx. We add them together and put the result into eax, so eax now has 0x32. Then we add 1 to eax. And what were we returning from this function? Yeah, return a + b + 1, based on the calling convention.
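The layout just described (saved base pointer at ebp, return address at ebp + 4, arguments from ebp + 8 upward) can be sketched with an ordinary C array standing in for the stack, where ebp is a pointer to the saved-base-pointer word. The function names and the exact form of the lea are assumptions for illustration; the offsets and values are the ones from the walkthrough.

```c
#include <stdint.h>

/* Read the 32-bit word at [ebp + byte_offset]; the offset must be a multiple of 4. */
uint32_t load(const uint32_t *ebp, uint32_t byte_offset)
{
    return ebp[byte_offset / 4u];
}

/* callee's body after its prologue, one statement per instruction. */
uint32_t callee_body(const uint32_t *ebp)
{
    uint32_t eax = load(ebp, 0xcu);  /* mov eax, [ebp+0xc] : second argument */
    uint32_t edx = load(ebp, 0x8u);  /* mov edx, [ebp+8]   : first argument  */
    eax = edx + eax;                 /* lea eax, [edx+eax] (paraphrased)     */
    eax += 1u;                       /* add eax, 1                           */
    return eax;                      /* cdecl: the result is left in eax     */
}
```

With a frame laid out as { saved ebp, 0x080483bf, 0xa, 0x28 }, lowest address first, callee_body returns 0x33, the a + b + 1 from the example.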
Where do we place the return value? In eax. What's the value of eax? a + b + 1. Right, so we've added our two parameters together and added one to them.
Now we need to do this whole thing in reverse. So what's this pop ebp going to do? Yeah, it takes whatever is on top of the stack. Remember, we know that right now the stack pointer is pointing to the saved base pointer. So by doing a pop ebp, we're going to put this value back in here. Whose base pointer is this? main's base pointer, so we're going to restore main's base pointer. We do a pop ebp, and now main's base pointer is restored.
And then we do a ret. A ret is the opposite of a call instruction. What did the call instruction do? Changes eip... close. Yeah, the next address. So it changes the instruction pointer to call the function, and it also pushes the saved return address onto the stack. Now we need to do the opposite; we need to go back and execute. So essentially you can think of a return as a pop eip. Because what are we doing? We're taking this address, we're going to move it into eip, we're just going to start executing there, and the stack's going to move up. So: pop eip.
How does the CPU at this point know that this is where main wants us to go? The call instruction had saved that eip on there, but how does it know it's exactly this value? Sure, it's done symmetric operations, but how does it know that this value hasn't been changed since we called this function? It doesn't. It's dumb. CPUs are dumb; they don't do checks like this unless it's in the code. All this return instruction says is: whatever 32-bit value is here,
I will start executing from it. In this case, it's going to go start executing here at 0x080483bf and continue execution. So what I want you to be thinking about as we continue is: what if, as an attacker, we can control and overwrite that value?
Yeah? Technically, yes, sure. But that check would have to be in the code itself. The CPU doesn't magically know that the value is correct. Yeah, those addresses? Yes, I mean, that's true for all of these, unless it's a position-independent executable, in which case they'd be offsets. But you can just debug the program, break on main, and you'll see the actual literal memory addresses. Yeah? Yes. So the call instruction itself sets eip to this value, and it pushes onto the stack the address of the next instruction. You can think of it this way: the call instruction knows how big it is, so it can tell you exactly where to go afterwards.
So then, from this return: this return is just pop eip. Now we're here. We've changed our eip. And from main's perspective, what happened? Close. Yeah, eax changed. But nothing else did; the stack is still exactly where we left it. We can see that there are other values that got pushed here, but do we care about those values? Why not? They're garbage. They're past the stack pointer, so they're not part of the stack. It's not allocated memory; it's garbage memory. We don't care what those values are. But we got our return result in eax, and that's what we wanted. So now we're going to use that.
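To make the "pop eip, no questions asked" point concrete, here is a small sketch. The Regs struct and function names are assumptions for illustration; esp is modeled as a C pointer into an ordinary array, and 0x41414141 is just a recognizable stand-in for a value an attacker might have written.

```c
#include <stdint.h>

typedef struct {
    uint32_t eip;
    uint32_t *esp;   /* points at the current top-of-stack word */
} Regs;

/* ret: pop whatever word is on top of the stack into eip.
 * There is no check that the word is the one call pushed. */
void ret(Regs *r)
{
    r->eip = *r->esp;
    r->esp += 1;     /* the stack shrinks by one 4-byte word */
}

/* Build the stack as callee sees it just before ret (return slot, then
 * the two arguments), execute ret, and report where execution continues. */
uint32_t where_ret_goes(uint32_t word_in_return_slot)
{
    uint32_t stack[3] = { word_in_return_slot, 0xau, 0x28u };
    Regs r = { .eip = 0u, .esp = stack };  /* where ret itself sits is irrelevant here */
    ret(&r);
    return r.eip;
}
```

where_ret_goes(0x080483bf) comes back with main's address, and where_ret_goes(0x41414141) comes back with 0x41414141: the model, like the CPU, simply goes wherever the word on the stack says.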
We're going to do these two silly operations: we move eax into ebp - 4 and then move ebp - 4 into eax. And now we need our epilogue. We need to do the same thing that callee did and set the stack exactly where it should be.
So, a leave instruction does two things. We have our base pointer and our stack pointer. A leave instruction says: set the stack pointer to be the base pointer. So what happens to my two pointers then? The stack pointer moves up. The base pointer is up at the top, the stack pointer is down here, and leave says move the base pointer into the stack pointer, so it points up here, and then do a pop ebp. So it does two things. It gets rid of all of this allocated space, so that no matter how much we subtracted here we get rid of all of it, by moving the stack pointer to where the base pointer currently is. And then it does a pop ebp to restore our saved base pointer.
Yeah? Yes, because callee doesn't subtract from the stack pointer, so the compiler knows the stack pointer is pointing at exactly the same place as ebp. And I think a plain pop ebp is cheaper than a leave, but I'm not 100% certain on that. Anyway, that's what the compiler decided to do.
So here we have our leave. The first thing that gets done is the stack pointer and the base pointer get set to the same value. Then we pop whatever value was there into the base pointer, so now the base pointer is fd2c0. And where are we going to go execute? Yeah, whatever is at fd2d4, whoever called main, and it just keeps going like that. So all function calls operate on this basic philosophy.
Questions? Yes. Yes, because it's just a memory offset. Pushing and popping values changes the stack pointer itself. So when you want to do that, you can use push and pop, or you can just subtract 0x18 from the stack pointer to allocate space on the stack, and then you can just use the stack pointer or, as here,
we're using the base pointer, just as an offset, to get values onto and off of the stack. That's exactly what's happening here, I think, with what we were talking about: esp + 4 and esp.
Okay, so: stack overflows. This was all background to get us to the point where we can understand what's going on with a stack overflow. The idea is that we are copying data without doing any bounds checking. I'm sure this happened: was it on assignment one, the secure house assignment? One of the inputs was incredibly large, so you couldn't preallocate the necessary space in advance. That was specifically to trigger bugs like this. Normally, if you write past the end of the buffer, you'll end up writing to unmapped memory and there'll be a segfault. You've seen this in assignment two, where you found segfaults in each other's code.
But as attackers, we want to take over the execution of a program. So what if we overwrite the return address with something we choose as attackers? I hope we're going to get into shellcode; I think I skipped that, we'll see. But it's then possible to jump to code that you, as the attacker, define. And fundamentally, this code will be executed with the privileges of the application.
So again, this is reiterating what we just talked about: the saved EBP and the saved EIP are stored on the stack, and fundamentally nothing prevents a program from modifying those values. You're an application. We said you have full access to your whole memory space; you are free to read and write more or less all of the memory in that program. So the question is, what if it did do that?
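The effect can be shown safely on a simulated frame rather than a real one. Everything in this sketch is an assumption for illustration: the 8-byte buffer, the Frame struct, and the helper names are hypothetical, and the "unchecked" copy is clamped to the simulated frame only so the demonstration itself never writes outside its own array.

```c
#include <stdint.h>
#include <string.h>

enum { BUF_SIZE = 8, FRAME_SIZE = BUF_SIZE + 4 + 4 };

/* A simulated stack frame, lowest address first:
 *   bytes[0 .. BUF_SIZE-1]               local buffer
 *   bytes[BUF_SIZE .. BUF_SIZE+3]        saved ebp
 *   bytes[BUF_SIZE+4 .. FRAME_SIZE-1]    saved eip (the return address) */
typedef struct { uint8_t bytes[FRAME_SIZE]; } Frame;

Frame new_frame(uint32_t ebp_value, uint32_t eip_value)
{
    Frame f;
    memset(f.bytes, 0, BUF_SIZE);
    memcpy(f.bytes + BUF_SIZE, &ebp_value, 4);
    memcpy(f.bytes + BUF_SIZE + 4, &eip_value, 4);
    return f;
}

uint32_t read_saved_eip(const Frame *f)
{
    uint32_t value;
    memcpy(&value, f->bytes + BUF_SIZE + 4, 4);
    return value;
}

/* No check against BUF_SIZE, like a naive string copy into a local array. */
void copy_unchecked(Frame *f, const uint8_t *src, size_t n)
{
    if (n > FRAME_SIZE)
        n = FRAME_SIZE;          /* only keeps the simulation inside its own array */
    memcpy(f->bytes, src, n);
}

/* The size travels with the buffer: refuse input that does not fit. */
int copy_checked(Frame *f, const uint8_t *src, size_t n)
{
    if (n > BUF_SIZE)
        return -1;
    memcpy(f->bytes, src, n);
    return 0;
}
```

Feeding sixteen 0x41 bytes to copy_checked is rejected and the saved return address stays intact; feeding the same bytes to copy_unchecked leaves 0x41414141 in the return-address slot, which is the value a later ret would jump to.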
So I will briefly go through this so you can do it on your own time. We just have a function that copies a string into a local variable on the stack. So this is a string you can pass in, and my copy will copy that string, from that pointer, onto foo. Looking at this, we have a very basic function call. Here we have a pointer and some memory addresses. I just want to run through this so you can check it out on your own.
Ah, I'm bumping up against the clock. All right. Well, our next class on Tuesday will be pretty intense too, so study up.