All right, folks, let's get started today. So as you can see from the current standings on the scoreboard, nobody's solved it all the way yet, which should be encouraging. This fine hacker is currently in the lead, although you can see it's a pretty close race. There's some stuff going on, so good job. Keep hacking, and let's keep going.
Okay, so on Wednesday before spring break, we were talking about memory corruption vulnerabilities, specifically buffer overflows. So why are buffer overflows interesting? "They create unintended side effects, or intentional side effects." So explain: how can we create intentional side effects? "Overwrite the EIP, so it can go to whatever you want." Right, so we can essentially look at it as hijacking the control flow of the application. In a normal execution of a program, when you call a function and that function returns, where do you go? "The previous caller." Yeah, whoever called that function, the previous caller. But if we overwrite the saved EIP on the stack, we can divert the control flow to something that we control. And by doing this very precisely, we can essentially completely hijack the code that's actually executed. So this is the general flavor of buffer overflows: we can overwrite memory addresses.
But where can buffers be? Can they only be on the stack? What was that? "It could be a global variable." Yeah, it could be a global variable, which will be in the BSS section. So what could you maybe overwrite with that? Are any saved EIPs stored in the global variables? Probably not. But what can you overwrite? "I don't know." That's a good question. Possibly, it depends on the exact memory layout. "Because if it was, you could overwrite a function." Right, so you can overwrite something in the global offset table, which would change where that call goes.
You could overwrite other global variables, right, which could control the flow of the execution. Where else can buffers live? Where? "argv." argv, so it could be passed into our program. It could be in the environment, which could then maybe let us overwrite other environment variables. Where else? "On the heap." Could be on the heap, right? It could be malloc'd data, and maybe we overwrite that.
So, the way to think about this: we focus on buffer overflows because these are kind of the intro to memory corruption, but essentially what we're able to do is modify memory in the application. In the case of a buffer overflow, we're limited to overflowing contiguous memory, from the beginning of that buffer and usually on past its end. But we can generalize this to different kinds of memory corruption. We'll look at another type of vulnerability where we can write whatever bytes we want, wherever we want in the program, and that gives us a different level of control than buffer overflows, but it's still the same general idea.
And so, you know, on the stack we know we can overwrite the saved EIP. We can also overwrite the saved EBP, which is actually a pretty cool type of attack, where we can then control where the stack is located and do other types of things. We didn't talk about it, but we can overwrite pointers to functions. In C, you can pass a pointer to a function into a function, and then you can jump to and call that function. So if there are any function pointers, we can overwrite those to potentially divert the control flow. We can do all kinds of cool stuff. Traditionally we think of a buffer overflow as overflowing the saved EIP and then getting control flow. But really, what if there's a variable on the stack that says whether you're the admin or not, a one or a zero? In this case, you don't even care about overflowing EIP to become the admin.
So, what you need to do is overflow by one and set that value to one, and now you're an admin. As we'll see, there are all types of ways to cause overflows. We looked at copy functions, but it could also be array indexing with attacker-controlled variables, integer overflows, or loops, and we talked about the different choices of what to overwrite. So this is trying to generalize the concept and think about it in a very general way. Well, if I want to read a file, why not change the buffer holding a file name that's being accessed so it points to some file name that I want? Let's say there's a value in the program that points to something like /tmp/t.txt, but it's a setuid program, and we actually want to read or edit /etc/shadow. We can use our overflow to change that value, right? We change it right in the program, and it will now open that new file.
Yeah, so are you asking about these? Okay, so the important ones here. I think we'll go into changing the saved base pointer later, but when we talk about function pointers, the global offset table is a very ripe area for overwriting if we can reach it. So, how does the global offset table work? Why does it exist? Going back to like a month ago. "It allows for dynamic calling of functions." Right. Why do you need to dynamically call a function? "Because you don't know where things might be laid out in memory." "It saves space." Right. So when you look at the code of an application, I believe almost all of the call instructions are actually relative offsets. They say: call negative 23 bytes from me and start executing from there. In that case, the code is completely relocatable. It doesn't matter where you put it, and it doesn't have anything to do with the base pointer. So, why do we need the global offset table?
Yes, libraries, that's the key, right? So, you've got to think, how many times have you called the printf function in your C code? A lot. How many times did you write a function called printf? Zero times, right, never. So, it's the job of the dynamic linker. We talked about how there are two types of linking: static linking, which takes libc and basically puts it all into one binary, or dynamic linking, where at runtime it has to load libc, the .so file, put that into memory, and then fix up all the pointers. The global offset table is what allows this. Instead of calling the function printf directly, which your program has no idea where it will be at runtime, it calls some little trampoline function, which grabs a value that's in a fixed memory location and jumps to it. And at runtime, when libc is loaded, those values are changed to point to the actual printf location. We'll look at this more in a second, because it's actually super important. The global offset table entries are at fixed memory locations. They have to be, because they have to be changed at runtime, so the table is not relocatable. So, as long as you're able to overwrite one of those values with, let's say, the address of your shellcode, you can redirect the control flow wherever you want.
So, how are exceptions done in C? Is there any type of exception handling? "Based on the return value." Yeah, so normally with a lot of functions, like when you call into the kernel or something, it's the return value, right? If it gives you something that's not zero, that means there was some kind of error. And then you can interrogate some other value to see what the error was. But when you use a language like, I don't know, Java, Java has exceptions pretty well baked into the language. So, what's the purpose? Why do you have exceptions in a language like Java? Why not just check return codes?
Yeah, so the calling code can maybe recover from that error, right? It can catch that error, maybe change the environment, try different things, and then maybe make a new call. C actually has similar functionality for this type of behavior. It's called a long jump, because if you think about how an exception works, it doesn't matter how far down the call chain you are, you jump back up to wherever the code was that defined that exception handler. It's similar to, and probably worse than, what other languages have. So there are two important functions here, setjmp and longjmp. You can think of setjmp as capturing all the information that's necessary to jump back to that program state. So it actually contains things like the base pointer, the program counter, and any registers that are necessary. And then longjmp actually jumps back to that state. "Does it save the control stack?" It depends, but yes, it depends on how it's used.
So we can look at a very simple program. This is a main function with argc and argv. It has a jmp_buf, which is the data structure, called env. And then it calls setjmp on env. I believe it's similar to fork, where the return value of setjmp tells you whether you're returning from an exception handler or not. That call fills in this env variable with everything that's needed. And so in this else block, we're printing out the value "hi" and calling some function f1, where f1 takes in a jump buffer and checks whether there's some error condition, in which case it can do a longjmp on that buffer with error one. Otherwise it can pass the buffer to f2, and f2 can then also act as an exception source. So from both of these locations, the longjmp jumps back to that setjmp in main. And we can look at exactly how the jump buffer is implemented. The jump buffer is an array of six integers, which is actually not that big to deal with all this.
And let's see. So it has all of these values. I think the base pointer, the stack pointer, and the PC. So you can see that it's storing, right there on the stack, these values of where to jump to when there's an error. The point is that this is another place where the control flow of the application depends on values that are stored on the stack. If we can overwrite those values, and we overwrite the PC with exactly the address that we want, then we don't necessarily have to overwrite the saved EIP, because maybe we don't have enough of an overwrite to get there. Maybe we can only write over a little bit. But if we can overwrite and change this jump buffer's program counter value, then we can divert the control flow of the application.
So yeah, this is a snippet from longjmp, which takes an environment pointer and the i variable. We can actually walk through this code since it's very simple. It's moving i into EAX, so it's setting a return value. Remember, the return value is always in the EAX register. Then it's reading env, so this is the env jump buffer at the JB_BP offset. So that's, yeah? "What's the nature of i? Is it a register?" Oh, this is pseudo assembly code. So it's just getting this i parameter. You can think about how we would write this code: it's EBP plus 12, because EBP plus 8 would be the first parameter and then 12 the second. Although what's confusing is this env. I think it's that whole data structure, so it's actually longer, I guess. But fundamentally, we take that i parameter, which is the second parameter passed to longjmp, and put it in the EAX register. We take env, dereference it, and access the jump buffer's base pointer offset, which I think was four, and move that into EBP. So now that EBP is set up, it sets the stack pointer the same way.
And you can see that you have to set all these other registers first, basically copying the saved registers over. And then finally you jump to whatever value is stored in the jump buffer at the JB_PC offset. So this is where we can see that control flow is diverted based on the value inside this env data structure. So, yeah? "Why does it save only these general-purpose registers instead of saving all of them? Because of the caller, maybe the caller saves the other stuff?" That's a good question. The short answer is I don't know. You'd have to look at it. It probably depends on the environment and what things need to be saved or not saved, or maybe setjmp is something you have to do at the very start of your function to be compatible, otherwise you may not get the variables you expect. So, yeah, that's a good question.
So if we can overwrite it, we can then take control. This is really just an example to show you that wherever control flow can diverge based on data stored on the stack, or on a value that you can control, then overwriting that value gives you control. So the key here, again, is that the saved EIP is not always your target. You need to start thinking holistically when you look at an application and you think, ah, there's an overflow here, but I can only write 120 bytes and the buffer is 100 bytes, and that 120 is not enough to get to the saved EIP. So what things can I do? Well, see if there's anything else on the stack that you could possibly alter, change, or corrupt the value of. Yeah, you can do all kinds of cool stuff.
There's a paper, I think it's been two years now, maybe three, at USENIX Security, on data-oriented exploits: automatically creating exploits for a binary application that didn't hijack the control flow. So it didn't overwrite EIP, but by changing the data that was on the stack, they could achieve arbitrary computation, which was pretty cool. So yeah, this stuff's pretty important.
All right, looking at more things we can overwrite, and this is, again, why I harp a lot on the exact semantics of things. So you see a function like gets, or a function like strcpy, or a function like strncpy, right? What exactly does it do, and how do you find out exactly what it does? Yes, read the man page, awesome, all right, good.
So, we have a program with a character buffer username and a character buffer password. This is code that you'll see all the time. It does a strncpy of argv[1] into password, 512 characters. So what's the idea behind strncpy? Yeah, only copy 512 bytes. This is good, right, we're using the proper function. If we were using strcpy, would that be vulnerable? Yes, absolutely 100%, that is definitely vulnerable, because we cannot bound the length. But here we can say we're only copying 512 bytes from argv[1] into the password field, right? That's good. Then we do a strncpy of argv[2], 512 bytes, into username. Then we print out "checking password for user" and we call some function that checks the password and return. If you look at that function, we're passing our password into it: we have a function check_pwd taking a character pointer p, which has a local buffer my_pwd of 512 bytes, does a strcpy of p onto the local buffer my_pwd, performs some check on it, and then returns zero. So this looks good, right? In the code here, we're doing strncpy of 512 bytes, right?
So do either of these strncpys overflow their respective buffers? No, right, they will copy at most 512 bytes into each of those buffers. But what are the exact semantics of strncpy? Yeah? "It copies the minimum of 512 or up to the null byte." Yes, so we need to look at it, because I actually don't remember either. "Whichever comes first." Yeah, so let's look at man strncpy. All right, the strcpy function: beware of buffer overruns, it's very bad. The strncpy function is similar, except that at most n bytes of source are copied. Same as our example, right, at most n, 512 bytes. Warning: if there's no null byte among the first n bytes of source, the string placed in destination will not be null terminated. So what does this mean? Yeah? "It means that if you put in 512 characters and none of them is the null terminator, then it won't add one." Right, so you have a buffer of 512 characters, and you do a strncpy of 512 a's, which in argv[1] would be 512 a's followed by a null byte, a zero. So will strncpy copy 513 bytes into your buffer? "No." No, it'll copy 512 a's into there, but the man page specifically says that it's not going to put a null terminator at the end. So you'll have 512 a's with no null termination.
So why is that a problem back in our program? What's the possible size of this string p that we're copying onto my_pwd? "Until the next null." "The size of the stack." So let's think about this. What is the size of the buffer, what did we declare it to be? "512." 512, but what happens when I copy, assuming the buffers are laid out in memory the way they are declared on the stack? So the password buffer is 512 characters, sitting right below username. What happens if I pass 512 characters as the password? Is that a problem? "Yes." Is it, yet? "I don't know." Have I stomped on any memory yet? All right.
Yeah, so let's say, okay, let's think about this one. I have 512 a's, and I'm going to strncpy them into password, right? That fills up the buffer. It puts no null terminator at the end. And then after that I have 20 b's that get put into username. Now if I calculated the string length of password, what would it be? Yeah, 532, right? Not 512, it comes out to more, right? Because how does strlen get the length of a string? It keeps counting bytes until it gets to a null character. So it's going to count the 512 a's and ask, is the next character a null byte? No, it's not. All of those bytes in username are part of the string length of password. So by using this, an attacker can fill the password field with 512 characters of whatever they want, and another 512 characters of whatever they want for username. And then when that's copied onto the my_pwd buffer in check_pwd, that can potentially be 1024 bytes, which is definitely enough to reach the saved EIP.
So this is like crazy. I don't know about you guys, but I think this stuff is super awesome. And it comes from code that looks correct on the surface, right? Because if you were auditing this, you'd ask, are there any strcpys of user input into buffers? No, they're using strncpy. And they're specifically passing exactly the size it should be, 512. But because of the way the buffers are laid out, and because strncpy does not null-terminate the string, we can get the total length of password to be 1024 bytes, which overflows this my_pwd buffer. So let's just recap what we're talking about. This is the entire idea here. This is why the semantics of how things work, and knowing them, are so incredibly important. And even if you've seen a function before, and I do this now, when you see a function, just look up the semantics to refresh them in your mind.
Because this actually helps sometimes. Let's say the program is using gets versus using argv[1]. With gets, let's see, you can't have newlines, or, yeah, can't have newlines or null characters. Or you can have nulls but not newlines, or something like that. You just have to look it up for each possible usage.
Okay, so you should always make sure that strings are null terminated, and you should be very defensive about how you're doing this. So how can you change this code, without changing the size of the buffers, to make it safe? "Str-n-copy." What was that? Say it again? Oh, I don't know what that is. Is that a C function? "Instead of copying the whole thing, you can limit it to 512 bytes again." Yeah, that's what they're doing already. "You said not to change the buffer size?" Yeah. "Okay. So you set the last one to a null." Right, exactly. So what you should do, if you know the buffer should be 512 bytes at most, is set index 511 to null after every operation, right? And that would definitely fix this, because your strncpy will never write past that buffer anyway. Other ways to fix this: you can do a strncpy of 511, so 512 minus one, and then make sure you set that last byte, or you could zero out the entire buffer first. But it's always good to be very defensive. You need to understand the way strings operate, and if you want a null-terminated string, you should make sure you have a null-terminated string. "Which one is better, zeroing out first or just setting the last one?" I would say setting the last one, because I think it's better to be explicit. Zeroing first is more of an implicit check, because if somebody, let's say, changed the 511 in that case to 512, because they're like, why aren't we using up all of our buffer, then your protection is basically eliminated.
But by doing it after the strncpy, you're making an explicit declaration in the code: I want the string to end here.
Okay, all right, so other types of overflows. We've looked at things like strcpy, which basically just consecutively overwrite memory on the stack. I'm pretty sure we covered this, but we know that there are no bounds checks, right? Which is clearly the problem here. All right, so to prove that I did talk about this before, you can tell me the answer. We have a character pointer a and an int i. If I say a[i] = 20, what does that actually mean? Yeah, so take a, which is some character pointer. Do pointer arithmetic so that you're i elements away from a. In this case it's a character pointer, so it's a byte operation. And then dereference that. I'm pretty sure we talked about how this is the reason why you can write i[a], because it's purely pointer arithmetic. So this means if an attacker can control this value of i, can I put negative numbers in there? Will the CPU gladly add a negative number to a positive number? Yes, absolutely, and the CPU will also gladly dereference the result. So the idea is, we may not be able to do a complete, let's say, buffer overflow, but if we can control the index variable, we now have a chosen offset from the buffer that we can maybe write to, depending on the vulnerability.
So here's an example. We have an int main with argc and argv, and we have an array of eight ints on the stack. We have an int index and an int value. We then set index using strtol. This is a libc function that converts a string into its long representation, which I think would be 64 bits, or 32 actually. I actually don't know, and it's not that important here, so it's fine.
So it converts argv[1] in base 10, whatever value you put in there, then sets value by converting argv[2] in base 16, then sets array[index] to value, and then returns zero. What's the problem here? Yeah, we fundamentally have a 32-bit write, an arbitrary write, wherever we want at an offset from the array. So what would be possible targets? Yeah, EIP. We actually know exactly where the array is. We know that the array is at EBP minus, let's say, eight in this example. It could be minus 12, it could be whatever, but we know it's located lower down on the stack. So this means, if we overwrote, let's see, array plus eight, what would we overwrite? Let's say it's at minus eight. If we did array plus eight and overwrote that one, it would be the saved EBP, so we could maybe control the base pointer, and then if we did array plus 12, we could overwrite the saved EIP, exactly. So using this, you can overwrite arbitrary values here. This is an example where you don't have a contiguous buffer overflow, but we can still control EIP.
So where should I put my shellcode? All that has to happen is my shellcode needs to be somewhere that is executable, right? And we know that the stack is executable in the examples we're looking at, and we know that all of the argv parameters are on the stack. The program is using argv[1] and argv[2], but it doesn't do any argc check to see whether those are the only arguments. So we can put it in argv[3], or we could put it in the environment, because the environment is passed to the program, right? We can do any of these things to pass in our shellcode, even though we're not writing it onto the stack through the vulnerability. This is another important concept, right? In a lot of the exploits we look at, you have your NOP sled, you have your shellcode, and you have the address you overwrite with, which jumps somewhere in there. But that doesn't have to be the case. Your shellcode can basically be anywhere.
Right, so you should check. I mean, this is the entire problem with using C, right? There's no bounds checking. So you as a programmer, when you're developing your applications, need to be very careful about checking the bounds on everything.
Okay, so loop overflows are another case, where the attacker's input controls the number of iterations that we execute. And there's a special case. Has anybody ever written a loop that had an off-by-one problem? "A long time ago." I would probably take a closer look at that. I think there are a couple of variations of this joke, but there are two things that are really difficult in programming: caching, naming things, and off-by-one errors. And so, yes, thank you, I'll be here twice a week. So, it's very easy when you're writing a loop, like a for loop, to write less than or equal to instead of less than, and so the loop can execute more times than you were expecting. This happens a lot with off-by-one errors, so let's look at, ah, yes, this is a really good example.
Okay, so we have a function called func that takes in a character pointer, sm. We have a buffer of length 256. We have an int i. We then have a loop: for i equal to zero, i less than or equal to 256, i plus plus, buffer[i] = sm[i]. And that's it. Then we have a main that checks if argc is less than two, so it just checks that we have an argument, and calls func with argv[1]. So, what's the problem here? Yes. Yeah, the problem is that we're overwriting in our loop. How many times did the developer want the loop to execute at most? "256." 256, because that's the size of this buffer. But the problem is, the way it's written, it can actually execute up to 257 times. So, let's draw the stack, because this example is very cool.
So, let's look at our nice off-by-one code, with our buffer in func. This actually comes up a lot in more advanced exploitation techniques, so we want to spend some time on it. So why is it important that this loop executes 257 times instead of 256? What does it actually allow us to do? How much of the program's memory can we actually corrupt? One single byte, right, we can only control one byte of the program.
So, let's draw this. We have func, and we're going to draw our nice stack. We know that EBP is going to point here, and right where EBP points, we know we have the saved EBP, which we actually know is the EBP of main, which is going to be some address located higher up on the stack. What do we have above the saved EBP? Yeah, the saved EIP. And then, what's at EBP minus 256? Yeah, buf, right? And we'd actually need to verify this with the binary code. We wouldn't want to just look at the C code and draw the stack, because the C code is an illusion, right? The C code is what the programmer wants, and the compiler actually decides the stack layout. So it could, you know, put i above the buffer, in which case we can only change one byte of i, which may or may not give us what we want. But let's say for argument's sake that the buffer's here and we know that the distance here is 256. So by overwriting this, right, we do byte, byte, byte, byte, byte all the way up.
So, in this diagram, what do we control? We can control the least significant byte of the saved EBP. Say again? "Why isn't it the most significant? Aren't the bytes stored in reverse?" Yeah, so we're going up. We overflow into the first byte of the saved EBP, the byte at the lowest address, and the way the byte layout works, that is the least significant byte of the saved EBP. Yeah, it's because of the endianness.
So, if this were big endian, then we would be controlling the most significant byte, but because it's little endian, that's flipped around and we're controlling the least significant byte. So how many bytes are in this saved EBP? Four bytes. We only control one. How do we actually leverage this into execution? Because it's crazy. We only get one byte of memory that we're changing. "But it's 256 possible byte values that we have access to, which is actually a fairly large window, right? So within that, you maybe identify where in main you could actually change things." Yeah, but how does that work?
Okay, so let's extend this stack up a little bit. I'll get rid of all this, since we know it. So, we know this is all func, right? F, U, N, C. What's above the saved EIP? Yeah, the stack frame of main. So what does main's stack frame look like? Yeah, we need the char pointer sm, the argument. And then, what's above that? Did main push anything else? So, the saved EBP from whatever called main. We actually know that right here is main's EBP, so we know that this saved EBP in func points up into here. And what's above the saved EBP in main's frame? Right, the saved EIP, exactly what we like. And we actually know above that is int argc, and above that the character pointer argv, and above that is the actual argv data. Cool.
So, how does changing the saved EBP help us, when we only change one byte, the least significant byte? Let's think about this. What happens when func returns? We do the function epilogue, just like always, right? We set the stack pointer to the base pointer, then we pop EBP. That's the leave instruction. So, the base pointer will then hold whatever is in here, which is main's EBP.
And then it does a return, so it jumps to the saved EIP. So, can we change the saved EIP? No, fundamentally we can't change that saved EIP. We can only change the least significant byte of the saved EBP. But after func returns, what happens in the program? It does return zero, so main's going to return. The base pointer's up here now, and the same thing happens: main does a leave, which sets the stack pointer to main's EBP and does a pop EBP, which pops this saved EBP off the stack, so EBP will point somewhere higher up. And then it does a return to jump to this saved EIP. So, how did it get to this saved EIP on the stack in main? Not func's saved EIP, but the address in the function that called main. How did it get there? Because of main's base pointer. Wherever main's base pointer points, when main hits its epilogue, it's essentially going to jump to whatever's at that address plus four.
So, if you could control main's base pointer, where would you want to put it? What was that? "In the buffer." Where in the buffer? Let's ignore the one-byte problem for now. Let's say we could just completely change this saved EBP. We want to point it into the buffer; is there a special place you'd want? What was that? "Right below, maybe here." So, let's say we're able to change it to point here. What do we want to be at that location? "Shellcode?" It's our buffer, so what do we want there? "An address that points to the shellcode." So, the address of the shellcode? We'll say our shellcode starts at the beginning of the buffer. Right? So say we completely control this saved EBP and make it point to here. Then when func returns, it's going to set main's EBP to point not up here, but down here instead.
Then main does its function epilogue, which does a leave, which sets the stack pointer equal to EBP, so the stack pointer is going to point down here. It's going to do a pop EBP, so it's going to put the address of the shellcode in EBP, and then it's going to return. So what's actually four above the address of the shellcode? Yeah, you just cause the program to crash because it's going to jump to garbage. So what you actually want is this: you need your address of shellcode, and then four below that you just need some junk. If you change the saved EBP to point here, at the junk, then when main returns it will do a leave, setting the stack pointer equal to main's EBP, which is here; pop EBP, which moves the stack pointer up one word; and then a ret, so it returns to the address of your shellcode. Can you go through that again? Yes. What we need is this: whatever we change the saved EBP to, we want the four bytes above that to be the address of our shellcode, because four bytes above that is where it will return to when it does a ret, since the function epilogue in main is a leave and then a ret. But the question is, how do we actually do that? In that example we controlled all four bytes of the saved EBP, but we actually only control one byte, the least significant byte. How can we do this? A student suggests it's really the most significant byte, so the pointer would move over a large range. No, it's the other way around. It's specifically the least significant byte. Writing one byte past the buffer lets us control the least significant byte of the saved EBP; I'm sure of that, because I went through this very recently. So it's the least significant byte; let's leave it at that for now. So what do we set that value to?
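The epilogue walk-through above can be replayed on paper, or with a toy model like the following. This is a sketch under stated assumptions, not code from the lecture: the struct, the function names, and every address in the usage note are invented, and the "stack" is just an array of 32-bit words whose first word is pretended to live at address base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy register file: only what the epilogue touches. */
struct regs { uint32_t esp, ebp, eip; };

/* Load the word stored at pretend-address addr in the toy stack. */
static uint32_t load(const uint32_t *stack, size_t nwords,
                     uint32_t base, uint32_t addr)
{
    assert(addr >= base && (addr - base) % 4 == 0);
    assert((addr - base) / 4 < nwords);
    return stack[(addr - base) / 4];
}

/* Model of "leave; ret" on 32-bit x86. */
void leave_ret(struct regs *r, const uint32_t *stack, size_t nwords,
               uint32_t base)
{
    r->esp = r->ebp;                              /* leave: mov esp, ebp */
    r->ebp = load(stack, nwords, base, r->esp);   /* leave: pop ebp      */
    r->esp += 4;
    r->eip = load(stack, nwords, base, r->esp);   /* ret: pop eip        */
    r->esp += 4;
}
```

If EBP points at the real frame, EIP ends up with the real return address. If EBP has been moved so that it points at a junk word sitting directly below a word we chose, EIP ends up with our word instead, which is exactly the "junk, then address of shellcode" layout described above.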
Because we have 256 values we can put in there. Do we put FF in there? Is that a bad idea? I would look and see what window you have available with that. Why do we need a window available? Because you have 256 bytes, and you need to know which part of that area you can actually manipulate, right? If that window is in some part that you can't change, you can't take advantage of it. Yeah, so we can play some games here. We know that we can move the stack down by changing the environment or adding more argv arguments. So we can actually control it, and we can use GDB to check it. But even just thinking about it, the current value of that byte is somewhere between zero and FF. If we change it to FF, will that get us into our buffer? You have 256 cases to think about in your head. We need this pointing into our buffer. If we change this to FF, will that work? It can never be less than what it was before. Exactly. So fundamentally, if we set it to FF, either that last byte was already FF, in which case we don't move the pointer at all, or it was a value less than FF, in which case we just increase the pointer. By changing this to FF we can only go up the stack, which is not what we want. Well, it could be what we want if we can get it to point into one of the argv parameters we control, but likely we can't get there. So by the same logic, what about zero zero? Or just zero? Yeah, it'll be a negative offset based on however far up main's EBP is. How far do we need to go? It's likely four, eight, 12, another four, 16. All we need is negative 16. So even just putting zero zero, we have, I don't know, I can't do the math: there's a 256 minus 16 over 256 chance of actually landing inside our own buffer where we want. And then we can put the address of the shellcode there.
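The "256 minus 16 over 256" estimate can be checked by brute force. The sketch below is an illustration only: the function name and both parameters are made up, and it uses the lecture's rough figure of 16 bytes as the distance from main's EBP down to the buffer rather than a measured layout. It counts how many possible original low bytes land inside the buffer once that byte is overwritten with zero.

```c
/* Count how many of the 256 possible original low bytes of the saved EBP
 * would, once overwritten with 0x00, move the pointer down by at least
 * gap bytes (far enough to reach the buffer) and by less than
 * gap + buf_len bytes (not past the start of the buffer). */
int count_landing_values(int gap, int buf_len)
{
    int hits = 0;
    for (int low = 0; low < 256; low++) {
        int delta = low;   /* zeroing the low byte subtracts exactly low */
        if (delta >= gap && delta < gap + buf_len)
            hits++;
    }
    return hits;
}
```

With a gap of 16 and a buffer much larger than 256 bytes this returns 240, matching the estimate. A small buffer shrinks the window: with a gap of 16 and a 100-byte buffer only 100 of the 256 values land inside it.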
So we put junk here underneath the address of the shellcode. Actually, we can just spam the address of the shellcode everywhere. As long as EBP points somewhere in there, we'll return to the address of our shellcode. And so what was our shellcode, like 32 bytes or something? So we have 32 bytes of shellcode, then we fill the rest up with the address of the shellcode, then we set that last byte to zero, and that will hopefully work; it only fails 16 times out of 256. And if it does fail, we can change our stack layout to try to get it to work. Does that make sense? The really important thing, and another way to think of this technique, is that it's called pivoting the stack. We're essentially shifting the stack, but instead of going up and down through function calls, we're shifting the stack down into our buffer. We're faking the program into thinking that main's function frame is inside our buffer, and if main's function frame is inside our buffer, we control it. So we can make a fake function frame with a return address of wherever we want. Where this gets tricky is, what if main did some other stuff after func? What if it had local variables that it set, or other parameters? What if it accesses argv[1] or argv[2] or argc? Sorry, a question on that: where would the shellcode be? For this, you can actually put it here in our buffer, as long as we know the address of the shellcode and we fill up the rest of the buffer with that address. So what happens if, after func, main checks argv[2]? Does that still work? Why not? It doesn't have a base pointer. Ah, but it does have a base pointer, right? Remember, when we control the saved EBP, we're going to change main's base pointer to be somewhere in here, inside our buffer.
Which means that after func returns into main, any references relative to EBP will be relative to our EBP. So if main does any of these kinds of things, or dereferences things, this can cause the whole thing to crash. You may have to very delicately construct your fake stack frame to make sure that everything actually works. But it can be done. As long as the values don't have to be anything in particular, you can just put any addressable memory in there and that will hopefully work. So it all depends on what goes on in main. Anyway, this is just trying to hint at the actual problems that can come up here. Fundamentally, though, if you can control a single byte of the saved EBP, you can often turn that into complete arbitrary code execution. Which is insane. And this still happens. There was a recent bug, was it Exim? It was an off-by-one heap overflow in the base64 decoding of a mail server. I think it was Exim. Is that what it was? Yeah, that's what I thought. So it was the base64 decoding in the Exim mail server, and it wrote one more byte than it had allocated. And they actually developed a whole unauthenticated remote code execution exploit out of this, by carefully constructing values on the heap and freeing them in the correct order, just using this off-by-one. So the same ideas, principles, and techniques apply there. Obviously the byte that you're overwriting is different, so you have to be careful about that. But anyway, this is one of the coolest things; I love this stuff. All right, so, were we talking about that? Yeah, I'm going to skip over these diagrams. You can go through them at your leisure. This is really just exactly what we went over. Cool, exploiting the off-by-one. There we go. See, look at this, you guys did all of this. Okay, so we should probably go over this, I guess. Cool, okay, so, no, okay, this is fine.
All right, this is all just stuff about how to find this and calculate it. We need to be very precise, but we basically already did this. Okay, so, to close out loops: if the user controls how many times a loop executes, that is a huge problem. Even if the user can't, if for whatever reason that loop allows them to overwrite arbitrary memory, that can also be a huge problem. And off-by-one vulnerabilities are often really tricky because they may not crash your program. You may not test with exactly the number of bytes, or the exact byte value, that actually crashes the program. So it's definitely bad. Okay, cool, format strings. This is the other cool thing. In all the things we've seen so far: in a buffer overflow, you have contiguous memory that you're overwriting; with an array index, you control a relative offset. But a buffer overflow especially is kind of like a hammer. Has anyone ever destroyed something with a sledgehammer? No? Maybe during a kitchen renovation or something. It's one of the most fun things you can ever do. If anyone ever tells you, oh, I'm ready to redo my kitchen, ask to help, and go all in on the demolition, because there's something awesome about taking a huge hammer to a cabinet, while wearing safety glasses so you don't get hurt. It's super awesome. I think of a buffer overflow like that: you can make a huge NOP sled, you just throw addresses in there, you start overwriting, you can overwrite megabytes of memory or whatever. You just don't care. You're taking a sledgehammer and breaking down the door, destroying all kinds of memory.
Compare that with a cool spy movie, where the spy gets there and has that little gadget they put on the glass that carefully cuts it without triggering the alarms, and then they use that to get inside, right? That's more like a format string. With a format string, as we'll see, you can write arbitrary bytes into memory, at locations you control, with values you decide. So it's like a scalpel: you make a tiny incision into the program and you can, again, completely control the execution of the program. So what is a format string? How does printf work? Let's look at printf in man section two, no, three. All right, printf. The function definition of printf: again, we know it's a libc function. It returns an integer. How many parameters does printf take? At least one, because it takes the format string, right? And this isn't just any string; it has a very specific format. That's why it's called the format string. And the dot, dot, dot in the printf declaration means that it takes an arbitrary number of parameters. How does printf know how many parameters it was given? Yeah, you can actually think of the format string language as a domain-specific language for printing things. Think about every percent d that printf sees while it's parsing that string. It knows that the first percent d is the first argument, the second percent d is the second argument. And how does printf know how to find those values? It goes and looks on the stack. Say it again? It goes and looks on the stack, because the arguments are pushed on the stack. Yeah, so it knows the first argument, the character pointer, the format string, is at EBP plus eight. And it knows that the next one after that is at EBP plus 12. And then, depending on the size, the one after that will be at EBP plus 16, let's say. And so on and so forth, right? So, up the stack.
And it can just keep looking up the stack. This is actually one of the reasons, as we discussed, that cdecl pushes arguments on the stack in right-to-left order. It's so you can have functions that take a variable number of arguments: you know how to access the first one, since it's at a fixed location, and the first argument can tell you how to access the rest of them and how many there are. Whereas if it were the other way, if the first argument were the first one pushed, you'd need to know the exact number of arguments in order to fetch it. So, the first argument to printf, is it one byte? The first argument to printf is a character pointer, and a character pointer is the size of a memory address, so it'd be four bytes. Okay. Does it know how many arguments there are from that? Yes, but not by the size; by parsing the format string. That's what we're going to talk about in a second. So, format strings. You have probably never thought about reading the man page for printf. There's actually a lot of joyful functionality in here. With printf, not only can you print out, say, an integer as a value that you can understand, you can prefix values with zeros. Oh man, there's so much stuff here. Let's see, a character. And there are even flags that you can set. You probably did not know that. You can have a percent followed by zero or more flags. Zero means the value is zero padded. Dash means it's left-justified instead of right-justified. There's a flag for signed conversions. Let's see, I'm just reading through these because I don't remember all of them. Some of them are actually important for what we're talking about. Let's see: precision, length modifier. So if you've ever wondered how to print a value as, say, a char, that's the length modifier.
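The idea that the first argument tells the callee how many more arguments to fetch can be sketched with C's standard variadic macros. This is an illustration, not printf's real implementation: the function name and its one-letter "format language" are invented here, and stdarg hides the EBP-relative arithmetic the lecture describes.

```c
#include <stdarg.h>

/* Walk a tiny made-up format language: every 'd' in fmt means
 * "fetch one more int from the variable arguments". The callee has no
 * other way to learn how many arguments were passed. */
int sum_by_format(const char *fmt, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, fmt);                 /* start just past the fixed argument */
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p == 'd')
            total += va_arg(ap, int);  /* advance to the next argument */
    }
    va_end(ap);
    return total;
}
```

A call such as sum_by_format("ddd", 1, 2, 3) fetches three ints because the string says so. If the string asked for more ints than the caller supplied, the function would keep reading whatever happens to come next, which is undefined behavior and is the root of the format string problem discussed next.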
So you can do signed char, short int, long int, ll for a long long int, and so on and so forth. d and i are what we normally think of as decimal notation. x is hexadecimal. Reading through this, I found p, which is actually very nice for printing out pointers. Then there are the ones for doubles. Okay, so p takes a void star, and it prints it out like percent x but with a leading 0x, so it's nicer to look at. And because the percent is now the special character in this little language, we use percent percent to output one percent character. So let's say you're writing an old-school application and you're trying to make sure that your output is lined up correctly in tables: you output something, then you output the user's input, and then you need to know how many dots to print so that everything is nicely aligned. How do you know how many characters printf has output so far? Percent n. Yeah, does anybody actually know that, or has anyone ever used it? Percent n in a printf? The description of n here looks very benign. It says the number of characters written so far is stored into the integer pointed to by the corresponding argument. That argument shall be an int star, which makes sense because you're writing to it, so it has to be a memory address that you're passing in. Or a variant whose size matches the optionally supplied integer length modifier. No argument is converted. The behavior is undefined if the conversion specification includes any flags, a field width, or a precision. Okay, so why is that important? Let's look at a couple of things. How does printf, and we just talked about this, know how many parameters it was passed? Based on the format string, right? Every percent d, or percent whatever, corresponds to an argument on the stack. Awesome.
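As a quick illustration of what percent n does when the developer uses it as intended, here is a minimal sketch. The function name is made up, and it uses snprintf so nothing is printed; note that some C libraries restrict or omit %n, so treat this as describing a typical glibc-style implementation rather than every platform.

```c
#include <stdio.h>

/* Ask the printf family how many characters were produced before the %n.
 * The format string is a constant chosen by the developer, so this is the
 * legitimate use of the feature. */
int chars_before_marker(void)
{
    char out[32];
    int len = -1;

    /* %n writes the running character count through the int * argument */
    snprintf(out, sizeof out, "hello%n world", &len);
    return len;
}
```

The count stored through the pointer is 5, the length of "hello". The key detail is that %n is a conversion that writes to memory through a pointer taken from the argument list.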
And so whose responsibility is it to make sure that the number of arguments passed matches the format string? The developer's, right? The person writing the application. So think about this. It's similar to a buffer overflow: a strcpy from an unbounded user input string into something bounded, like a buffer, can always cause a buffer overflow, because the attacker controls that string. So with a format string, what if the attacker can control the string that is passed into printf? The format string, the first argument. Has anyone been lazy and done that before, where instead of printf percent s, some username, you just do printf username? It's okay, you can admit it. So what if an attacker could control that format string? What could they do? They can pass in a format string. So what? If they pass percent d's, it'll look on the stack. Yeah, for every percent d or percent whatever that they pass in, right? printf does not know how many parameters were passed to it. All printf knows is: I'm supposed to parse this format string, and for every percent d, I look up an argument on the stack, and I keep looking up. Does printf know how many parameters were passed in? No, I just said that. Does printf care how many parameters were passed in? No. Again, just like a buffer overflow, right? strcpy doesn't care, and printf doesn't care. All it knows is: for every percent d that I see, I look up from the frame pointer on the stack and I keep printing out values. One thing that should be clear right off the bat is that somebody could just print out your whole stack, right? They could pass in a bunch of percent d, percent d, percent d, print it all out, and steal information from your stack.
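The difference between the lazy call and the correct one comes down to whether user text is passed as the format or as data. The sketch below shows only the safe form; the function name and the greeting text are invented for illustration.

```c
#include <stdio.h>

/* Format a greeting with the user's text passed as DATA for %s.
 * The format string itself is a constant, so conversion specifiers
 * inside name are copied out literally and never interpreted. */
int greet(char *out, size_t out_len, const char *name)
{
    return snprintf(out, out_len, "hello %s", name);
}
```

Calling greet with a name of "%x %x %n" produces the literal text "hello %x %x %n". The lazy variant, which would pass name as the format argument itself, would instead make the library act on those specifiers.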
So if you have passwords stored in memory, they could steal those passwords, all kinds of bad stuff. These are information leakage vulnerabilities. Sometimes you'll have a printf or format string vulnerability that maybe can't get you arbitrary code execution, but it leaks a pointer, which is then used to defeat ASLR while exploiting some other vulnerability. So if you say printf of hello percent s, name, in this example, can an adversary control the format string that's passed to printf? No, so it does not matter: they can completely control whatever they want for the name. There's no possible way that this code is vulnerable, right, because the format string itself is constant. But if we do printf of buf and the attacker can control buf, that is fundamentally, 100%, a vulnerability, because the attacker completely controls the contents of buf. We can set buf to be percent d, space, percent d, and then it will start printing elements off the stack. We can use however many we want. So, some cool things about printf, stolen from the man page. How do you reference a specific argument? How do you say, I want to print out argument three, and not just percent d, percent d, percent d? I know how to do it in awk. No idea if that's the same. I was going to say no, it's not, but maybe. Actually, maybe it is; is it a dollar sign? In awk it's dollar three, and the last field is dollar NF. Interesting, okay. So yeah, here you can't ask for the last one, because printf doesn't know which one is last, but you can use percent i dollar sign p, where i is some number. This will look up the i-th argument using that notation, and it doesn't cause an argument to be popped.
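The positional notation can be tried with a constant format string. This is a sketch with an invented function name; the i$ form is a POSIX extension rather than ISO C, so it is assumed here to run against a POSIX-style C library such as glibc.

```c
#include <stdio.h>

/* Use POSIX positional conversions: "%1$d" always means the first
 * variable argument and "%2$d" the second, no matter how many
 * conversions came before. */
int show_twice_then_other(char *out, size_t out_len, int a, int b)
{
    return snprintf(out, out_len, "%1$d %1$d %2$d", a, b);
}
```

With a = 7 and b = 9 the result is "7 7 9": the first argument is used twice without the second being skipped, which matches the point that a positional reference does not advance through the arguments.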
So you're not actually moving. When you use a plain percent d, you can think of it as popping values: printf has a pointer, and it's moving up the stack. When you use the positional form, it does a calculation to figure out exactly which argument you want and returns that without moving up the stack. Is that why printf is such a big, heavyweight function? Yes, this is why printf is a huge function. You never want to use it in something like embedded code, since it takes up a lot of resources. printf is one of those things where features got added over time: you have this, and then wouldn't it be nice to have positional arguments, and then wouldn't it be nice to have percent n, so you can know how many characters you've written. I think of it as this big ball of mud that nobody really understands completely, which you can use to completely own programs. So, we can use this to reference arbitrary elements off the stack, which means we can already use it to leak specific elements off the stack, which is cool. Now, if we have a percent n that writes out how many characters have been written, and we want to use that to our advantage to write arbitrary values, what do we need to control? Where we write it. Well, the address, yes, but wouldn't we also want to control the value that gets written? We could use a big string. Yeah. But when you start talking about addresses, you're talking about values up around FFFFFFFF, right? Those are pretty big strings. Fundamentally, we want to control that value, and we can, using the padding notation. We can use percent, then some number k, then p or x, which says left pad with spaces out to that many characters, I believe. So for the total output we could say 10,000 characters, and that little part of our format string will then increase the number of characters output by 10,000.
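Padding and percent n fit together as follows. This sketch uses a constant format string and an invented function name; it passes the width through %* for convenience, which behaves like writing the number directly in the format (for example %1000x), and it assumes a C library that supports %n.

```c
#include <assert.h>
#include <stdio.h>

/* Pad a single hex conversion out to width characters, then read the
 * running output count back with %n. The value printed (1) is irrelevant;
 * only the field width determines the count. */
int count_after_padding(int width)
{
    char out[4096];
    int count = -1;

    assert(width >= 1 && width < (int)sizeof out);
    snprintf(out, sizeof out, "%*x%n", width, 1u, &count);
    return count;
}
```

count_after_padding(1000) returns 1000. Choosing the field width is how an attacker chooses the number that %n will later store.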
And the other thing we can use is percent n. Whatever the next element on the stack is, printf treats it as an address and writes the number of characters output so far to that address. Okay, so for instance, we have printf of hello percent n with the address of len. If we wrote this code, is it vulnerable? Who controls the format string? It's a constant, the developer controls it, right? So fundamentally this is not vulnerable. It's doing exactly what it's supposed to be doing. Assuming len is an integer, we're passing in the address of len to receive the count. So let's look at a super simple vulnerable program. In main, we have a file pointer f; we set f equal to fopen of /tmp/log for appending, we call add_log, we call fclose, we return zero. Then in add_log, we have a character buffer, line, of 65,536 characters, plus int i equal to zero and result. In a while one loop, we read one character from standard input into the address of line[i]. If result is equal to zero, we stop; read returns zero at end of file (an error is reported as negative one). We increment i; if i is equal to 65536, we exit; if the character we read is a newline, we're done reading the line, so we break. After the loop, we fprintf line to f and return zero. It's a fairly complicated function, but if you look at it, you should be able to convince yourself that there's no overflow here, right? This custom loop, even though it's reading characters byte by byte, is doing it correctly and is not going to write past the end of the buffer. But we need to pass it some standard input. And if we do, we can see that every part of our input is going to be passed through fprintf as the format string. So we can pass something like four A's, four B's, four C's, four D's, and then eight percent p's.
So what do these eight percent p's produce? Everything that starts with 0x in the output. First of all, where does this AAAABBBBCCCCDDDD in the output come from? That's the constant part of our string, the first 16 bytes of our format string. What about everything else? What's all that? The A's, B's, C's, and D's again, in hex. Yeah, this is hex 1, then hex 41414141, hex 42424242, and so on. Let's think about it. This is a call to fprintf, and fprintf is called by the add_log function. So when fprintf is reading values from the stack, where does it start from? Let's draw the stack. Let's look at add_log. add_log, we know, has a parameter, the file pointer f, and below that is the saved EIP. Below that is the saved EBP. And then there's the huge line buffer. And then we're going to call fprintf, so below that we push line, and then f. That was add_log's frame, and this is fprintf's: f, then the saved EIP, then the saved EBP. So in terms of fprintf, it has the saved EIP here and the saved EBP here, and its EBP points here. No, that's the wrong spot; EBP should point at the saved EBP, here. So how does it know where the f parameter is, the file pointer? It's the first argument, so it's at what? EBP plus 8, always EBP plus 8. That tells it where to print: it's going to write its output to f. Then line is at EBP plus 12, four bytes above that. It knows that's the second parameter. So then it starts executing. It prints out AAAABBBBCCCCDDDD, and then it gets to the first percent p. Where does it think the first of the remaining arguments is?
Yeah, EBP plus what? Eight was f, 12 is line, so it's at EBP plus 16. Right? So that's the first percent p, and then printf moves up the stack for each of the following percent p, percent p, percent p. We can actually see that this first value, 0x1, is the i variable from our loop. Then we can see we're printing 41414141, and what is hex 41 in ASCII? Capital A. Then four 42s, four 43s, four 44s. And then what's this, the 70, 25, 70, 25? Percent p. Yeah, 25 is definitely a percent sign, so 70 must be lowercase p. So what we're doing is essentially printing our own input: we've walked up the stack into our format string. We'll pick this up on Wednesday, but you can see that we control these four bytes that show up as 41414141. So if we put an address here instead of these four A's, we can print out the contents of that memory location. Well, not with these capital A's, but we'll get there. Yeah, that's all.
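Reading the hex words back as characters is mechanical, and a few lines of C can do it. This is a sketch with an invented function name; it assumes, as the lecture does, that the words came from a little-endian machine, so the lowest byte of each word is the first character in memory.

```c
#include <stdint.h>

/* Turn one 32-bit word from the %p output back into the four characters
 * it was read from: lowest byte first, as on little-endian x86.
 * out must have room for 5 bytes (4 characters plus the terminator). */
void word_to_le_chars(uint32_t word, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((word >> (8 * i)) & 0xff);
    out[4] = '\0';
}
```

Decoding 0x41414141 gives "AAAA", and decoding 0x70257025 gives "%p%p", which confirms that the later words in the output are the format string itself.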