Good morning everyone! Let's get started for today. I hope y'all had a very nice, quiet, reflective, re-energizing spring break. Yes, I can tell by your laughs that that's exactly what happened. Good, good. Awesome.
Okay, so I want to first talk a little bit about assignment two and the unintended vulnerability. So, this diagram that Eric made shows how the files that you got, example.py and example.py.hash.gpg, were created. On the server, the server took the checksum of the example and put it into example.py.hash, and signed that with GPG with the secret key of the server. And then that was encrypted, and that's what you got. So the idea was you shouldn't be able to change anything in here, because you don't have the secret key of the server.
The way it should have worked: when you submit the hash, the server should check it. If it's signed with the secret key, then what should it decrypt it with? The public key, and only the public key. And if it used the public key, then it should decrypt that and compare the checksum of the file that you sent with the checksum in the hash file. If they match, then it executes the code.
I made a mistake when I was setting up the server end. I imported into GPG the public key, which it needs to check what was signed with the private key. But I also imported the private key, because I was thinking both keys should be in there; that's how I thought things should work. What I didn't realize is that GPG, when you tell it to decrypt something, tries to be very user friendly: it says, well, I'll just try every single key you have, basically, until I can decrypt it. And so if you took a new file, created a new hash, and encrypted that with the public key, then on the server GPG would use the private key to decrypt it and say, oh yeah, decrypting was successful. The server would check that the checksums matched and then execute the file. So it's definitely unintended on my part.
I didn't mean for that to happen. But it's kind of a cool example of how things can very easily go wrong. If you look at the code, the code is fine, right? It's: decrypt it with GPG, and then check that the exit status was zero, which means it decrypted successfully. So I thought, okay, at that point there's no way you could forge that, because you don't have the private key. So I'll have to be more careful, and probably automate how I set this assignment up in the future, so I don't accidentally have the private key in that key ring. But it seems ridiculous, right? Because that's how you'd want it to be: the server should have the private key in there. That's the only place the private key should ever be. Anyway, sorry, I was frustrated with myself.
So there's a couple of ways to do this, right? Another way was, well, you can create your own example.py. So how many different checksums are there? How long is the checksum? Thirty-two bits. Yeah, 32 bits, right? So there are only two to the 32 possibilities, which is what, four billion? That seems like a large number, right? But it's actually really small. I think you said you were able to do a billion in four hours or something like that? Like 1.7 billion? Yeah, it was 1.7 billion. And that's written in Python, right? This isn't even an optimized C version. So that's actually one of the things I wanted you to learn from this: computers are really fast. Your computer can calculate a checksum very, very quickly, so you can go and iterate through all possible checksum values to find a collision. So that was one way.
The other way comes from how the checksum works. It's basically a CRC32 hash, but it's that CRC32 hash mixed in with the length of the input file. I actually don't remember exactly how CRC32 works, but it's fairly simple, right?
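A minimal sketch of the brute-force idea, under stated assumptions: it uses plain `zlib.crc32` as a stand-in (the assignment's checksum also mixes in the file length, which this does not model), and the function name `find_collision` and the trailing-comment trick are illustrative, not taken from any submission. The `bits` parameter lets the search compare only the low bits so it finishes quickly; `bits=32` would be the full four-billion search.

```python
import zlib

def find_collision(original: bytes, forged_prefix: bytes,
                   bits: int = 32, max_tries: int = 1 << 24):
    """Append a counter comment to forged_prefix until its checksum matches original's.

    Only the low `bits` bits of the CRC-32 are compared. Returns the forged
    file contents, or None if no match was found within max_tries attempts.
    """
    mask = (1 << bits) - 1
    target = zlib.crc32(original) & mask
    for counter in range(max_tries):
        # A trailing Python comment changes the checksum but not the behavior.
        candidate = forged_prefix + b"\n# " + str(counter).encode() + b"\n"
        if zlib.crc32(candidate) & mask == target:
            return candidate
    return None

original = b"print('hello')\n"
forged = find_collision(original, b"print('evil')", bits=16)
```

With `bits=16` this takes on the order of tens of thousands of attempts; each extra bit roughly doubles the work, which is why a 32-bit checksum is within reach of a laptop.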
So you can actually iterate and try to break it, I don't know. I think it was Navin who wrote on the mailing list about how he did that, so that was very cool. You can check that out.
So, any questions on this project? Yeah, what the unintended vulnerability was and why it occurred? And I checked: there's no way to specify to GPG, only use this key, don't look at any other keys. You cannot specify that on the command line. It's incredibly upsetting.
Did you find any other way to do this that anyone wasn't able to find? Not that I know of. I haven't read through everyone's submission yet, so I don't know if somebody did something really crazy. Anybody do something really crazy? That was not one of those two methods? Those three methods? Yeah.
I was trying, but if the assignment 2, part 2 was on the same server, using the assignment 1 part 1 data expectation, I would have let my code run and that code would, I simply fin my server and I would write a receiver which would reformulate the system.exe.
Yeah, they're running on separate machines for that reason specifically. And even part 2 was in a different chroot, so it was on a different server internally and also inside a chroot. So even if you broke out, you wouldn't be able to affect the rest of that server. And the file permissions were supposed to be set up pretty securely, so that you couldn't mess with that original code or inject things into the GPG key ring or something like that.
Anybody else do anything? Try anything crazy? I mean crazy that could have possibly worked, not just like...
So I figured out that GPG actually tries to decrypt with any keys in the key ring, so I tried to put my public key on the key ring using assignment part 1, so that when I ran part 2 it would use my public key.
Yes: different machines, and every submission is in its own chroot, which then gets totally blown away, so the next one is in a fresh chroot.
None of the changes you make are permanent, or they shouldn't be. Cool. Okay, assignment 3 is going to be up on Wednesday. I'm still setting things up, because there's a lot of you, so I need to make sure we have enough resources, but we'll do that on Wednesday.
So today I want to get back to command injection vulnerabilities, which is where we left off on Friday before break. Somebody refresh our memories: what's a command injection vulnerability? Getting something else to run. That's actually a very good, broad description, because command injection vulnerabilities literally come in all kinds of forms. Here we're specifically looking at what's called OS command injection vulnerabilities. The idea is your application doesn't want to implement every single possible functionality itself, so it wants to call out to external commands to do things. There are many different library functions for this. One is system, which executes the command in a string by calling /bin/sh -c with that string. The -c says, hey, sh, run the command that's in this string; interpret it essentially as if it was typed into the command prompt. The related popen call does the same thing through a pipe: it creates a pipe, forks a new process, execs the shell in that process, and hands us back the pipe so we can read the output or write to the input.
So we kind of looked at this on Friday right before break, but how does sh or bash know what commands we type in? How does it know what's a command? What's the program? What's the argument? Yeah, right? So we know on the command line, if we type in ls, space, something, bash is going to interpret that and say, okay, that first part is ls.
So it uses PATH to try to find a binary executable called ls somewhere in the path, and then it execs that program. The zeroth argument is going to be ls, and the first argument is going to be that second thing that we passed in. So bash is doing this parsing and actually calling those low-level system functions for us.
So what if we can control the string that gets passed in here? What if this string comes from us? We could run any command. It's just as if we're at the command line of that system. So the basic idea of command injection vulnerabilities is: if I can control this string, can I somehow trick bash or sh into executing whatever I want?
So what can we do if we can put whatever string we want in here? What are some of the malicious things we could do? I'm thinking evil. We can talk about evil. You could read /etc/passwd. Yeah, you could read /etc/passwd. If you're remote, that will tell you all the user accounts on the system, so you could maybe try to brute force their passwords over SSH from the outside. What else can we do? Format left. Format left. The system? Two people. I think of it like a virus. In biology, the virus doesn't want to kill its host first. It wants to get them sick so they spread the germs, and then maybe kill them later. So we're kind of the same way. It depends on what it is. Maybe we make a bank transfer out of our account, and then while that's still pending, we kill that machine so there's no record of it. So now we have the money. That could be an instance where we want to kill a machine. Yeah, so we can do rm -rf /. That will try to delete everything on this computer that this program can access. Whoever is running the system command, whatever privileges that user has, it will delete all their files. It'll be pretty cool.
But what if we can't control the whole string? Then what can we do? So let's look at an example.
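A small sketch of the parsing step described above, assuming a POSIX system with `/bin/sh`. The helper names `shell_argv` and `run_like_system` are made up for illustration. `shlex.split` only approximates how a shell splits a simple command into arguments (it does not interpret operators such as `;`), and `run_like_system` imitates what `system` does by handing the whole string to `/bin/sh -c`, with the difference that it captures the output instead of letting it go to the terminal.

```python
import shlex
import subprocess

def shell_argv(command_line: str) -> list:
    """Split a simple command line into argv[0], argv[1], ... using shell-like quoting rules."""
    return shlex.split(command_line)

def run_like_system(command_line: str) -> str:
    """Hand the whole string to /bin/sh -c, as system() does, and return what it printed."""
    result = subprocess.run(["/bin/sh", "-c", command_line],
                            capture_output=True, text=True)
    return result.stdout

print(shell_argv("ls -l /tmp"))            # argv[0] is the program name
print(shell_argv('cat "my file.log"'))     # quotes keep the space inside one argument
print(run_like_system("echo one; echo two"))  # the shell, not the kernel, handles the semicolon
```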
So let's say we have our C code, which is a function. And we have some command, cmd. This is a buffer of characters, right? 1,024 characters. And we're going to use snprintf. So what's snprintf? String, not secure. Yeah, another instance of terrible naming. So the s means we're going to output to a string. What does printf normally output to? Standard output, file descriptor one, that's another way to think about it. So what does the n mean? Yeah, limit the output. The n means only output at most 1,024 characters. So it's going to do this printf, it's going to put the result into the string buffer cmd, and it's going to write at most 1,024 characters. We'll see exactly why that's important next. But for here, we know, okay, this buffer is only 1,024 bytes, so we should never write any more data than that.
So this is the printf format string. What's the printf going to do here? What is the programmer trying to do? Is it legitimate? Is this like a little thing? What are you nodding for? Yeah, we can read different log files. It's a program to read different log files. The user can specify what log file they want to read. And maybe this is running setuid, because maybe log files are protected from the user, but hey, we want to give a way for people to read their log files. Maybe not write to them, but read them. So this is going to create in cmd the string cat, space, /var/log/, and then whatever the user types in. And then, because we're really good programmers, we know that C strings have to end with a zero. So here we're setting the last byte in cmd, the byte at index 1023, to zero. We're making sure we're null terminating the string. And then we're going to call system, passing in this cmd. Seems reasonable. This is how we would write it.
So is it secure? Is there a command injection vulnerability here? So, A, what can we, the attacker, control?
The location it would be reading from. Yeah, so we can, A, control what location we're going to be reading from. So what can we use, maybe something we've already talked about? Can we read arbitrary files with this? We can use directory traversal. Yes, we can use directory traversal. We can use dot-dots to output any file on the system. So we do ../../ and so on, and that'll allow us to output any file on the system. That's pretty cool.
But we don't control this whole string, right? It's not like we're passing argv[1] directly into system. So can we get this to execute arbitrary code that we want? Can you just do a semicolon? Can you do a semicolon? What does a semicolon mean? It means run the command before it, and then after that run the next command. Yeah, but who's doing this semicolon nonsense? Is that Linux that's doing it? Is that the operating system? Where does that behavior come from? From bash, right? Yeah, the shell that sits underneath system. It does the equivalent of /bin/sh -c with that string that you passed in. So sh will do that, right? One of the ways on the command line that you can execute multiple commands is with a semicolon, which means first execute this command, then execute this command.
So let's say you guys want to read, I don't know, the /etc/shadow file. Yeah, could be fun. Okay, so what if we run this program? What string would we pass in to output the /etc/shadow file using a semicolon? You have to be perfect. So some variable, which is maybe one string, semicolon, cat, /etc/... that's fine. Yeah, so we're going to want to execute this program, right? And then for argv[1], we're going to put something in there. We may want spaces in there, so we're going to put that in double quotes. And so we can just put kind of anything, whatever we want, foo.
And then semicolon, and then a space, and then cat /etc/shadow, right? So it's exactly as if on the command line we said cat /var/log/foo; cat /etc/shadow. Exactly like how bash parses our command lines when we're at the command prompt, this is what system is doing: sh is parsing this string. So for /var/log/foo it's going to say file not found, right, because this first part isn't found. And then it's going to execute that next part and give us whatever /etc/shadow is, and it'll keep going.
So what are some of the other ways that we can execute? What if this program is like, aha, I've taken a security class, I know that semicolons are bad, so I'm going to filter out all semicolons and replace them with nothing. Does that fix this vulnerability? No, there's and-and you can use. There's and-and, so what does and-and do? It executes the first command, and only if that returns success does it execute the second. Yeah, so double ampersand means execute this, and then if that's successful, execute the other thing. What else? Pipe. Pipe? Yeah, pipe is run this program, then execute the thing after the pipe, and make the standard output of the first one the standard input of the other one.
You can use, I think it's \b, but basically create backspaces. With an escape sequence you backspace over the original command in the right way. Backspaces, I don't know. I don't know if they would interpret that because... sprintf probably would. I think it treats that as a byte. Oh, it may, if it is using sprintf. I don't know if it still works. Yeah, there are probably cases where it does work. I actually don't know; that'd be really interesting to look at. Yeah, so if we could get printf to kind of delete these characters, then we could just use backspace, if the backspace character allowed us to do that.
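A hedged re-creation of the semicolon attack in Python rather than C, assuming a POSIX system with `/bin/sh` and `cat`. The function name `show_log_unsafe` is hypothetical, a temporary directory stands in for `/var/log`, and a harmless `echo` stands in for `cat /etc/shadow`. The string concatenation plays the role of the `snprintf`, and `/bin/sh -c` plays the role of `system`.

```python
import subprocess
import tempfile
from pathlib import Path

def show_log_unsafe(log_dir: str, user_input: str) -> str:
    """Concatenate user input into a command string, then let sh parse and run it."""
    command = f"cat {log_dir}/{user_input}"          # like snprintf(cmd, ..., "cat /var/log/%s", argv[1])
    result = subprocess.run(["/bin/sh", "-c", command],  # like system(cmd)
                            capture_output=True, text=True)
    return result.stdout

with tempfile.TemporaryDirectory() as log_dir:
    Path(log_dir, "app.log").write_text("normal log line\n")
    honest = show_log_unsafe(log_dir, "app.log")
    # cat fails on the missing file "foo", then sh runs the injected command.
    injected = show_log_unsafe(log_dir, "foo; echo INJECTED")

print(repr(honest))
print(repr(injected))
```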
We could put in backspaces to get back to here and then replace this whole string with whatever we want, and then we're not even using semicolons or ampersands. What else? Could we place a symbolic link at that directory? Yeah, because if we can control that, then we could put a symbolic link there to make it output any other file.
What about other ways we could get code to execute? What was that? If we encode the semicolon into something, like URL encoding or something... What encoding would you use? Any kind of encoding; the simplest is URL encoding. URL encoding. If the filter checks only for the semicolon and not the encoded version of it, then it might decode it and execute it. But will it execute it, right? That's the question. Will bash execute a percent-encoded semicolon? So I would say no, right? Because bash predates URLs.
Well, it's downloading a file. Yeah, like a software that will also have more access. Like, if you have a server, you send something like wget and some C program and then compile it through here. Wait, is that okay? We can download a C file like that? Oh, yeah. So once we're here, once we get arbitrary execution, we can do whatever we want. Yeah, we can download a C file just like the Morris worm did, right? It downloaded a C file, compiled that C file, and ran the code. Once we have access here, we can try to create a new user on the system, change keys around, change passwords, basically use this to do whatever we want. Yeah, we could create a reverse shell that'll come back to us, so we can type in commands if we want to do it more interactively.
But for bash, what are all the other ways? We talked about ampersands, double ampersands. We talked about semicolons. What are some other ways? How would we find out? Maybe, I don't know. Yeah, those could be what bash reads in from standard in.
It may be that when it reads those characters it does something, but it may not be putting them in the command. Yeah, where would you find out about that? What was that? Yeah, in the bash documentation, right? So let's do that now: man sh, which is linked to bash. Let's see. So, invocation; we can see this is very long. We can see shell grammar: a simple command is a sequence of optional variable assignments. So we have pipes; we talked about pipes. And interesting, I didn't realize this, but a pipe can be specified by a pipe, or by a pipe and an ampersand. I think that's kind of interesting.
A list, ah, there we go. A list is a sequence of one or more pipelines separated by one of the operators: semicolon, single ampersand, right? So if they filtered out double ampersand, we could use a single ampersand. Or the double bar. So what's double bar, two bars? Or, right? So that's another way we can get another process to be executed. Let's see. Yeah, so what does the single ampersand do? Runs in the background. Runs in the background, which could be nice, right? Then we may not get the output, but hey, at least it's still running.
So here it shows all the things. We can do compound commands. Ah, there we go. We may be able to use the double brackets to do expressions. For loops, I mean, now we're getting kind of crazy. So where's, I can't really name that. Oh, actually this is important. We can define new functions, which is kind of interesting, and which we'll see in a second. Ooh, interesting, I didn't know that was there. A coprocess. You learn something new just by skimming through documentation. Okay, there's another one. Oh, quoting. So yeah, maybe \b would do something. That could be really interesting. Parameters. Oh, there we go. Ah, command substitution. Does anybody use backticks in bash? Backticks mean execute this command, and then put the output of that command right there in the argument.
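To try the operators from the shell grammar against a filter that only strips semicolons, here is a sketch under stated assumptions: a POSIX `/bin/sh`, a base command of `echo` (chosen so the first command always succeeds), and a harmless `touch` of a marker file as the injected command. The names `run_filtered`, `injection_succeeds`, and `templates` are illustrative.

```python
import os
import subprocess
import tempfile

def run_filtered(user_input: str) -> None:
    """Apply the naive fix (delete semicolons), then build the command and hand it to sh."""
    cleaned = user_input.replace(";", "")
    subprocess.run(["/bin/sh", "-c", f"echo log: {cleaned}"],
                   capture_output=True, text=True)

def injection_succeeds(template: str) -> bool:
    """Fill {cmd} with `touch <marker>`, run the filtered command, report whether touch ran."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "marker")
        run_filtered(template.format(cmd=f"touch {marker}"))
        return os.path.exists(marker)

templates = {
    "semicolon": "x; {cmd}",          # blocked: the filter removes the semicolon
    "double ampersand": "x && {cmd}",  # runs because the first command succeeds
    "single ampersand": "x & {cmd}",   # first command goes to the background
    "double bar": "x || {cmd}",        # runs only if the first command FAILS
    "pipe": "x | {cmd}",
    "backticks": "`{cmd}`",
    "dollar-paren": "$({cmd})",
}
results = {name: injection_succeeds(t) for name, t in templates.items()}
for name, ran in results.items():
    print(f"{name:17} injected command ran: {ran}")
```

Which of `&&` and `||` fires depends on whether the first command succeeds. With `echo` it is `&&`; with a `cat` of a file that does not exist it would be `||`.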
So we could use this too. We could use backticks to execute whatever we want. The equivalent to backticks is the dollar sign, left paren, and then right paren. Anything in there is going to be interpreted as a command. So, really, what's happening here is that bash is taking in this string that the user gave it, and it's parsing it, trying to interpret what the user wants to happen. We can see that the bash language is crazy and very long. It has all these cases of different things that it will do or not do. Yeah, so let's see. Arithmetic, process substitution. All kinds of cool stuff. Oops, that's the point here. Is it dark?
Okay, so the root cause here, right: what did the developer want to have happen here? What program did the developer want to be executed? The cat program, right? And what was the first argument that the developer wanted to pass to that cat program? The name of the log file? Yes, the name of the log file, which happened to include user input. So did the developer intend for semicolons to be used to execute more commands? No, right? And so this is the main problem: essentially the developer is concatenating strings together. It's concatenating a string that it wants, cat /var/log/, with the user input to create a new string that it then passes to bash and says, hey, bash, figure out what I want to do from this. Whereas the developer knows exactly what program and how many arguments they want to pass to that program, right?
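The point that the developer already knows the program and its argument list can be sketched as follows. This is an illustration in Python, not the lecture's C code: passing a list to `subprocess.run` (with the default `shell=False`) starts the program directly, so no shell ever parses the user's string. The helper name `run_with_argv` is made up, and `echo` stands in for `cat` so the effect is visible.

```python
import subprocess

def run_with_argv(program: str, user_input: str) -> str:
    """Start `program` directly with user_input as one single argument; no shell is involved."""
    result = subprocess.run([program, user_input], capture_output=True, text=True)
    return result.stdout

# Every metacharacter arrives at the program as literal text in argv[1].
literal = run_with_argv("echo", "foo; echo INJECTED && $(id)")
print(literal)
```

This only removes the shell from the picture. An argument such as `../../etc/shadow` would still reach the program unchanged, so directory traversal has to be handled separately.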
So really, using system in this context with user input is incredibly unsafe. What the programmer should do is use a different function, like one of the exec functions, where they can say, okay, I want you to execute this program, and these are the arguments of this program. That way, no matter what the user does here, extend this string, change this string, it's only going to be passed as an argument to the cat program. Bash is not doing any reinterpretation of that string.
And this simple vulnerability manifests itself over and over again, even recently. So anybody remember the Shellshock vulnerability? Back in September 2014, there was a huge uproar in the security community because there was a bug announced in bash, based on how it processed its environment variables. As we've seen, when processes are invoked, they inherit the environment of their calling process. And in bash you can do this as well: bash can invoke new bash instances and have new environments. This is kind of nice, because you can pass variables to your children and subprocesses. It turns out that bash has this crazy, little-known feature where you can pass a function definition into your child bash, which seems legitimate. That seems like something that could be useful in some circumstances. What you do is you create a new environment variable, and you start its value with a left paren and a right paren, and then everything after that is a function definition. By itself, that's fine. But what bash does when it starts is go through all the environment variables, look for any whose value starts with these parens, and then execute that code in order to create that function. Makes sense; that's how you create this function. Who controls the environment variables? Yeah, the attacker can, right? They're user controlled, right?
And what makes this so? You can put commands after the function definition, and bash will execute them. So you can get it to execute arbitrary code by creating these function definitions.
So, for instance, hopefully a lot of you use GitHub, right? When you pull down a repository from GitHub, oftentimes you've uploaded your SSH key, so they're giving you SSH access to one of their machines. Doesn't that seem crazy? Big company, lots of people's source code, and they just give you SSH access to their machines. So it's going to try to SSH into, like, a GitHub machine. I say GitHub, let me give up. Does it work? Kind of, yeah. So actually there's a way in SSH to restrict the access that an account has when it SSHes to your machine. You can even restrict it to say you're only allowed to execute this specific command. In this way they only allow you to execute whatever the git server command is.
Now, when SSH gets an incoming connection, it creates a variable in the environment called SSH_ORIGINAL_COMMAND, and the value of that variable is supplied by the attacker. So they could create this SSH_ORIGINAL_COMMAND with the parentheses and then arbitrary code, and they can break out of the restriction and execute arbitrary code here. You would try to execute something like this, and this would get executed on the system, because SSH puts this into an environment variable and then calls a program. Because that should be safe, right, to take user input, create an environment variable, and then just call a program. But this cat command is going to be output. So you're breaking out of there.
But if this was the only case, it probably wouldn't have been such a big vulnerability. The big problem was, and we're actually going to see this when we get to web, that CGI web applications accept arbitrary user input.
So anybody, maybe taking my 591 class, remember how variables are passed into CGI web applications? Through the URL is how it gets in, but how does it get into your program, the CGI program? Equations. What was that? Equations. Yes, POST or GET is how it gets in, but how does it get passed into your program? Close. Headers. No, not quite. Well, I think you do get the headers in your program. So a CGI web application is just a program. It executes. And it was created sometime in the early 90s, I'm going to say. It was intended to be very broad, so you could write a web application in literally any language that you could make an executable from. And one of the easiest ways to pass arguments to a program is to use the environment. So it defines specific variables. Like if you want to get the URL, you have to get the environment variable called, actually I don't remember what it's called, like URI or something like that, or URI location. It's actually a list of about 20.
So when the web server is invoking your CGI program, it's essentially invoking a new instance of bash, which is now going to look at all the environment variables, see if there are any with these parentheses, and then execute that function definition. So you could get remote code execution on a machine just by issuing a single web request, which is why this was such a big problem and a big vulnerability. Even if that web request is going to give you a 404 not found, or maybe a 404 wouldn't work, but like a not authorized response, right? Even if you're not authorized, that CGI application still has to execute to tell you you're not authorized, which means it's going to parse the function that you gave it and execute your arbitrary code. So, huge problem. There was a huge chunk of the web that was vulnerable to this.
All right, so what do we learn? Never take user input if you can. Never take user input if you can.
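For the Shellshock discussion above, a minimal probe sketch, assuming `bash` can be found under `/usr/bin` or `/bin`. It uses the widely circulated test value: an environment variable whose value starts with `() {`, followed by a trailing command. The function name `bash_is_shellshock_vulnerable` and the variable name `probe` are made up for this example. On any patched bash the trailing command is never run, so this is expected to report `False` on current systems.

```python
import subprocess

def bash_is_shellshock_vulnerable(bash_path: str = "bash"):
    """Return True if bash ran code smuggled in an environment variable, False if not,
    or None if bash could not be started at all."""
    env = {
        "PATH": "/usr/bin:/bin",
        # Function-definition syntax followed by an extra command after the closing brace.
        "probe": "() { :; }; echo VULNERABLE",
    }
    try:
        result = subprocess.run([bash_path, "-c", "echo done"], env=env,
                                capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None
    # A vulnerable bash prints VULNERABLE while importing the environment at startup.
    return "VULNERABLE" in result.stdout

print(bash_is_shellshock_vulnerable())
```

The child process never asked for the variable `probe`; merely starting bash with it in the environment was enough on vulnerable versions, which is why an environment filled from web request data was so dangerous.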
Yeah, it's kind of analogous to how the most secure computer is one that's unplugged, locked in a room, right? No access. It's very easy to make something secure like that, yeah? But if you can help it, right? If you know what log file you want to output, just write it in; don't take it from the user. What else? Anything else?
Try to whitelist and not blacklist, like... So what I mean is, don't try to prevent all the possible ways a user can give bad input, but check for only the correct forms that you are expecting. Right, so that would be one defense, or one layer. It's just like what we said: okay, block semicolons. Well, then I'm going to use double ampersand. Well, block all ampersands. Okay, then I use bars. Okay, block bars too. Okay, then I'm going to use backticks. Okay, block backticks. Oh, I can use dollar sign parentheses, right? And there are probably still other things that you can use. Yeah.
Well, in that case, you want a filename, so you can at least limit it based on what a filename can have in it. Absolutely, right? So it should be A through Z, uppercase, lowercase, digits. What about spaces though? Spaces would come with this. Can we do spaces? Yeah, so you... But if we allow spaces, then they can pass additional arguments to cat to output additional files, right? Make sure each space has a backslash before it... Yeah, so maybe you could do that transformation, to sanitize it by escaping all spaces with backslashes. We still get into tricky issues. Now we have the directory traversal attacks, right? So I also have to worry about all the dot-dots. Yeah.
I don't know the term, but it's... You're using a command that's too powerful. So maybe there's a more restricted thing to use than just system calling cat... Yeah, so that's actually a good point, right? We just want to output this file. Well, how do you output files in C?
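The whitelist idea can be sketched like this. The allowed character set, the name `resolve_log_path`, and the choice to forbid spaces entirely are assumptions made for the example, not rules from the lecture. The second check resolves the final path, so a symbolic link that points outside the log directory is rejected as well.

```python
import re
from pathlib import Path

# Accept only names made of letters, digits, dot, underscore, and hyphen,
# starting with a letter or digit. No slashes, spaces, or shell metacharacters.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")

def resolve_log_path(log_dir: str, user_input: str) -> Path:
    """Return the path of the requested log file, or raise ValueError if the name is not allowed."""
    if not ALLOWED_NAME.fullmatch(user_input):
        raise ValueError("log name contains characters outside the whitelist")
    base = Path(log_dir).resolve()
    candidate = (base / user_input).resolve()   # follows symlinks where they exist
    if candidate.parent != base:
        raise ValueError("log name escapes the log directory")
    return candidate

for name in ["syslog", "../etc/shadow", "foo; cat /etc/shadow", "a b", "$(id)"]:
    try:
        print(repr(name), "->", resolve_log_path("/var/log", name))
    except ValueError as err:
        print(repr(name), "-> rejected:", err)
```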
Or how do you just read the file? You open the file and you output it, right? So you can actually do that functionality yourself, and there's probably less room for error. And on that same note, we saw that system, and if you look at popen it works the same way, takes that string and has bash reparse it to figure out exactly what you mean: what number of arguments you want and everything. So we should use alternatives. We should use the exec family of calls, which allows us to specify exactly how many arguments there are and what those arguments are. So if we were to use execve, then what we pass in as argv[1] there is only going to get to that program as argv[1]. No matter what semicolons or whatever junk we put in there, it's only going to go as the argument to that program, and it's not going to be interpreted.
So yeah, in a sense, system is actually very flexible and powerful, right? It does some cool stuff for us, and you can do very cool commands. The problem is, once you allow attackers to influence what commands you execute, they can usually trick it into executing arbitrary things. And yeah, so we should always, always, always be thinking about sanitizing the user input. Users are dirty, right? Make sure you sanitize all the input.
So another thing, and this is going to come up again when we look at SQL injection and cross-site scripting: when we get to the system call, does system know what part of this input string came from the developer and what came from the user? No, right? This part came from the developer: cat /var/log/. So that should be safe, right? The developer's doing it. It should be safe. It's that other part at the end that's unsafe. But from system's perspective, right?
And once it gets to sh, system just calls /bin/sh -c with that parameter. So when it gets to sh, sh has no idea where these bytes came from. They could come from the user, from the network, from a file. They could come from, you know, a hard-coded string. So this is another problem, when you think about it. So, yeah, okay. I won't get into it too much. We'll revisit this concept again.
Okay. So now we get into one of the coolest, oldest vulnerabilities in binary applications. We saw that the Morris worm used buffer overflow vulnerabilities, and these types of overflows are still prevalent in today's binary systems. So this is why it's really important to study and learn these things, so we can understand what kinds of, essentially, memory corruption vulnerabilities can happen. I think of them separately from what we looked at before. We looked at how to manipulate file access, by using directory traversal attacks and symlinks. We saw command injection, where we were able to influence some other program to execute arbitrary commands on our behalf. This class of vulnerabilities is really about the attacker being able to change memory and alter the execution of our program itself. So we're going to look at exactly how this happens.
So, what happens when you copy one buffer to another in C? What are buffers in C and C++? Just arrays, right? So it's an array. Yeah, just a contiguous set of memory. How do we know the size of that array? Who specifies it? What was that? Yeah, so we saw it in that last example. We declared a command buffer that's 1024. So it's at compile time: the programmer says, this is an array of characters, it's only 1024 bytes. What's the other way we can allocate memory in our program? malloc, right? But then how do we specify the size of that? Cast it into the command, right?
malloc takes in a number of bytes and returns the address of those bytes in memory. So we could call malloc with 1024, and that'll give us 1024 bytes so we can do whatever we want with it. So what about when we write to an array? Who checks that we're not writing more than 1024 bytes? We have to, the programmer has to, right? When you're writing Java, Java internally stores the length of the arrays. And so every time you write to an array, when you say, hey, write to index blank, it's checking, is blank greater than or equal to the length of the array? If it is, then it stops that, right? Blocks that write. Is anybody checking the indexes where you're writing to in C? Absolutely not, right? I mean, I guess it makes sense historically, because if you're checking every time you're doing an array write, that has a performance cost, or a performance hit, right? Because you're slowing your program down with an if condition every time you write into memory based on an array, which could be a lot, right? How many times do you iterate over an array doing something to it? So if you're iterating over an array of 1024, right? Every time you write to that array, or read from it, you have to check that length. And so C and C++ applications don't have bounds checking. So they don't check the bounds of the arrays that get written to. And so this leads to, as we'll see, basically overflows, which are writing more than the size of that buffer, the allocated memory for that buffer. So once you're able to do that, we'll see how you can control the program's execution. These are very tricky attacks because they're very architecture and operating system dependent, even version dependent. So a specific exploit will typically only work on a specific version of the operating system with a specific compiled version of the program.
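To make the bounds-checking point concrete, here is a rough sketch of the check that Java performs on every array write and that C leaves to the programmer. The function name checked_write is made up for illustration. The one if statement in it is the per-write cost being described.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write one byte into buf only if index is in bounds.
 * Returns 0 on success and -1 if the write would land outside the buffer. */
int checked_write(char *buf, size_t buf_len, size_t index, char value)
{
    assert(buf != NULL);
    if (index >= buf_len)   /* the comparison C never does on its own */
        return -1;
    buf[index] = value;
    return 0;
}
```

With a char command[1024], checked_write(command, sizeof command, 1023, 'x') succeeds, while index 1024 is rejected. A plain command[1024] = 'x' compiles fine and writes one byte past the end of the array.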
They are very cool because they can be executed both locally, while we're on the system, but also sometimes remotely if the program uses external user input, which can be super cool. And we'll see that our goal as an attacker is really to modify the control flow of the application to get it to execute whatever code we want. A newer type of these attacks is that we can also change the data of the program. So the program has a bit that says whether we're an authorized user or not, and that's at zero, and we're able to overflow some buffer to change that to one. Now we're an authorized user. We've tricked that program. So there's actually a lot of work on automating this process of doing these buffer overflow attacks, which is super cool, but we're going to be doing this by hand. It's really important that you learn how to do that. And so there's actually a lot of research into automatically finding these vulnerabilities, automatically preventing them, and developing detection mechanisms. So there's this arms race where we say, okay, they're exploiting the program like this, so now let's do a new type of defense. Let's add, I don't know, stack canaries or ASLR or whatever defense you've heard of, as we'll talk about. Let's do that, and then people do that. And the attackers are very smart, so they get smarter, and they say, oh, but now I can do this other thing using this same vulnerability. And so the defenders keep trying to go up and up, but attackers have a financial incentive, and I guess other incentives too, to develop cool new attacks. So attacks are always getting better, and now they're incredibly complicated. And actually, some of this research has gone into the actual operating systems that are on all of your phones and devices. So to understand this, we first have to understand the stack. So what's a stack generally?
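Here is a small simulation of that data-only case, where an overflow flips an authorized flag. The struct layout and names are assumptions for illustration. To keep the behavior well defined, the copy is done over the bytes of one struct and clamped to its size, rather than actually running off the end of a real array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical program state: a small buffer with a flag right after it. */
struct session {
    char name[8];
    int authorized;   /* 0 means not authorized */
};

/* Simulates a copy that trusts the input length instead of sizeof name.
 * Bytes beyond the 8-byte name spill into the fields that follow it. */
void simulate_unchecked_copy(struct session *s,
                             const unsigned char *input, size_t n)
{
    assert(s != NULL && input != NULL);
    if (n > sizeof *s)
        n = sizeof *s;   /* clamp so the simulation stays inside the struct */
    memcpy((unsigned char *)s, input, n);
}
```

Copying 8 bytes or fewer leaves authorized at zero. Copying sizeof(struct session) bytes of 'A' makes it nonzero, so a check like if (s.authorized) now passes even though no control flow was hijacked.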
A data structure, a stack, yeah: last in, first out, push things on, pop things off. What is it used for on x86, for programs? Storing return values. Yeah, so essentially, I like to think of it as scratch memory for functions. So for each function, its local variables are stored on the stack, parameters are stored on the stack, and this is how we can have recursive function calls. So actually, I'm going to go through this fairly quickly because this is review from my undergrad 340 class, because we talked about the stack there to understand exactly how local variables are laid out in memory. So this is where this comes from. So all my stacks start at high memory addresses and grow down, right? Just like we saw, the stack in the process space starts at high memory and goes down. That's completely arbitrary, right? You can start at low memory and grow up, but for whatever reason I got stuck in this way, so this is the way I'm going to do it forever and so will you. Or at least for this class. So functions can push registers or values onto the stack or pop values from the stack into registers, so this gets back into x86. So we're going to look at actual x86 code. So in x86, ESP points to the address of the top of the stack, or the bottom if it's growing down, right? So logically it's the top, because when you pop things off, it's going to move, and when you add things, it's going to move, right? So the instruction push eax first decrements the stack pointer by four, right? We're just going to move it down in our picture because we're going to go from high to low. It's going to decrement it and then store the value of eax at that location. And conversely, pop works in exactly the reverse way. So it's going to take the value that's at the address pointed to by ESP, put it into eax, and then increment the stack pointer by four. So let's look at what this looks like. So we have our stack with high memory at the top, we have low memory at the bottom, and it's just a bunch of bytes, right?
So there's a bunch of stuff in here. So the stack pointer is, let's say, here at 0x10000. Below the stack pointer, right, is garbage, right? It has absolutely no meaning to our program. Obviously if we go down far enough, we'll hit the heap, but for now we're considering just the stack, right? Anything below the stack pointer means nothing. So say we have code like push eax, pop ebx, and we have registers where the value 0xa is in eax and zero is in ebx, and the stack pointer, right, has the value 10000 hex. So the way we know that this pointer here to the stack, right, the top of the stack, is here is because there is the value 10000 hex in ESP, right? And so if we said the stack started somewhere else, then there must be a different value in ESP. So this arrow just represents where ESP is currently pointing. So if we do push eax, right, to execute that instruction, what's going to happen is ESP is going to be decremented by four. So it's going to be 0xfffc, so it's going to move down, right? And then eax, which is decimal 10, so hex a, is going to be copied and put into there. And now if we do pop ebx, right, we're going to say, okay, take whatever value is at the address ESP points to, which is hex a, copy that into ebx, and then increment the stack pointer by four, which means it's going to point to here. And so essentially we've been able to copy the value in eax to ebx by using the stack, right? Now we know that at 0xfffc there's still the value 10. We know that, but it's garbage to our program, right? At this point, it's essentially been deallocated. We don't really care what that value is. Okay, so thanks for being patient. All right, on Wednesday we're going to finish all of this background material so we can understand exactly how the stack works and what's happening, so that we can understand buffer overflow vulnerabilities.
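The push eax, pop ebx walkthrough can be replayed with a tiny simulation. This is only a sketch: struct cpu, push32, and pop32 are hypothetical names, and memory is modeled as a plain byte array covering addresses 0x0 through 0xffff, so ESP starts at 0x10000 as in the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_TOP 0x10000u   /* initial ESP from the walkthrough */

/* Hypothetical model: three registers plus a flat block of memory. */
struct cpu {
    uint32_t eax, ebx, esp;
    uint8_t mem[STACK_TOP];  /* simulated addresses 0x0 .. 0xffff */
};

/* push: decrement ESP by four, then store the value at the new ESP. */
void push32(struct cpu *c, uint32_t value)
{
    assert(c->esp >= 4 && c->esp <= STACK_TOP);
    c->esp -= 4;
    memcpy(&c->mem[c->esp], &value, sizeof value);
}

/* pop: load the value at ESP, then increment ESP by four.
 * The bytes left behind are not erased; they are just below ESP now. */
uint32_t pop32(struct cpu *c)
{
    uint32_t value;
    assert(c->esp <= STACK_TOP - 4);
    memcpy(&value, &c->mem[c->esp], sizeof value);
    c->esp += 4;
    return value;
}
```

Starting with eax = 0xa, ebx = 0, and esp = 0x10000, calling push32(&c, c.eax) moves esp to 0xfffc, and c.ebx = pop32(&c) sets ebx to 0xa and restores esp to 0x10000. The four bytes at 0xfffc still hold 0xa afterward, which is the "garbage" below the stack pointer.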