You don't want to touch it next to you. Hello. Hello. All right. Looks like you all made it. That is awesome. I barely did. So now we have a very nice projector with a screen in a strange room that we'll be in for a day, and then we'll be back for one day on Monday for our regularly scheduled program.

So on Monday, and I'll send out a bio before class, you'll have a guest lecturer, Joaquin Fuentes, who leads a penetration testing group at a company called Early Warning here in town. So he has hired, I am right now. Yes. Consider this your warning. So Joaquin is awesome. He's an amazing speaker. I've seen him speak many, many times. He's done both physical penetration tests, where he has to break into buildings in order to get to a certain floor and prove that he can bypass their physical security, as well as digital pen tests, more of the things we think about as computer security pen tests. So he's going to be awesome. He's a great resource. For those of you who are interested in a security career, he is a hiring manager at his company, so he knows what skills it takes to get a job. So that'll be on Monday, and then on Wednesday we'll have a CTF. We'll be back in the room that we were in for the original CTF. What is it? EDC 111? 117. 117. Yes. But we'll send out information about that, and I want to kind of prime the pump for now. And I'm super disoriented because none of you are in your same spots. And I'm on the other side of the room too, so it's like everything's backwards now.

All right. So let's dive in with the time that we've got left. We've got some really cool attacks to get to here. So we've been talking about a number of vulnerabilities that can occur in C code. We looked at file system attacks, so we looked at what kind of attacks can occur against the file system.
We've looked at how an attacker can manipulate the environment of a setuid application in order to trick it into doing things that it's not supposed to do. The other major class of vulnerabilities is command injection. We're going to look at it right now in the context of binaries, but this is a class of vulnerabilities that exists in all kinds of places, on the web, in a lot of different locations.

And the idea here is, we've talked about the libc function system. But how does it work? What does it actually do? So you call system with some command. What does it do? So I'm going to say, you're going to have to write your name. I only blew it in the top left. So it runs that command in a shell, and then it returns the... So it actually ends up calling execve of /bin/sh with sh, -c, and then the command. That's including argv[0]. So argv[0] is usually the name, so that's sh.

So we've looked at cases like the file system. We looked at what happens if an attacker can control part of the path where you're looking to open a file. And we saw that by using characters like dot dot, which are special in the context of looking for files in a file system, the attacker can trick you into opening files that you didn't think you were opening. And that's really the theme behind a lot of these classes of vulnerability.

So for instance, here for command injection, the idea is: what would happen if you take user input and use it to construct the command that you pass to system? So when would this type of thing happen? You're writing some application. You always have to put yourself in the mind of the developer. When would they pass user input to a call like system? File names, maybe? File names? So what program do you want to call and pass a file name to? Let's construct an example. I don't have to keep. Like a contact book. Like a contact book.
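As a rough illustration of the description of system above, here is a minimal sketch of what it does under the hood, assuming a POSIX system with a shell at /bin/sh. The name system_sketch is made up for this example, and the real libc function also deals with signal handling and a NULL command, which this sketch leaves out.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Simplified stand-in for system(3): fork, exec the shell, wait for it.
 * Returns the wait status of the shell, or -1 if fork/waitpid fails. */
int system_sketch(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* argv[0] is "sh" by convention; the whole command is ONE argument. */
        char *argv[] = { "sh", "-c", (char *)cmd, NULL };
        execve("/bin/sh", argv, environ);   /* current environment is passed on */
        _exit(127);                          /* only reached if execve failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

The point to notice is that the command string travels to the shell untouched, as a single argument after -c. Any splitting into words happens later, inside the shell.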
So, some kind of contact book application that you pass a username into, and it has a search functionality, so it's using grep. There would be two cases: one, the name that you're looking for, and two, the search terms are part of that grep command. So when you're writing this, usually a developer will... right, just a second, I need to pause this.

Okay, so the idea is we're going to write something like system, and we'll put it at the bottom. So we want to run some command, and how do we build up command? I'm going to write pseudocode now, not C code, because I don't want to have to do all the string copies. I'm going to pretend we're in a language that has string concatenation as built-in functionality. So let's say we want to grep. What's the syntax for grep? We want grep -r. What do we want to search for? We can do -e, and then we concatenate this with some search variable that came from the user, and then we concatenate that again with... what's the file we want to grep? Skeleton, wait what? ETC Skeleton. Okay, sure, we'll do /etc... well, let's do like /var for the address book.

Okay, so here we have our command, where we're concatenating strings together in order to create the value that we want to pass into system. The first part of command is the constant string grep, space, -r, space, -e. Appended to that is search, which is some user input, the search term that the user wants. And after that we're appending the folder we want to look in, which is like /var/book. Is it difficult to see the screen with the lights on? I see heads shaking in the no direction; raise your hand if that's not the case. So actually, let's make this more clear: instead of search, we'll say this is argv[1].
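The pseudocode concatenation can be sketched in C with snprintf. This is only an illustration of what the resulting string looks like; build_grep_command is a hypothetical helper name, and /var/book is the path picked in the lecture.

```c
#include <stdio.h>

/* Builds the command string the way the vulnerable program would:
 * constant prefix + user-controlled search term + constant suffix.
 * Returns snprintf's result (the length the full string needs). */
int build_grep_command(char *out, size_t outlen, const char *search)
{
    return snprintf(out, outlen, "grep -r -e %s /var/book", search);
}
```

With search set to alice, the result is grep -r -e alice /var/book. With search set to x; id # the result is grep -r -e x; id # /var/book, which a shell would read as two commands followed by a comment. Nothing is executed by this helper; it only builds the string.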
So argv[1] is going to be the first argument that's passed into the program on the command line. What's argv[0]? Name of the file. It's normally the name of the file, but we've seen... what's the function, the system call, that's invoked when you execute a new process? Execve. And what parameters does it take? A filename, so an absolute path to a file, and then what? An argv vector, and then an environment pointer vector. When I say vector I mean a character pointer pointer, a pointer to an array of character pointers. Both of those are null-terminated. So when you go through it, you'll see that execve doesn't say anything about argv[0] having to be the name of the program. It can actually be anything. Whatever is invoking it, when you use something like system, when you use a libc function, is going to make sure it's set, because that's a convention. But it's not a standard; it doesn't have to be that way. But when you're writing code and you want to use the first argument on the command line, you're going to use argv[1].

So what characters are special in a... So, okay, let's actually go a little bit forward. This is the user code, right? We know that this code is written by the user, is in the program, is compiled from C code. And we know, based on our study of the man page of system, that system will do a bunch of stuff and then eventually call execve with /bin/sh. And the argv vector, I'm going to draw it like this, with a bracket syntax: the first argument will be the string /bin/sh, without a trailing slash, then it will be sh, and then I believe -c as the second parameter, and the third parameter will be our cmd. And whatever it will be for the environment, I don't know; I think it will copy it from whatever the current environment is. So what does /bin/sh -c do? What does it normally do when you execute /bin/sh?
It opens up an interactive shell. It's essentially the shell; that's what sh stands for. On most systems it's linked to dash or bash, so they're essentially the same thing. So then what does the -c tell sh to do? What was that? Does it prep it for a command? Yeah, so it means that the next argument after -c is some command: execute it as if you had typed it on the command line, and then exit. So I think we can look at this real quick. Let's see if my host... of course this doesn't work. 40 154. So we're doing man sh. So it's definitely lowercase c, based on what I just saw. So -c says: read commands from the command_string operand instead of from the standard input; special parameter 0 will be set from the command_name operand, and the positional parameters $1, $2, etc.

So what this means is that the shell will parse whatever that string is according to the shell parsing rules to figure out what things to execute. So what happens if we put, let's say not this example, what happens if we put system ls? How does sh know which ls to execute? What was that? Which environment variable? It uses PATH. So how do you execute multiple commands in one line? Semicolon. Multiple ways; one is semicolon. So you separate each of the commands by a semicolon. What are some other ways? Double ampersand, right, so that's like an and. Same thing with double pipes, which is an or. What else? Somebody said something else. There's backticks. Have you seen backticks? What does that mean? Not quite. I'll get the whole screen back. This is awesome. We've done this earlier. So we can do ls. So what does pwd do? Print working directory. And how does it look this up?
Yes, it uses the current working directory of the executing process, so that's how it figures out the current working directory. So we can do pwd. If I do ls backtick pwd backtick, essentially what these backticks do is insert the result of calling pwd and use it as the first parameter to ls. So this should output... we'll do -la, so it'll tell us where we are. So this is the same thing as ls -la /home/atomd. With these backticks, it's substituting the output of the inner command into the outer one.

So how does the shell, or bash, do this? It's just magic stuff? How does it know, in this example, to eventually call execve with pwd, and then take the result of that and use it as argv[1] to exec ls? And it had to look up all of those in the environment too. It was a complex way to see improvement how? Isn't it looking for those special characters and then acting on the fact that it sees backticks or ands or double pipes? What if I wanted to list a directory that was literally named backtick pwd backtick? Why does it know there's a difference when I put a backslash before this? Who does that? The system. The system? The shell. Because it has to... you've taken 340 or compiler courses, and some of you took it with me, some of you are probably maybe taking it as an efficiency course, right? What sh is doing is all about parsing. It is parsing based on the syntax. The bash input has a very specific syntax, and it parses your input string into its constituent parts, so it can figure out: where are the semicolons, so I know there are 2 different statements? Where are the backticks?
What about... so I actually don't know if it's in here; if we do man sh, there may be a grammar somewhere. So there's a whole section in the man page of sh called Lexical Structure. So I knew that this is a language, and sh has to parse this language in order to understand what the programmer wants it to do. So because of this, when we go back to our example, what's the intent of the developer writing this command? Back to our grep. They want to execute not just a single grep command, but what do they want all the parameters to that grep command to be? How many? So they want 1, 2, 3, 4, so they want 4 parameters to grep. They want the first parameter to be dash uppercase R. How does bash know where the parameters are? Isn't dash another reserved character? That's only convention. When you write a program you can use anything; you can use pluses, you can use whatever. It's looking for arguments, it does, but we saw that execve takes in an argv vector that gets passed to the program. All types of commands have kind of a parse tree. Yes, but how does it build that parse tree? How does it know to separate the grep from the -r, so that grep is argv[0] and -r is argv[1]? Spaces. So this is again part of the syntax, because the syntax of sh says that the arguments are separated by white space. What if you want to include white space in your argument? You have to escape it: you can use backslash space and that will be passed in directly, or you can put it inside double quotes. And then you get a whole host of other issues, like what if you want to include a double quote inside of a double quote? Then you can use single quotes instead. But what if you want to use a single quote inside of a single quote? So there's a whole escaping scenario for that. This is actually an incredibly complex language. I'm so happy there's a Lexical Structure section here, so I can just scroll through this a little bit. You can see quoting, backslash, single quotes, double quotes, reserved words, just like a programming
language. Aliases, commands, simple commands, redirections. Look at all these types of ways you can do redirections. It's often very nice to actually look at these things from time to time and refresh yourself on all the different types of redirections. The ones we're most familiar with are redirect output and append, but there are all different kinds there. There's search and execution, so it's telling you exactly how it's going to look for the programs; probably it's going to talk about the path. Path search. Another thing it talks about is command exit status, what a command's exit status means. Complex commands, this is what we talked about, using semicolons or pipes. Pipelines, what do pipelines mean; we didn't even talk about those. And the crazy thing is an ampersand character can mean different things depending on the context that it's used in. So in this command, where it's command1 2>&1, that means redirect standard error, which is file descriptor 2, to file descriptor 1, which is standard output. But an ampersand by itself after a command means run this command in the background and return immediately. Super complicated.

And so essentially what's happening here is, whoever wrote this code that we just wrote, instead of telling execve or the system, hey, I know this is the program I want you to execute, grep... we already talked about how this is a problem because of path injection, because of which grep. But even if we fix that, we're basically relying on sh to parse our string later into what we mean for the different arguments. So the developer who wrote this code wanted there to be 4 arguments... 5 arguments to grep: argv[0] being grep, argv[1] being -r, argv[2] being -e, argv[3] being what?
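The five-element vector the developer had in mind can be written out explicitly. This sketch uses a hypothetical helper, fill_grep_argv, to show that when the vector is built by hand the search term occupies exactly one slot, whatever characters it contains. The /bin/grep path in the closing comment is an assumption; grep may live elsewhere on a given system.

```c
#include <stddef.h>

/* Fills argv_out (room for at least 6 pointers) with the intended arguments.
 * Returns the number of arguments, not counting the terminating NULL. */
size_t fill_grep_argv(char *argv_out[], char *search)
{
    size_t n = 0;
    argv_out[n++] = "grep";       /* argv[0]: program name, by convention */
    argv_out[n++] = "-r";         /* argv[1] */
    argv_out[n++] = "-e";         /* argv[2] */
    argv_out[n++] = search;       /* argv[3]: user input, stored as-is */
    argv_out[n++] = "/var/book";  /* argv[4] */
    argv_out[n] = NULL;           /* vectors handed to execve end with NULL */
    return n;
}

/* A caller could then run, for example:
 *     execve("/bin/grep", argv_out, environ);
 * and no shell would ever look at the search term. */
```

A search term such as foo; cat /etc/shadow still produces five arguments here, because nothing in this code treats spaces or semicolons as special. With system, the same term is spliced into a string that the shell later splits on its own rules.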
Yeah, the search string, argv[1] from the original program, and the 5th one to be /var/book, argv[4], which is the 5th one. The question: so now if we can control argv[1], what can we do? We can add more parameters to grep by adding spaces. So if this was somehow sensitive, in the sense that it's searching for a password and if it doesn't find that password it doesn't give us access, we can just add spaces after our search term to give it the directory to do the grep in. What else can we do? Yeah, so using a semicolon will end the previous command and allow us to start executing our own command. What else? Using backticks... ah, you already answered. Somebody else. Now nobody wants to answer. You can steal his answer; we didn't hear all of it. We didn't hear all of it. See, there you go. So now come up with a new one. You can redirect commands into different files and you can start creating files of your own. Yeah, we could. So not only can we run our own commands on the system using semicolons, we can create a new background process on the system, for instance listening on a specific port and giving us what's known as a reverse shell, like waiting for us to connect to it... or no, that's not a reverse shell, that's a normal shell. We could even create files on the system, and we can completely control the content of those files, because we can do something like echo whatever string we want and redirect that to a file. So if this was a web server or something, we could create PHP files on the system that allow us complete control over it. We can delete any file that the person running this command, or the application running this command, can delete. Essentially we have full control over this system, and we can do anything with the permissions of the person running this command.

So what if, instead of writing system, the developer had written execve? I actually don't know where grep is. Which grep is it? Is it bin grep? Slash
bin slash grep as the first argument, and then now we need a character pointer array. So then we do /bin/grep as argv[0], and then we do -r, and then we do lowercase -e, and then we do argv[1], and then we do /var/book. And let's just put null for the environment because I'm tired of writing.

So now what is the operating system going to do with this execve system call that we just wrote? Grep. It's going to invoke grep. And what's the argv vector it's going to pass into grep? What if we add spaces into argv[1]? Yeah, grep will figure out what to do with that, but let's say here we have five parameters, right? Can we add spaces to trick a sixth parameter into appearing for grep? Why not? What's the fundamental difference between these two examples? The white space, a space as the character that separates. It all comes down to the fact that execve is the thing that essentially creates the process, invokes the process, and passes the arguments and the environment variables. When we call system, we're telling sh to do the parsing for us and then call execve. If we call execve directly, we know the operating system is going to do no parsing on our input. The operating system doesn't have to think about the different arguments, because you've literally already broken them up for it. It's not going to go and re-parse things, because that's an sh thing. So this is actually something to think about: when you're on the command line, you're actually talking to bash. You're not even really talking to the operating system. The operating system doesn't care that spaces delineate arguments. All it cares about is what gets passed into execve.

So this is the idea behind command injection: if you're building up a command by concatenating strings together, and the adversary or attacker can control one of those strings, then they can execute any command on your system. So this ends up in a lot of places: system, popen, if
it's not... if anything is doing any additional parsing, this can even happen in custom code. You can have custom code that is looking for spaces and breaking the input up into argv vectors.

So here's a simple example. We have a main function, int main with int argc and a character pointer array argv, and inside this we have a buffer called cmd of 1024 bytes, and we're doing an snprintf, which is a printf with a fixed length, so the output will not be larger than 1024. We print into cmd the format string cat /var/log/%s, and the final argument is argv[1]. So the idea is snprintf will substitute into that string and copy into the cmd variable on the stack the string cat, space, /var/log/, followed by argv[1], and we zero it out because we're good programmers. And then we call system with cmd. What you'd want it to do is cat out whatever log file you wanted. It's actually very similar to the example we were just talking about. You can compile this, run it, and you can pass something like foo; cat /etc/shadow, and it will give you the entire /etc/shadow file if this is a setuid program owned by root. So you can read any file on the system, you can create any files on the system, and you can execute any commands that you want on this system. So in terms of severity, this is super very bad. Always, always a vulnerability. Cool.

So a real example of this that's super interesting is Shellshock. Does anybody remember hearing about this back in 2014? In September of 2014 they found a new bug in bash based on how it processed environment variables. The idea was that a bash program could essentially pass its environment to another invocation of bash, and so what bash would do is, every time it started up, it would look through all the environment variables, see if there were any function definitions in the values of any of the environment variables, and then it would
essentially execute that and interpret that as a function. So this is a way of passing a function from one bash invocation to another, which sounds like it maybe was a nice idea. But it had to have a certain syntax: if the variable's value started with two parentheses, then bash would assume that the rest of it was a function definition, and the function definition was passed to the bash interpreter, doing exactly what we talked about. And so you could append commands to this function definition to execute arbitrary code, just like we saw.

So why is this a big deal, or is it a big deal? Is bash setuid? Yes? No. Sorry, that's the entire point: bash is running with your user ID. It's owned by root, but it's running with your permissions. So doing this yourself isn't interesting, because you're already running bash, you're already executing commands. So is this even useful? I hope so; I'm talking about it four years later. It has to be a hit, or maybe I just never update these slides.

Does anybody use GitHub? What are the two ways you can check out code from GitHub? Don't know that this github. HTTPS and SSH. How does SSH access work? Isn't SSH the same thing you use to get into a remote server? Yeah, but what happens with SSH to a remote server, in general? Yeah, it checks the keys, and then what does it give you? It gives you a shell, which means it's executing bash and it's connecting the standard input and standard output of bash to your terminal, right? So it's creating a new terminal for you with bash. But here now you have SSH access to GitHub. Does GitHub want you to have an account on their systems? No, absolutely not. They do not want that; they don't want that hassle. But why can you use that to check out code, check out a git repository? That's shell, reporting something to you, so you can use Shellshock and add in more commands to your runs. Yeah, we're talking about why it's interesting. So in your SSH authorized_keys file you can actually specify that only
certain... like, when somebody SSHes in, based on their key, they can execute a specific command and only that command. However, it turns out that bash was being invoked, and it turns out when you SSH you can pass environment variables to be set in the bash on the remote system. So basically you can get full execution privileges as this user on all the GitHub machines. And I mean, this is anybody who offers any kind of, we'll call this like a limited-access SSH account. So you can break out of that access.

The other case where this comes in... so we're not going to talk about CGI applications for a bit, but CGI, which is the Common Gateway Interface, is a really interesting web API. Basically the idea is you can write a web application in any language you want, as long as it's an executable. There's a well-defined API for how an HTTP request is translated into an invocation of your program, and the standard output of your program becomes the HTTP response that is returned for that request. So it turns out that, I believe it's all the query strings, maybe something else, maybe the arguments, they're all passed in as environment variables when invoking your program in CGI. And when it invokes this, bash is getting invoked, and so you could make a request to any CGI web application and get full command injection from an unauthenticated web request. So this was a huge deal. Yeah, so you could execute arbitrary code through a web request, and it caused a lot of panic in the security and web communities, because now you have to update all these systems. And I believe that this bug was around for like 20 or 25 years; bash had this functionality for all this time and nobody noticed until a few people started looking at the code. It was another case of, essentially, user input being parsed and executed as a command. So don't use system and popen, especially with untrusted user input.

So let's go back to our example that we had of execve /bin/grep... sorry, the system command. Can we
make this safe? And if so, how? It's a little tricky, though, because then we have to implement all of bash's parsing logic in our program just to try to see if that thing is safe. You could just make sure argv[1] is alphanumeric. Yeah, so we can use our whitelist approach and just make sure that argv[1] is alphanumeric, which I think should work, because none of the alphanumeric characters will allow you to execute any commands. But what about what we want for the users? It's a search term, so we want them to be able to use spaces to look for maybe two keywords at once, like foo space bar; you want to return all documents with foo and bar, like when you're using search on Google or Gmail. So then we have to take a blacklist approach and try to filter out all the things that are bad.

So one thing we could try to do is add double quotes: backslash double quote, and a double quote... I think those need to be switched; it's double quote and then backslash double quote. So we can try to add double quotes around that string, which means that spaces are no longer a problem, because the spaces inside the double quotes will be parsed as part of that single argument. But then is this foolproof? So we can change it like this, and I sit back and I go, it's done, I fixed this. Newline is tricky; I don't know how bash parses that, so that can be a way around it. Another way around it: you could just write a single double quote in the middle of the string. Then your resulting string would be grep -r -e, double quote, and then a closing double quote that we input, and then a semicolon, and then whatever we want to put, and then another semicolon, and then some junk after it that maybe won't parse, but you can fix it up to make sure that it does parse. Or you can use newlines or some other kind of tricks. So this won't work.

So if you want to do this, the other way is we could use essentially double quotes around the input and then call a
function that is known to be good, that sanitizes the input and changes all the characters that are important in a double-quote context into their escaped equivalents. So for something like a double quote, it'll change it to backslash double quote, so that it can't break out of our outer double quotes. Possibly it would also do that with newline; it would escape newlines as well. So that would be sanitization. The idea being, we're trying to sanitize the user input. I don't remember the function in all the languages, but most... C especially doesn't have it, because it doesn't have anything, but most web languages have a function to sanitize the input. But you have to read the documentation very carefully, because sometimes they assume that the argument is surrounded by double quotes, so if you don't have that you're going to have massive problems, or sometimes they'll add the double quotes for you, which can also get you in trouble. So you just have to know exactly what's going on and test it.

But fundamentally the best way to go is to avoid using system, popen, any of these functions, and use execve, or call one of the exec family of functions that isn't going to do any parsing, so you know there's going to be no parsing involved in the arguments. Because like we said, in the execve example we had, it's literally whatever they put in for argv[1]. They can put in whatever garbage they want, with Unicode characters and newlines and double quotes, and it doesn't matter, because it's not being parsed. The operating system just passes that to grep, and then grep has to interpret it.

So, all right, now we get into the most classic of all security vulnerabilities: buffer overflows, overwrites. It's going to be super fun, and this will be like a hundred slides, but I like saying that because there's like 70 slides of animations, if not more, to kind of illustrate what's going on. So let's start at the high level. How do you write buffers when you're
writing C code? What is a buffer? Stupid thing on YouTube won't load your video, wait. A fixed-size memory location. So how do you specify it in your C programs? What was that? Arrays. Yeah, so you say whatever type you want in that array, and, I actually don't know exactly, off the top of my head, it's like char, the name of a variable, and then brackets with the size. So if you put that as a global variable, what does the compiler do when it compiles that program? Yeah, so don't put it in the global. So it'll create a single address and it will allocate exactly that much space, and it will be in, as we saw, the BSS segment, right? What happens if it's inside of a function? So you define some function and inside there you have a character buffer of 50 or something. Does that get put in a fixed location in memory? It gets put on the stack. We'll actually see exactly how this happens, but essentially the buffer is going to get put on the stack.

And so when you're writing to this buffer, where's the length of the buffer stored? It's stored nowhere. I guess technically you could say it's stored in the compiler at compile time, because you can use things like sizeof with a buffer, but that's only at compile time. Once it's compiled, it's just a memory address, and we saw that the CPU just takes things from memory addresses into registers, computes on them, and then copies them back into memory. So it's important to understand that there's fundamentally nothing stopping us from, let's say, declaring a buffer of length 50 and then saying buffer[100] is equal to 20.

The other thing we should get out of the way: let's say we have some C code with a character buffer called foo of length 50. All right, so if I say *(foo + 1) = 10, what does that mean?
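On the point that a buffer's length is known to the compiler but stored nowhere at run time, here is a small sketch. The names global_buf, size_in_scope and size_through_pointer are made up for illustration.

```c
#include <stddef.h>

char global_buf[50];          /* uninitialised global: typically placed in .bss */

/* Inside the scope where the array is declared, sizeof knows its length,
 * because the compiler fills in the constant at compile time. */
size_t size_in_scope(void)
{
    char local_buf[50];       /* automatic storage: lives in the stack frame */
    (void)local_buf;
    return sizeof local_buf;  /* 50 */
}

/* Once the array is passed along as a pointer, the length is gone:
 * sizeof now reports the size of the pointer itself. */
size_t size_through_pointer(char *buf)
{
    return sizeof buf;
}
```

Calling size_through_pointer(global_buf) gives the size of a char pointer on the platform (commonly 4 or 8), not 50. Any code that receives only the address has no way to recover the original length.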
First one: so foo is at some memory location. We'll add 1 to that, and then it will use that as a memory address and write 10 there, which is a character, so it's 8 bits. Yeah, a byte; okay, I wanted to say byte. What does this mean: foo[1] = 100? The same. What's the difference between the two? Absolutely nothing. There's zero difference. Actually the second syntax, foo[1], is just syntactic sugar that gets translated to the other form. This is why, if you want to write super weird C code, you can do 1[foo] and set that equal to whatever you want, 1000, and that compiles just fine. There are no errors here, because it literally just compiles it to the other thing, and with addition it doesn't matter which order the operands are in, and so that's why this works. So anyways, this is an important thing to get out of the way.

And obviously I can overwrite. It's just pointer arithmetic, which is just memory address arithmetic. So I can do 100[foo], set that equal to 10, and what's it going to do? Is this going to throw a compile-time error? Probably not, depending on our compiler. Is it going to cause a runtime error? Why? Not necessarily. It depends on what foo plus 100 is, whether it's been allocated to my program. Now you're making me want to write a program that does this. So we're writing int main, and then we have character foo... is this the right syntax? Like this, right?
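The claim that the three spellings are interchangeable can be checked with a short sketch that stays inside the buffer. The function name index_spellings_agree is hypothetical.

```c
/* Returns 1 when all three spellings refer to the same byte of foo. */
int index_spellings_agree(void)
{
    char foo[50] = {0};

    *(foo + 1) = 10;            /* pointer arithmetic, then dereference */
    if (foo[1] != 10)
        return 0;

    foo[1] = 100;               /* the usual subscript syntax */
    if (*(foo + 1) != 100)
        return 0;

    1[foo] = 'A';               /* legal C: 1[foo] means *(1 + foo) */
    if (foo[1] != 'A')
        return 0;

    /* All three spellings also produce the same address. */
    return &foo[1] == foo + 1 && &1[foo] == foo + 1;
}
```

Nothing here checks the index against the size 50. Replacing the 1 with 100 would compile in exactly the same way, which is the behaviour the lecture goes on to demonstrate.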
And then we'll do char foo, a buffer of size 50. Since I told you you could do this, we'll do 100[foo] equals 10,000... oh no, that's not right, we'll do capital A. And then we will return 0. All right, I'm doing gcc -Wall test.c. So, interesting. Yeah, it's a warning, but it's saying foo is set but not used. But we definitely used foo. Interesting. Operator overloading? I think that's the trick... no, that's C++. C doesn't have any concept of operator overloading, so it's literally just a syntactic translation from one to the other. The compiler doesn't even handle it as its own thing; essentially it just first translates it to this other format and then it compiles it, so they're essentially the same thing. So I can execute a.out. It doesn't crash. No segfaults, nothing. It's crazy, right? I just wrote past the end of a buffer. But why didn't it crash? I didn't access memory that I wasn't allowed to touch. This is actually fun. Let's take our nice character buffer, and we will declare another variable i, and then we will do for i equals 0, i is less than, we'll do a really big number, i++. And what we're going to do is i[foo] is equal to 'B'. We want to know where we crash, so we'll have to do printf("%d\n", i), so this will print out what iteration we're on, because we want to see how far we can go past this. And we can do some other cool stuff. We'll first print out printf with %p. Yeah, %p is for pointers, so this will give us the address of foo. And %p is nice because it gives us the 0x, so it looks like a pointer. So we can print this out and then we will access it. The other thing we have to make sure we're doing is flushing this buffer. The man page says passing 0 to fflush will flush everything, says the other Adam, so we'll see if he's right. Yes... no, not after this, I want it before this. Sorry, I don't usually use vim, but I think it's the only thing I have
installed here. So, is that you or me? All right, let's see where this crashes. I need to include stdio. Come on, nobody caught that? No warnings, even, this time. So we can access... and if we go all the way up... wow, I shouldn't have done it like this. Okay, scrolling for days. You can tell everyone you watched your professor scroll for a long time. So you can see that... and this is tricky. Okay, let's actually change this: I want a 32-bit program now, and I'm going to pipe it to less because I learned my lesson. So we can see that foo is at this memory location. And you remember we looked at the memory layout, right? I mentioned that for 32-bit running on 64-bit, it's going to start at the top; everything starts at the top. So we'll first have the environment, which is actually going to be in there, and then we have our argv data, and then we have the parameters argc, argv, and the environment pointer, and then we'll have this buffer. So that is the memory location. And we can access all the way, this time, to about 7,000 past foo before we get a segfault, which is pretty cool. If you want to dig into it more and see exactly what that address was, you could run... so this is gdb, but I'm running this GEF plugin that you can look up, which gives you this nice output here. So I can see that it's segfaulting here when it's trying to copy uppercase B, which is 0x42, into eax, dereferenced. And the thing you should note is that this is the other syntax. But if I print /x %eax... is it a dollar sign?
Yeah. So we can see we can access up to ffff, four f's, then e, and then 0. So that's how far we can access here. If we look at the memory mappings, we can see that that was the segment that was mapped to us, and the other stuff is not mapped to us, which is why we get a segfault. So fundamentally what this means is we can write code where we declare that a buffer has a certain size, but there's nothing in either the runtime or the compiler that enforces bounds checking, that checks we're not accessing memory outside the bounds of what we've declared. Why is that? Have you done any programming in Java? Do you use arrays in Java? Do you worry about writing outside the arrays in Java? Yes? Yeah, so with arrays in Java, if you try doing this example, it will throw an array index out of bounds exception. So you as a programmer have to be cognizant of the length of your arrays, but if you try to read or write somewhere outside those bounds, you get an exception, because (I don't know if an array is technically a class) on every array access it first checks: is this access within the bounds? So what's the downside there? What was that? You have to remember the length, which takes space. And it's slow: you have to do this check on every single read and write operation to every single array. Now that's something we don't even think about, but back in the '60s and '70s when they were writing C, this was a huge deal. So how do we get around that in C? We don't actually store the length of, let's say, strings, right? So how do we know the length of a string?
So we have a standard where we say that strings are null-terminated. Essentially a character in a string can be anything from 1 to 255, but you can't have 0, because 0 signifies the end of the string. So this is the key problem, and this is actually kind of crazy when you think about it: it's essentially a language and runtime design decision that bounds checking of array accesses would not happen at runtime. Because of that there have been huge... I mean, this is still a massive problem today. In real applications they find these kinds of things everywhere. Oh, sorry, I just remembered, going back to command injection: there was a really good bug recently where, I think it was the KDE environment, if you had a USB drive whose name had backticks in it, it would execute that as a command, because it would for some reason pass the name of your USB drive to something that called system, which would then let you execute anything on their system. So I'll find that link and send it out, but this is a really good example of this class coming up in weird contexts. Anyways, back to overflows and overwrites. As we'll see, if you think about it, essentially what this means is: if you, let's say, are iterating over an array and you use as the length of that array not your own value but an attacker-controlled value, then the attacker can get you to go beyond the bounds of that array, where you can start either reading or writing arbitrary memory of the application. And that's really at the core of what a buffer overflow is in its essence: the ability of an adversary to, essentially, change memory in your program. And rather than just crashing it... because, as we said, yes, that is an availability attack, yes, you got it to crash, but it's much cooler to be able to take over that machine. So we'll see that by specially crafting what we change and how we change it, we can actually get complete control of the
program. I don't remember the details of that vulnerability. I'm going to go ahead and say yes, because it probably is. It's either this or a heap overflow, but heap overflows are just overflows that happen on the heap instead of on the stack. It's the same type of idea. Yeah, this is literally the basis of almost every single modern attack. It's just that the modern ones aren't as easy or simple as the ones we're going to study. At the start we'll study what these things look like so we can practice doing that. Okay. As we'll see, these are very architecture, OS, and version dependent, because, you have to think, you are overwriting bytes inside the process. Have you ever just changed random memory values in your programs? Not on purpose? Have you ever had it happen, though? Yeah, weird stuff happens, not what you'd expect to happen at all, or it could crash. So successfully exploiting these requires a lot of knowledge about what the system is: is it 32-bit or 64-bit, exactly how does it work? And the super cool thing is we'll be able to modify both the data of the application and the control flow. So what's the control flow? What code it executes, right. And if we're really clever, we can get it to execute whatever we want and give us access. There's actually a lot of super cool research work in this area of automatically creating buffer overflow exploits. So there was... anybody heard of the DARPA Cyber Grand Challenge? Have you heard of the DARPA Grand Challenge? If you go one up: have you heard of DARPA? Go one more back. DARPA, yeah, they're the people who gave the money to invent the internet. So the DARPA... nobody does robotics research? The Grand Challenge, yeah. What's the Grand Challenge? It was like 10 years ago, basically. I think it was 2014 or 2013. How do you describe the Grand Challenge? Where there's an awkward one, there's six, maybe. I can hear them grabbing one. Yeah, so think about this. So back in 2006 they had this challenge of who can build an
autonomous vehicle, or even like a system, I don't think they were very big, that can go off-roading like, I don't know, five miles or something. And I think the very first time they did it almost nobody was able to finish the race. It was pretty sad. But you think about that in 2006, and what do you see driving around everywhere in Tempe and Phoenix? Fully autonomous vehicles that are on our roads. And really that was because of DARPA. I think they gave a prize of like a million dollars to the team that completed the Grand Challenge first. I think they fund the people who build them. So DARPA is more like, you can think of it as a VC, a venture capitalist of research. They give money to fund ideas that will help the military achieve its objectives. So what they realized is: what if we had a grand challenge in security, a Cyber Grand Challenge? So they looked at capture the flag competitions, where humans try to understand a new application, like a binary, analyze it for vulnerabilities, and write exploits that they can fire against the other teams, which is stuff we'll be starting to do more and more of. You've done one CTF, we're building up more and more, and we'll get to this attack-defense stuff later. And what they wanted to challenge the research community with was: can somebody develop a fully automatic way of playing these CTFs, of analyzing a binary, finding vulnerabilities, but not just finding vulnerabilities, actually developing exploits and launching them at the other teams completely autonomously? So no human component. So this was a two or three year long project. I actually don't know all the details. Some of you probably know or maybe have heard of Professor Yan Shoshitaishvili. You can call him Professor Yan if you ever talk to him; you don't have to try to pronounce his last name. He actually was leading the UC Santa Barbara Cyber Grand Challenge
team. They ended up placing third in the competition and receiving $750,000. And he likes to say, which is true, that they were actually the best at attacking: they found the most exploits, and on attack score they were number one. But the way they structured this competition, you could make patches, so there was a defensive component, and there's a whole game theory aspect where your patches would cause you to lose points for one round, so you had to think about whether that was worth it. And he said if they had just not done any patching, they would have won based on their attacking score alone. But you can't change the results now. So there's a whole series of tools that can find these types of overflow vulnerabilities and automatically create exploits. But even so, this is still very complex. If you look at what these applications were, a lot of them weren't able to be exploited by any of the teams, because they required human intelligence and human thought. So there's been a lot of research on the attack side, but there's also been a lot of research, as we'll see, on the defensive side. Really what you see when you look at the history of this is a cat and mouse game, where attackers say, aha, I can do this exploit thing, and the defenders say, aha, but now you won't be able to guess memory addresses because we use ASLR, and the attackers say, aha, well, I'll break it, I'll jump to libc, and they say, well, we'll randomize libc and we'll throw in some stack cookies, and they say, okay, but I can still get around that and all these other things, and they invent ROP. Anyways, it's a continual arms race that goes on, and it's super interesting. So we're going to start with the basics. We're going to lay down some foundations. I'm going to go a little bit quickly through this basic stuff, because this is stuff I've actually taken from my 340 class, but I want to make
sure we're all on the same page here. And then we're going to go through a historical development: this is how binaries were originally, and then they added these defensive things, and these are what attackers did to get around that, and so forth. So it will be really cool, and you'll be identifying vulnerabilities and developing and writing exploits as part of this course. So it all comes down to the stack. So fund... yes? So, Brian, these attacks assume that the application was developed in languages like C or C++? Yes. It requires that the program you're attacking is developed in C or C++. But you think about what large C or C++ applications exist: browsers are very large applications. Lots of applications are still written in C and C++. Even so, you can find vulnerabilities in the runtimes of other languages. Flash, which is a huge target, because it's written in C. The JVM, the Java virtual machine, has had bugs and vulnerabilities in it. So yeah, you're not safe, and you need to learn all these principles so you can understand them. The other thing that I'll throw out there is that a lot of applications, even Java applications for your phone, like Android apps, use native code. Even though they're written in Java, you can easily include native C code in your app, like a library, and call it through JNI, and we did some research to show that like 35% or something of apps actually use that. So yeah, it's a big deal. It's still a problem. Yeah, like a DLL, but an .so, because it's a Linux system. So it's like a library call, but instead of calling into the Android app, it's calling into C. Brian, I have a question. All this stuff that we're learning, don't we have to be on the same network as the machine? Yes? How is this applicable to protecting against someone coming into my network, or how would I go about getting into another network? I mean, I'm already on the network with the other
machines. What's cool about hacking into that? So it kind of depends. Some of the attacks we looked at, like PATH and HOME, do require you to have an account on the system. But still, I think of that as a privilege escalation vulnerability, right? You have access to the system as user foo and you want root access, or you want user adam access because I have all the answers to all the tests. So identifying these and writing exploits for them allows you to escalate your privileges. In the context of a pen test, if you get to the point where you're on somebody's system, saying, yeah, I broke in and I'm the web user on your system, that's okay. But if you say, yeah, I'm root on all of your systems, that's much cooler. Oh, and by the way, I can get all the bank account credentials that you have stored in there. The other thing to think about is that other vulnerabilities, like command injection and specifically buffer overflows here, are things you can exploit remotely, and often are. So this is actually how you get into a machine when you're completely remote: you analyze it, you port scan it, figure out what it's running, see if it's running any custom applications or older versions of some applications. Then sometimes you'll have known exploits and proofs of concept that you can just take and fire at it, sometimes you have to write your own, and sometimes you have to find brand new, novel vulnerabilities. So it's all stepping stones. You start external, you scan, usually you'll be able to hit the web server, you'll get web server access, and then you want to escalate that to root access, and then escalate to other machines on the network. All it takes is practice. All right, so we'll stop here, and then we'll go back to the stack in a week and five days from now.