Folks, let's get started. You should have gotten back your homework two grades and your midterm grades, so you should be completely up to date on that. I'm in the process of getting all the submission stuff ready. I'm very, very close; I made some mistakes, so I'm fixing my own bugs so that you can submit your code and get it automatically tested for part three. I think we'll finish up this whole section in class on Tuesday, and that way in the last week of class we can go over other stuff. I mean, it's all fun stuff, but other types of fun stuff. All right, any questions?

Okay. On Tuesday we started to peel back the curtain on how the code that you write in C or C++ ultimately gets executed by the CPU on a computer. So what are the steps involved in there? We write the code, and then what? Yeah, the compiler takes the code and turns it into assembly. Then what happens? That assembly gets turned into binary, then we want to run it, so it gets loaded into memory, and then we start executing. So those are the very high-level steps.

What we're looking at now is how that process of a binary getting loaded into memory actually works. The question is: is the binary code just the bytes on the disk? How does the operating system know how to take this file and create a process with code and data out of it? That is all contained in the ELF file format, which is a super cool name for a file format. It's the Executable and Linkable Format, so it has a nice name but also a cool "elf" name. I'm really jealous. Somebody wrote a research paper called "How the ELF Ruined Christmas," and it's all about weird vulnerabilities and ways to misuse ELF headers, which is such a good name for a paper.

So the idea is that ELF is an architecture-independent file format, so it doesn't matter if it's x86, x86-64, whatever. Most
Linux-based systems use the ELF file format. An ELF file can be relocatable, which means it's compiled object code that isn't runnable by itself and is meant to be linked into programs. It may be executable, where, hey, I'm ready to rock and roll, I have enough information in this binary to actually execute and run. It may be a shared library. Or it may be a core file: maybe you've generated a core dump on a Linux system before, and you get your directories littered with files like core.<process ID>, and you're like, what the heck is this? It's actually an ELF file that you can load up in a debugger to see what the state of your program was when it crashed.

Super good tools to learn how to use here. The file command: what does that do? Has anybody used this before? It tells you what type of file something is. How does it work? Does it have a machine learning deep neural network algorithm that it uses on the files to see what kind of file type they are? [Student answers: some files have secret numbers, other files have extensions, or maybe it'll just guess randomly and say, I think it looks like this.] Yeah. So, okay: (a) there's no random guessing. (b) In Linux we have to get rid of all notions of file extensions. Extensions don't matter at all; all it cares about are the bits on the file system. And the third point, the secret numbers, is 100% correct.

So what file does is, it has a database, essentially, of known magic numbers and metadata in file formats that tell you what file format it is. For instance, GIFs always start with the same bytes; I think it's like "GIF87a" or "GIF89a", so the first four bytes of a GIF file are "GIF8". I don't know all the ones off the top of my head. So it looks for these magic byte sequences to say, oh, this looks like it's a GIF, or a PNG, or an ELF, an executable ELF header.

Let's see if that's working. All right, cool. So let's say we wanted to know what this challenge file is. Okay, so this actually tells us that it's a shell script. If we look at the challenge, you see
that yes, it's a shell script. So it actually looks for the hashbang at the beginning of the file. If we ran file on the file program itself, it'll tell us that it's an ELF 64-bit LSB executable, and it will tell us more information about it: that it was built for this version of Linux, it has this build ID, it's been stripped of symbols, all this cool stuff, and it's dynamically linked. These are all things that come from the ELF headers.

But there's a more specific tool, readelf. I think dash a is to read all of it. This actually extracts a ton of information from the ELF header about everything in this binary. So this will tell you what OS it's for, what the type is, what the machine type is (this is x86, 64-bit), and it'll tell us all kinds of flags. Super important thing: the entry point address. This tells the operating system, once this program is all loaded in memory, start executing the CPU at memory address 0x401de0.

There are a lot of sections defined in the ELF header. We're not going to go super in depth on all of these, but it's important to understand the concepts here. There is typically a .text section, which is your code; this is where your code goes. The .data section is the global data that's used by your program. .rodata would be any read-only data for your program. The .bss is another one, for uninitialized global data, and .init and .fini are pre and post code. You can get the details here for how to dig into that. But the super important thing is that these headers specify how to map, actually, let's go again. So these are the section headers, and we can see here when I rerun this, let's go to .text. For the text section, you have to look at this table to figure out exactly what it does. Basically it tells you how to map the bytes in the file to where they should go in memory when it's executed. So this is going to
tell us to load the text segment starting at 0x401230, and it's going to be this many bytes large, and that maps to the bytes in the file at offset 1202, and it's going to be this big as well. So that's how it maps the raw bytes in the file to the program code. And if we really dig into this, we would hopefully find out that the entry point is somewhere inside the .text segment, right? Because that's where the program's going to start executing from.

The really cool thing is that the operating system can put different permissions on different segments in memory. So for instance, we want our text segment to be executable, right? We want the CPU to be able to execute that code. Do we want to be able to write to our text segment? No. Why not? I mean, it'd be great if the program could write, but once the program's written, we don't want to be rewriting the code that was written by the original person. That would be changing, basically, the code that got assembled from the C program. How many programs do you write in C or C++ where it's like, okay, execute this function, and then change this line to this the second time you execute, or something crazy like that? Or at runtime, completely replace this function with this other function? There definitely are uses for that, but in the general case you would not want that to happen.

We can think of this as another instance of policy and mechanism. You have the policy that, hey, executable code shouldn't be able to write and change its own code, the text segment. And the mechanism is to designate this section of the program as execute-only, not writable. But of course we have our data segment. Do you want to be able to write to your data? You want to write to your data and change your data as your program executes, so the data segment is writable. And then .rodata is read-only data, so you could put any constants you want in
there, and then that's not going to be changed, and that will be enforced by the hardware. So you can think of it like, no matter what happens, that can't be changed. Which is kind of true: you can change the memory protections of your pages. You can actually call a system call to change, whatever, your .text segment to be writable or something. So, okay. But all that information is contained in that file, and the operating system knows how to parse the ELF file format and put everything where it needs to go in memory with the correct permissions.

So we're going to look at x86 code. This is going to be the assembly language we look at to understand how these vulnerabilities actually work against these applications. x86 has a long history of models. It is one of the, I'd say, largest standards now. And does the CPU actually execute x86 directly, for modern CPUs, let's say? You studied MIPS machines, right? You saw the MIPS machine has all the stages of the pipeline, and as instructions come in it's fetching the instruction, decoding the instruction, and then doing whatever it needs to do from that. So you can think of it as directly executing those MIPS instructions. So do modern CPUs work like that? All right, the short answer is no. This is something you could definitely look into if you're into computer architecture. Essentially there's this crazy process that translates x86 instructions into what they call microcode, and that's a smaller instruction language that the CPU actually executes. And it's even more insane because Intel actually has a way to update the microcode after the fact. So you can think of it as another abstraction layer. As a programmer you never get to see that, because you only send x86 instructions there. And all our processing in assembly, as we know, happens in registers, right?

[Student question: would we ever work with the microcode ourselves, let's say if we're making a new driver?] No, you would not. You'd only mess with
microcode if you're trying to hack it and break it, or if you're at Intel designing chips. What they realized is that if you have a bug in the silicon itself, if you have some operation that's messed up, that bug is literally there forever, right? That's one of the key differences between hardware and software. But if you have a layer in between that's converting x86 instructions to maybe a lower-level thing, then if there's a bug in there you can actually fix it after the fact. And maybe if there's a bug in the hardware, you could change the microcode to work around it or something like that. [Student: you could just do more work for that one instruction in the processor.] Yeah. I don't know how the update process actually happens, but yeah, it's pretty crazy. And think, if you could take that over, you'd literally have full access to the CPU and anything on it, which is cool.

Cool. So with x86, one of the important things to always remember when we're talking about these kinds of things is that we're talking about a 32-bit system. How big are the registers? 32 bits. How much memory can we address? Two to the 32 bytes with one register, effectively. Okay, that's actually the short answer. There's a much longer answer, that technically you can do more with segments and multiple registers, but that's why there was a big push moving from 32-bit to 64-bit: much more memory is addressable.

So there are four general-purpose registers: eax, ebx, ecx, edx. It's actually incredibly easy to remember: a, b, c, d. Whenever we look at any of these things, we're looking at a real-world system. x86 was not designed from scratch; it evolved from various earlier architectures. So instead of just having a register called a and a register called b, you have ax. What does the x stand for? I don't know. Extended? No, I think the e stands for extended, so eax means that it's
32 bits, and ax refers only to the lower 16 bits, and then there are other things in there. This is actually a crazy thing when you're looking at assembly. You're used to thinking of variables as mapped to distinct things, right? So you would think that the variable eax would be different from the variable ax, but these are actually different ways of referencing the same register. When you say eax, you're talking about all 32 bits of the register. If you say, hey, move five into the ax register, you're referring just to the lower 16 bits of that eax register, which will not touch anything that's in the upper 16 bits. Furthermore, ax is split even more into ah and al, where ah is the higher eight bits and al is the lower eight bits. Yes, it is confusing. Yes, you do get used to it.

Similar things with other registers like esi. esi and edi are used to transfer memory. And there are two incredibly important registers, esp and ebp, where esp is the stack pointer, which points to the current location of the stack, and ebp is the base pointer, the frame pointer of the currently executing function. We'll get into what these do, but it's important to understand that these are very important registers. There are all kinds of other registers; I'm not going to get into them.

The other important one is eip, which is the instruction pointer. Why is the instruction pointer important? It tells the CPU what to do next, yeah. And this is important to remember: would you call a CPU intelligent? What would you call it? Obedient, that's a good one. What else? Fast. Stupid and fast is how I think about it, but I like obedient better. So: stupid, obedient, and fast, right? It is incredibly literal. That's the whole thing that you're learning throughout your programming career: if you write dumb code, the computer will
execute that dumb code, right? Because that's what it does: it just obediently executes whatever you tell it. The instruction pointer register is super important because whatever value is inside that register is the address of what the CPU will try to execute next, no matter what. It doesn't care how it got there, it doesn't care where that data came from, it's going to go execute the instructions there.

The nice thing, or the bad thing, depending on how you think about it, is that it can't be read or set explicitly. It's not like you can say, hey, set eip to 20, and as a result start executing from 20. So then how do you change it? When you execute instructions, it'll automatically increment to the next instruction. And another tricky thing about x86: I think MIPS has fixed-length instructions, right? It's RISC, and every instruction is 32 bits wide. But x86 has variable-length instructions. So you execute one instruction, and that implicitly increments eip to wherever the next instruction is; decode that, increment eip to the next instruction, and so on and so forth. So it's definitely going to change with every instruction.

How else can the programmer influence or change that value? Jump instructions. The other important thing to keep in mind is that your higher-level language constructs don't always exist in the low-level languages. So there is no if instruction, well, not exactly. There's no while instruction, there's no do-while instruction. There's a jump, and a conditional jump that says, if this register is zero, then jump forward a certain number of bytes, or back up. So you can do loops like this, you can do if-else branches, you can do switch statements, all inside here.

The other important thing is call and return instructions, because a call instruction is, hey, I want to call some other function, so start executing this other function, which means the instruction pointer has to change to the
start of that new function, and then when that function is done executing you need to come back to where it was called from. There's a whole bunch of floating-point stuff, there are all kinds of weird things that the CPU can do.

And a super important point that we touched on earlier with networking, which is incredibly important when you're looking at binaries and trying to feed in input to cause a buffer overflow or do some kind of exploitation: you need to be aware of endianness. So what is endianness? [Student: is it how values are laid out in a register?] I wouldn't necessarily say in the register; it's more about computation. When you're computing values, you have four bytes for a 32-bit number: which byte is the most significant and which is the least significant? Maybe it does happen when you go into a register. Interesting, I'll have to think about that. It kind of makes sense, because the registers are just literally 32 bits and you're going to compute with them. I don't know if it's technically correct, but maybe effectively, because you only use stuff in registers to compute.

So, cool. The point is that Intel uses little-endian ordering. If you have the bytes of 0x03020100 in memory starting at address 0xf67b40: at 40 you'll have 00, at 41 you'll have 01, at 42 you'll have 02, and at 43 you'll have 03. So if you said, hey, move 32 bits from 0xf67b40 into register eax, you'd have the value 0x03020100 in eax, even though when you look at it incrementally, the address you wanted plus one, plus two, plus three, you can see the bytes are in the reverse order. The byte that you're actually referencing is the one at 40, and that's 00, but that's the least significant byte. So it's super weird, yes, I agree.
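The byte ordering described here can be checked with a few lines of Python. This is only an illustrative sketch: it uses the standard struct module to lay out the value 0x03020100 the way a little-endian machine such as x86 stores it. The helper name bytes_in_memory and the base address 0x1000 are made up for the illustration and do not come from the lecture.

```python
import struct

def bytes_in_memory(value, little_endian=True):
    """Return the 4 bytes of a 32-bit value in increasing address order."""
    fmt = "<I" if little_endian else ">I"
    return struct.pack(fmt, value)

if __name__ == "__main__":
    base = 0x1000  # hypothetical address, only used for printing
    for offset, byte in enumerate(bytes_in_memory(0x03020100)):
        # The lowest address holds the least significant byte.
        print(f"{base + offset:#x}: {byte:02x}")

    # Loading the same four bytes back as one little-endian 32-bit value.
    (value,) = struct.unpack("<I", bytes([0x00, 0x01, 0x02, 0x03]))
    print(hex(value))
```

This matters when crafting input for a binary: an address you want to end up in a register has to be written into the input with its least significant byte first.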
You also have to worry about two's complement when you're looking at values, because remember, when you're looking at a register or memory, all you're looking at is bytes, right? There's no "oh, this is a negative one." You have to ask, well, what is the program going to interpret the value in this register as? Is it an address? Is it going to try to interpret it as a negative, signed integer, in which case it's going to use two's complement, which hopefully you know about so we don't have to go over it. It's super nice to have a calculator. It doesn't have to be a physical one; I don't carry one all the time. But at least I know the Mac has a calculator app with a programmer mode that does hex to decimal, base 16 to base 10, and also has two's complement functions, so that's handy to keep around.

So this was kind of the primer and background on x86 and some of the trickiness. Anybody have questions on that?

So you've coded some MIPS assembly, but very few things actually use MIPS. The main ones are x86, x86-64, and ARM. I guess that's three, but x86-64 is very similar to x86. The incredibly difficult thing about reading and dealing with assembly is that the syntax can completely change, and that changes the meaning. Let's say you have a move instruction that moves the value in the eax register into the ebx register. In AT&T syntax the source comes first, so you'd write mov, eax, then ebx. Whereas DOS/Intel syntax is the opposite: the destination comes first, so you'd write mov, ebx, then eax. So literally completely opposite ways of thinking about things, but it's really just a syntactic difference. Whatever tool you're using to look at assembly, you can specify an option to get whichever one you want.

The other really tricky thing when you're dealing with x86 assembly is: how do you say, hey, give me whatever's at memory address 0x1000 and move it into eax? It sounds like it would be incredibly simple: just move what's at
that memory location into eax. But the syntax is actually more complicated than that, to allow different types of things. Here, I'll show you an example. In this example, whatever's inside eax is the memory address that we're going to start from. So take the 0x1000 that's in eax, and then subtract 20 from it; this is a displacement, so take 20 off of whatever's in eax. And then add whatever's in ecx times four. So it's kind of ridiculous, but translate this in your mind to iterating over an array, where ecx is your index and everything you're referencing is four bytes, so 32 bits, large. Then all you have to do in this code is: eax is the base of your integer array, and ecx is your index. It'll be zero, one, two, three, four, and that way you'll reference all the elements in your array. And the displacement is usually used when you're looking at local variables or arguments to a function on the stack.

Cool. So you do things like this one, which says copy the contents of the memory pointed to. And the other important thing is, when you see parentheses like this, you know it's a dereference. So inside ebp is a memory address, and it's saying take whatever is in memory at ebp plus, sorry, ebp minus eight, and move that into eax. And just like this, you can think of this one as a pointer dereference: dereference whatever's in eax and copy it back into eax. And we can flip it. So in a move instruction, the source or the destination, let's see, it's an exclusive or: one of them can be a memory dereference, but you can't directly copy from memory to memory. It has to go into a register first. So if you want to copy from one memory location to another, you have to say:
copy that into, whatever, ebx, and then copy ebx to the destination memory location.

Cool. So you have one like this, which is a similar thing, which says take eax and put it wherever ebx points to plus whatever's inside ecx times two. And you can actually have constants in here, prefixed by the dollar sign. So this is going to be a constant: this says move the value 0x804a0e4 into ebx. Yes? So let's draw a picture. All right, so here's the difference between these two. What's the value of ebx going to be after this instruction? It's going to be 0x804a0e4. And what's the value of eax after this other instruction? Whatever's located at memory address 0x804a0e4. So that's the key difference there. And it actually gets way more clear when you start looking at C code and then the assembly version of it, and you start going, oh yeah, pointer operation, got it, right over here. Or you have a structure and you're accessing offsets into the structure, or you have an object and you're using the pointer arrow operator.

So there are instructions to move things between registers and memory with mov, or onto the stack with push and pop, which we'll see in a second. We also have all kinds of binary arithmetic and logical operators, most of the things you'd expect of a programming language. So it's really just getting familiar and mapping what you know in one to the other.

Then there are the things we talked about: control transfers, jumps, calls, returns, and for interrupt handlers there's iret. Rather than having what you'd think you want, which is one type of jump-if instruction and then just translating everything to that, there are a ton of different types of jump instructions. The compiler figures out which ones are actually faster on most hardware, and they look at your C code and try to figure out which one is more, like, is the branch
likely taken or not taken, and it's a whole thing that you get into. Then there's input/output, and NOPs are kind of cool: they just do nothing.

I don't even remember when this was, I think it was during networking, but we talked about system calls. What are system calls? Calls to the operating system to do what? Input, output, networking, opening sockets, all types of crazy stuff. If you take a program that makes some system calls, like open or whatever, usually you'll see that they're invoked through libraries, through libc. But how do they actually then get to the kernel? There has to be some way for a userland program to be able to tell the kernel, hey, I want you to do something for me.

How that happens is essentially arbitrary. There's no single definition, so each operating system can be different and each architecture can be different. x86 is different than ARM in how they invoke system calls. On Linux and x86 there's the int instruction, which is an interrupt. What happens when a program calls int, or what happens when an interrupt fires? [Student: the kernel looks up what the interrupt is in its system call table.] Yes. So the kernel will have an interrupt handler that it defines based on the hardware. It tells the hardware, hey, when any interrupt happens, and it doesn't have to be a syscall, execute this chunk of kernel code. The kernel will then be able to see what parameter was passed to the interrupt. In this case it's hex 80: Linux has established that an interrupt of hex value 80 means a system call. And then it starts to decode and pick apart which of the insane number of system calls that Linux allows it is. eax will contain the system call number, and then I believe it's ebx that will include whatever
the first argument is, ecx will include whatever the second argument is, and so on, depending on the system call. We can look at a quick example of this. Yeah, there we go. Okay, cool.

So we are writing x86 code. Don't worry, this is for hello world, not for homework. We have a string we're calling hello world. We have our text section, in which we're creating a global symbol called main, so this tells the linker to expose this main symbol, and we define a label called main where we move four into eax, we move one into ebx, and we move, what's dollar sign hello world? What constant value is that? [Student: probably zero.] Why zero? It's not; otherwise you'd write dollar sign zero. Okay, yeah. So we're basically saying, wherever the assembler and linker decide the string hello world is located, put that constant address here, and then this instruction will copy it into ecx. Cool. Then we move 12 into edx, and then we're going to call int 0x80. This will trigger the system call, and Linux will figure out which system call we just called. Then we'll move zero into eax and return. So this, as we'll see, is basically like defining a main function, and the very last part is like doing a return zero.

Okay, so decoding this. We have to look up and figure out that system call 4 is write, so we're basically calling write. The first parameter to write is a file descriptor. What's file descriptor one? Standard out. What's file descriptor zero? Standard in. What's file descriptor two? Standard error. Yes, these are also important things you should burn into your brain: standard in, standard out, standard error. Cool. So it means we're writing to standard output. The second parameter is the buffer, a character pointer to what we want to actually print out, which will be this address. And then what do we think the last parameter is? How many bytes we want to print out, right. So read doesn't actually, sorry,
write doesn't actually care, you know, because we may be writing to standard out, we could be writing to a file. It doesn't care; all it cares about is how many bytes. And I believe "hello", space, "world" with the newline should be 12 bytes. Then we'll call int 0x80, which will actually write this out for us, and then we will return.

You can play with this. You can take this, put it into a dot-capital-S file, and then you can compile it with gcc and run it. Actually, let's do that right now; I think it would be good. Of course, I know what's going to happen is the PowerPoint is going to mangle all of this. So let's see, hello.s. Do I want to paste with tabs? It's converted them to spaces. Oh, that works. Okay, oh snap. Do you think this is going to work? What kind of file is it? Let's look: it's an ELF 64-bit executable. I promise I did not plan on doing that; I just decided to.

Okay, actually let's look at something else: objdump. As we'll see later, this is how we can see all of the x86 code that's contained in this binary. So if we look for our main function, we can see: move four into eax, move one into ebx, move six-oh-one-oh-three into ecx, move, what's hex c? 12, into edx, call int 0x80, move this, and then return. So we have everything we want in there. We just wrote a hello world binary program.

Okay, as we've talked about, when the program is loaded and execution is started by the operating system, the operating system parses the ELF file format. The /proc file system is a really cool file system: if you're on a Linux machine, it will show you information about all the running processes on the system. The super cool thing is you can look at the memory mapping of a process, so you can see what parts of that program's memory space map to what files on the system. So you can see the shared libraries that it's using. It's actually pretty powerful. You can also use the command line file; I
think it's /proc/<process ID>/cmdline, and that'll show you the command-line arguments that it was executed with. We sometimes try to use this in capture-the-flag competitions, when we can read files but can't execute anything yet, to try to figure out what everyone else is doing. You try to see what programs everyone else is running to maybe give you a hint as to what's going on. It's only actually been useful once, I think.

So, if we look at the basics first. For most 64-bit programs, sorry, 32-bit, so running on a 32-bit OS, the kernel reserves the top one gigabyte of memory for itself, and your application essentially starts with the top address of 0xbfffffff. Why this is important is because of the process structure. When you write a C program, what's the first function that gets executed? main. What are the arguments to main? argc, argv, and there's actually a third one you can use if you want the environment: envp, the environment pointer. Where does that data come from? It comes from the command line, yes, and the environment comes from the environment variables of wherever your program is executed. But where does that come from? Are you actually literally reading from the command line? So it comes, more specifically, from the operating system. Well, okay, there are steps involved, and we'll step through some of them. But let's say the operating system gets a call, and it's actually a system call, as we'll see in a little bit, that says, hey, I want to execute this binary and I want to pass in these arguments, this argv, and this environment pointer. What the operating system has to do is create space in the program for all this data, because the process can't access data that's not in its memory space.

Okay, and I do this in most of my classes, but from here on out, all memory
that we look at will be drawn with the highest memory at the top and the lowest memory at the bottom. So up at the top it would be like 0xffff..., or in this case like 0xbfff..., because we know that the kernel is going to take everything from 0xffffffff all the way down to this 0xbf.... What the operating system will actually do first is put all the environment variables and the argv variables in memory at the top of the memory layout, and then it will place argc, and then the stack begins. When main starts executing, the stack is where all the local variables are stored, and everything else. And the important thing is that the stack grows down: the stack grows from high memory to low memory. There'll be other things, there'll be shared libraries in there.

The other important thing to remember is that when we write C programs, there are two ways we actually allocate memory: either on the stack, through local variables, or on the heap, through malloc and free. The heap actually starts at a lower place in memory and grows up. And then towards the very bottom of the program we'll have the data, the bss, and finally our text segment. This is kind of the standard layout. Of course, as we saw, an ELF file can completely change this and do whatever the heck it wants, but that doesn't necessarily need to be how we think about things.

This is important to remember. I'm trying to give you all the high-level part; this is not all of the gory details. But the interesting thing is what this means: the stack will start wherever, depending on the environment and the argv. So if you pass in more parameters as command-line arguments, the stack will actually start lower down, which, if you're writing an exploit that depends on the location of the stack, means that location will change. So it's an important thing to keep in mind and to think about. Cool.
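As a rough illustration of this layout, the sketch below parses text in the format of Linux's /proc/<pid>/maps and prints the regions with the highest addresses at the top, the way the memory diagrams in this class are drawn. The SAMPLE_MAPS text is a made-up example of what a 32-bit process might look like; the addresses, inode numbers, and the /home/user/hello path are hypothetical, and the function names are just for this sketch.

```python
# Hypothetical sample in the /proc/<pid>/maps format:
# address range, permissions, offset, device, inode, pathname.
SAMPLE_MAPS = """\
08048000-08049000 r-xp 00000000 08:01 1234 /home/user/hello
08049000-0804a000 rw-p 00000000 08:01 1234 /home/user/hello
0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
b7e00000-b7fa0000 r-xp 00000000 08:01 5678 /lib/libc.so.6
bffdf000-c0000000 rw-p 00000000 00:00 0 [stack]
"""

def parse_maps(text):
    """Return (start, end, perms, name) tuples, highest address first."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(part, 16) for part in fields[0].split("-"))
        name = fields[5].strip() if len(fields) == 6 else "[anonymous]"
        regions.append((start, end, fields[1], name))
    return sorted(regions, reverse=True)

def print_layout(regions):
    """Print one region per line, high memory at the top."""
    for start, end, perms, name in regions:
        print(f"{start:08x}-{end:08x} {perms} {name}")

if __name__ == "__main__":
    # On a Linux machine you could pass open("/proc/self/maps").read() instead.
    print_layout(parse_maps(SAMPLE_MAPS))
```

In the sample, the stack sits just below 0xc0000000, shared libraries are in the middle, the heap is above the program's data, and the read-and-execute text mapping is at the bottom, matching the picture described here.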
Right, so disassembling. The idea is we have some binary application and we want to understand what it does. Oftentimes, if we want to find an exploit in a program, do we have the source code? Maybe; it all depends, right? If it's an open-source thing, yes. If it's Internet Explorer that you've downloaded from Microsoft, the answer would be definitely not. But we want to realize that the binary code is just the compiled version of some other high-level programming language. So disassembly is the idea of how you get from that binary back to the assembly. The tools that do this are actually very stupid. As we saw, objdump is very, I think the proper term would be, basic: it literally just tries to interpret the bytes in the binary as x86 assembly. So when you use the -d option, it's trying to disassemble every possible thing it can find in this binary. And that's why, when we went to main, we know this retq is the end of the code that we wrote, but objdump doesn't know that. So it has found a nopw, which I have no idea what that is, and then a nopl. The nop makes me think it shouldn't do anything, so I have no idea what it's doing here. So this may or may not be real code; it may just be junk in the file, who knows. You can actually get pretty far using objdump and just looking at this and saying, okay, what is happening here?

So this is kind of pointing you to some tools. Radare is a program analysis tool. It has reversing capabilities, it disassembles binaries, and the cool thing is it supports scripting, I believe with Python, Go, and Lua, I want to say, so you can write plugins that help you disassemble or try to understand things. It's also free, which is always a plus. IDA Pro is the state-of-the-art tool for reversing and understanding binaries. This is what professional malware analysts and reverse engineers use. It supports disassembly, as we said. So the important thing is disassembly is
going from the binary to the assembly, but that's not terribly helpful, because what you'd really like is the high-level C code. So IDA Pro has decompilation through the Hex-Rays decompiler. It can be integrated with gdb. It's a commercial product, and it's incredibly expensive. And when I say incredibly expensive, I mean I think IDA Pro by itself is one, two, three thousand dollars, and then the decompiler plugin, to try to decompile to C, is, what are you guys telling me, like $1,800? It's like $1,800 per architecture, so you need to buy one for ARM and x86 and x86-64, and maybe they have a MIPS one too, I don't know. But yes, they do have a free version, I think it's an older version, so you can actually go download it and start playing with it. I will tell you it is also the most horrible tool, in terms of ease of use, that you'll ever use. It's a classic old-school Windows application where it's just like, here are various menus that do various things, good luck figuring it out. There is an IDA Pro book by Chris Eagle. He's a really good guy, but he's in no way affiliated with the IDA Pro people. He just realized how terrible the documentation is for this tool, and so he took it upon himself to write a book about how to actually use it.

Hopper is a newer one. I have used IDA Pro, but I don't use it so much anymore; I mainly use Hopper. It actually includes a decompiler, which is decent. It's not quite at the level of IDA Pro's decompiler, but it gets by. It can be used for free with limitations: I think you can't save, or there's a 60-minute time limit, something like that. But it's actually not very expensive; it's like $90, which compared to IDA is particularly cheap. And it works really well on the Mac, which is kind of why I use it too.

Cool. So these are your toolkit to figure out what these applications are doing. Because, and we've talked
about it before, right, in order to find vulnerabilities we need to actually know what the system is supposed to do. Specifically, in this case, we need to know what the code is actually doing. So we're going to look at three attacks against Unix systems and three different types of vulnerabilities. Yes? Okay, cool.

So as we said, there are different types of attacks. We can have remote attacks against a network service, where, as I think we mentioned on Tuesday, we don't have local access to the machine and we can't control what executes, but we know the remote machine is running the sshd server. So if we can find a vulnerability in that sshd server, that will give us access. We may be able to attack the operating system, so we may be able to find a vulnerability in the operating system. Or actually, now there's another level. Did I tell you guys about the Pwn2Own competitions? They started as competitions where they'd get a brand-new, top-of-the-line laptop, I think they started with Windows and Mac laptops, and they'd say, okay, it's a brand-new, completely up-to-date machine; if you can take control of this machine, I think usually by getting it to visit a website, I don't know exactly how the terms work, you basically get that machine. That's how this competition started. Every year the machines would be owned. But the problem was that the value of these exploits far exceeded the price of a computer, especially because they started doing mobile ones, taking over a cell phone. A cell phone is like six or seven hundred dollars, when this exploit would be worth, I want to say, anywhere from ten thousand dollars to maybe fifty thousand dollars. Somebody I know, who I used to hack with on the Shellphish team, found a vulnerability in the baseband of some Android phones, I don't know which specific phones, and
so he went to this Pwn2Own thing, was able to use it to take over the phone, and got a nice amount of money. The baseband, if you don't know: I think everybody's phone has at least two CPUs. One CPU is the general-purpose CPU that runs either iOS or your Android operating system; it does all that stuff. The super interesting thing is it does not place phone calls. That CPU, even Android itself, does not know how to place phone calls. It doesn't know how to talk 2G, 3G, LTE, or whatever. That's the baseband's job. It's a completely separate chip with different firmware running on it. So when your operating system says, I want to make a phone call, here's the number, it calls into the baseband to go do that and connect. And it turns out this baseband chip has full access to everything on the phone. So what this person did is find a vulnerability in the firmware that was running on the baseband. I don't know if it was through a text message or through a phone call or something like that, but he was able to take over that firmware and then use that to give himself full root access on the Android device by changing memory and stuff.

So yeah, this is why you should not consider this an exhaustive list. Wherever vulnerabilities exist, you can exploit them. You can even try remote attacks against the browser: if you know people are running certain browsers, you can do that. You can have local attacks against setuid programs, as we'll see. You can locally attack the operating system to try to elevate your privileges from your current user to the root user. Most local exploits will be trying to get root privileges, so usually it's exploiting a setuid root program. A very small fraction will actually target the kernel itself; a vulnerability against the kernel is a super awesome thing and happens rarely.

So really the problem comes down to how we attack an application, and it goes back to where we first
talked about this, right? What are the inputs to this application? For an application that you're executing, what are the inputs to it? Now, more specifically than our abstract model: it depends on the application, but in general, for Linux commands, what would be some of the inputs? Standard input, any network input, configuration. I'd also throw the environment variables in there as well, because your environment variables can influence execution. And files: any file inputs, any sockets, any interaction with the environment, file systems that it's created. It can actually be difficult to define the boundaries of an application, because they can be very complex, talking to a lot of different systems.

Okay, so as, I guess, a brief advertisement, not like a real advertisement: I'm teaching a grad class next semester that's focused just on exploiting vulnerabilities, so I go over all of these things plus more. For this class we're only going to touch on three of them that I think are a good smattering to give you an appreciation of different types of vulnerabilities and exploits. But you can take my grad class if you want more.

Okay, so now it's: how do we actually try to attack these applications? So how do you access files on a Linux machine? It seems like a silly question, but you can actually learn many things about security, or find new vulnerabilities or exploits, by rethinking your assumptions and asking the stupid basic questions. How do you access files on, say, a Linux machine? You give a path to it. How do you specify the file path? Slashes. And the slashes do what? They tell you the directory. What else? Permissions for that directory. Permissions, yeah, so the permissions of that directory will influence whether you can actually open that file or not. But think about it this way: how do you identify a file on the file system? Let's say you open file foo; how do you know later how to open that exact same file foo? So we
just talked about how we can cat a file. So how do you do that? Let's say you want to output the file foo, right? And we actually know that cat is just a program that's a thin wrapper around basically an open system call and a read system call. So it's going to call something like open on /etc/foo, right? Would this output the same file? So what's the difference between these two statements, these two file references? Yes: one is absolute and one is relative. And how do you know the difference? Absolute references start with a slash. This starts with a slash, so we know to go to the root of the file system, slash, and then look up: is there a directory called etc? If so, look in there for a file called foo. Awesome. This one would then be a relative file reference. So what would this mean to the operating system? Because it's still going to go open foo. What does the operating system do? It looks for foo in the current directory of wherever I'm executing. I think the OS keeps track of the cwd, the current working directory, of every process, and so it knows to look there. So this will actually depend on where I execute this, on what my current working directory is, because you can think about it as basically being equivalent to open of the current working directory plus foo. This is definitely not valid C code, but I think you understand.

Well, what if we wanted to open a file in our parent directory? Dot dot. So if I wanted that, dot dot and then what? Slash foo: ../foo. So this is going to be open of the current working directory plus ../foo, which says go to my parent directory and open a file foo there. Awesome. What would this do? Yeah, so it's actually the same thing as ../bar, right? Because you're going to go up one directory, dot is the current directory, so it's going to try to go into foo but then go back up. I actually don't think it'll try to go into
foo; I think it'll collapse it like this and get rid of that, so it'll consider this as going back up, and we'll try to access bar relative to the current working directory.

So the important things here: what controls how the operating system parses which file we want to open? The current working directory. So the current working directory, definitely. What else? Think about it in terms of parsing: what characters have influence? What if I wrote cat and, let's say, I change it like this: dash dash star foo. What am I doing here? Star dash dash star bar. What's the difference between these two? Yeah, the characters. What characters? Why are the dots and slashes important? Right, they have special meaning. Not to cat, and this is an important point: we're just using cat as a wrapper around the operating system's open syscall. It's to the operating system, right? The operating system has to figure out what file you're trying to open. If I replace all dots with dashes and slashes with stars, it would tell me there's no file called dash dash star whatever; it would actually try to look for a file with this name in the current directory, which you can create if you're crazy. So the important part here is that these dots and slashes matter and change which file is being referenced.

With a trusted application that's accessing the file system, if we can control how it looks up files, if we can control that path and provide input of our own, we may be able to trick the application into opening files that it didn't expect. One class of this is the dot-dot attack. If the application ever builds a path to a file it's going to open and uses our input as part of that path, so for instance if the path is like a strcat of the initial path with
the user-supplied file name, and then it opens that path, we can actually get it to open any file we want by putting in ../../../ and so on. If it's going to output the file to us, we could do ../../../etc/shadow, and then it would output the /etc/shadow file to us, which we should not be able to read. This is also called a directory traversal attack. This happens all the time, and in many different contexts, not just in C. It actually happens a lot in web applications: when you're reading a file in PHP and you're using the user's input to figure out which file, at that point the user can get you to read any file that your web user can read. Cool, so that makes sense.

Other things we can play with that have influence. So dots are important, slashes are important. What other things have special meaning in the file system? Stars? Actually not; those are only for bash. Bash interprets the stars; it's called globbing, so look up globbing in bash and you'll figure that out. But that's actually not an OS thing. Backslash? Actually not special, although it is on Windows, because on Windows that's the path separator. The squiggly line? What do you use the squiggly line for? To go to your home directory. Yeah, so the squiggly line, the tilde, is actually important and is used. And the super interesting thing, which maybe you've never realized before: you should run the env command on a Linux machine, because it's super interesting. These are all of the variables that are defined. So my hostname is defined, this TERM variable is defined, this PWD, the present working directory, and the HOME variable. This is actually how, when I do cd ~, it knows where to take me. So here, wait, let me show you something. If we do export HOME=, what do you want my home to be? So now if I do cd, I'm actually in /var/log. So it's whatever that value is. Cool.
Okay, so what's the PATH environment variable used for? It determines where your system looks for executables, right. So when I type in ls, how does it know what program to execute? Yeah, so it's going to first look in /usr/local/sbin/ls, nothing there; /usr/local/bin/ls, nothing there; and so on down the list. And you can actually figure it out by saying which ls; that'll tell you the path to the ls program, which actually comes up sometimes, because sometimes you want to use a very specific executable. And you can see this is the last thing that's being checked in the path. All these things we can control: it's our environment that we execute other applications in. So for instance there are the function calls, I think these are libc calls, execlp and execvp. These use the PATH variable to look up which application you're trying to execute. So if you have a trusted application that runs execlp of cat, whatever, whatever, and you change your PATH to look in the current directory first, you can create an executable called cat and it will execute that instead of the actual system cat. Similar things with the HOME environment variable: you could modify $HOME to trick them into thinking that home is a different... Okay, I know you want to go, but we're going to go over this so we can do buffer overflows all day on Tuesday. I mean all day; we have like 150 slides, but a lot of them are animations, so don't worry.

Okay, so these are all different types of file system attacks. If you can control and influence the path that the application opens, or if the application is implicitly relying on PATH and HOME, you could potentially try to exploit it.

So often applications don't want to read files themselves. We just mentioned cat: why write code in your application to read in the file and output it if you can just call cat to do it for you, right? That seems
silly. Linux and Unix have all these awesome system utilities; why not reuse them and their functionality, right? So there is a system, this is confusing, there's a libc call named system. So system is not a syscall; system is a libc function that executes the command specified in the string as if it was called as /bin/sh -c and then the string that's passed in. So this is literally exactly the same as if you were at a shell typing in the command that's passed to system. Oftentimes we want to use things like this: you may want to remove a file by just calling system with rm and the thing you want to remove. popen is another function that ends up running the command through the shell the same way underneath.

So the idea behind command injection is: if I can influence and alter that string that you're passing into system, I can maybe try to get you to execute additional commands. How could I do that? So let's go back here. You can think of it literally like this: if I have something that says system of cat and then your input, this is exactly the same as if I'm at my terminal typing cat and then a space. And then, you want to take over my system: what do you have me type in next? What do you want? I'm literally just going to type what you say. I'd do cat, then something like a bogus file name, and then a semicolon, and then rm -rf /, with a space after rm. I think that'd be fine. So you just got me. As a developer I only wanted to run cat; you just tricked me into running rm by using a semicolon to execute additional commands. This happens easily in a C program if you're creating the command like cat /var/log/ plus argv[1] and then running system on it. Now I can output the contents of whatever I want; I can literally execute any commands as if I was sitting right at your terminal. So you can use this to get super cool stuff. And very quickly, there's a real-world example: Shellshock. All right, we'll go over this, or look
at this. See you all.