Hey folks, Adam Doupé here, and this is a recording for CSE 365, Fall 2019. At this time I am traveling again, and we just released Assignment 6, so it's here on the website. In class today the TAs went over how to approach Assignment 6, making sure that everybody understood how to get on the server, how to access all of the challenges, and how to get files onto and off of the server so that you can analyze them. There's a lot of really cool stuff here; I think you guys are going to have a lot of fun with this assignment. Since we did that in class, what I'm going to do today is record a lecture continuing on application security, so that you can watch this and be prepared for next time.

If you have any questions on Assignment 6, or if you weren't able to go to class, please come to office hours; we'd love to get you through it. The important thing is that, to incentivize people to start early, we have an extra credit option: if you finish five of the levels by December 2nd, you'll get an additional 10 points on the assignment, which will help boost your grade. We haven't yet covered everything you'll need to break all 10 levels; that's what we're going to be doing through this application security part.

So let's get into it. As Jan talked about previously, endianness is a really important issue in binaries and in understanding what's going on. What we're doing here is stripping away the layers of abstraction that we've been dealing with, to try to understand what is actually going on in these binaries. What we're going to dig into is: what does the CPU actually execute? It's important to remember that processors don't actually execute what we think they do,
right? We write our program in a high-level language like C or C++, but that processor, that bit of silicon, doesn't know how to execute C code, and it doesn't know how to execute Python code or Ruby code or C++ code or anything. All it knows how to execute is, specifically, assembly code.

What we're going to cover here is basically a crash course, at a high level, in what you need to know about x86 assembly to understand a binary, understand what's going on, and be able to analyze it for security vulnerabilities. But this is a really deep topic, so I'm going to say some things that aren't actually true, like "the CPU executes x86 assembly code." If you study computer architecture, this is actually not true: for performance reasons, modern processors translate from x86 code to a microcode that the CPU actually executes, and that gives them a lot of interesting benefits. For right now we'll ignore that complexity, but if it really excites you, and you really want to find out how computers work, this is something you can go study.

So, x86 assembly language: it's a slightly higher-level language than machine code. What does that mean? It's important to remember that, just like a CPU can't execute C code, a processor needs to execute essentially ones and zeros. It interprets those ones and zeros as certain instructions with data and then figures out what to do from there. Assembly language is our human view of those ones and zeros that the CPU executes. So assembly language is basically another type of programming language; it's just really, really close to the CPU and what's actually going on. We'll see how those differ. In assembly we can have directives, which are commands for the assembler; we can specify data regions that we need; and we can specify actual operations. We'll look at what are the different types of x86
assembly operations. Here, and this is where getting familiar with hex is really important, is a jump instruction that says: jump to, and start executing from, 0x08048f3f. When the CPU reads that, remember, it's not reading this actual text. This gets assembled into those ones and zeros that the CPU actually knows how to execute, but it really is a very easy mapping here.

One thing that will be very annoying is that, because assembly language is so low level, there are actually two different types of syntax for representing these instructions, with different orderings of the operands. This may be very tricky, but the important thing to remember is that every tool you use will allow you to choose which syntax to look at. Translating between them is incredibly easy for automated tools, so it really doesn't matter one way or the other.

AT&T syntax is the one used by objdump and the GNU assembler. Once you start learning this more and more, it really doesn't matter exactly which syntax it's in. The important thing is the instruction mnemonic, which in this example was jmp, and then the source and then the destination. So if you're going to add, you say: add this register to another register and put it in the destination, which we'll see in a bit.

The other type of syntax is typically called Intel syntax, and this is what's used by the Microsoft assembler, NASM, and IDA Pro. A lot of people actually prefer and use Intel syntax, but for this course we're going to use AT&T syntax. Intel syntax is a little bit backwards from that: the instruction mnemonics are basically the same; the difference is that the source and destination operands are flipped, so that you're actually putting the result of the calculation in the first operand. Again, you can change it in gdb, you can switch between these; in every type of
tool, even objdump, you can specify exactly what type of output you want. So it really doesn't matter; don't let this hang you up. As we see some code examples, I'll show you ways that you can easily tell, just from looking at the code, exactly which syntax it is.

Okay, so what's the point of assembly? As we saw, the CPU has registers, which provide for fast computation right on the processor. But of course this isn't enough. x86 specifically has, I don't know exactly how many registers, let's say eight, but suffice it to say there's a fixed number of registers. If we only used those registers, we wouldn't be able to hold more than eight times 32 bits, so that's the only amount of data we'd ever be able to compute with. Of course, we know that modern machines have access to way more resources than that, and that is the memory. So we need to be able to access memory. We're now extending our model from just the CPU accessing its registers, which are literally on chip, to how we access memory, which could be a vast amount of memory.

This is the specific x86 format for how we access memory, and we'll go through it step by step. There's a base, which is where we're addressing from. Then there's the index, the offset from that base address. The way to think about this is in terms of an array calculation: you have the base address of an array, and every element of the array is at some index offset from the base. But arrays have different element sizes, and that's the scale. If you're looking at an array of characters, a character is one byte; whereas in, let's say, an array of integers, an integer is four bytes, so there your scale is going to be four. The displacement is a constant offset. It's important to look at these with an example.
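The array-style calculation described here can be written out as plain arithmetic. This is a minimal sketch, not anything from the slides: the function name `effective_address` and the sample addresses are made up for illustration, and the masking assumes 32-bit addresses.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit addresses, which wrap around at 2**32


def effective_address(base, index=0, scale=1, displacement=0):
    """Return base + index * scale + displacement as a 32-bit address."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only allows a scale of 1, 2, 4, or 8")
    return (base + index * scale + displacement) & MASK32


if __name__ == "__main__":
    # Hypothetical array of 4-byte integers starting at 0x1000: address of element 3.
    print(hex(effective_address(0x1000, index=3, scale=4)))
    # The same element with a constant displacement of -0x20 applied.
    print(hex(effective_address(0x1000, index=3, scale=4, displacement=-0x20)))
```

When the index and scale are left out they contribute nothing, so a reference with only a base and a displacement is just `base + displacement`.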
The syntax is going to be like this. We have the displacement, and then, surrounded by parentheses, the base, the index, and the scale. The parentheses indicate that this is a memory access, meaning this is something that talks to memory: we're either going to take something and put it in memory, or take it out of memory and load it into a register. So here we have the base address, the index, the scale, and some displacement off of that.

For example, here is a move instruction. The suffix is the size of the reference, so we're moving a long, and a long should be 32 bits here, I believe. For the base we can either have a fixed base, which would be a fixed memory location, or, as here, use a register: we're using whatever's inside eax as our base. Then we just follow this formula. We take the base, then we add ecx, which is the index here, scaled by four. (Sorry, maybe you can hear some construction happening outside the window. It's something I can't control, so I will just keep talking over it.) So the eax register is the base and the index is ecx, and we're already learning some interesting things about this instruction.

Again, this is AT&T syntax, so that means we're going to take from memory and put it into edx. We're moving a long from memory, and where is that memory? Whatever's in eax is our base; we add to that whatever's in ecx times four, which gives us some index off of the base; and then from that whole thing we subtract 0x20. We move whatever's at that memory location into edx. Again, it's important to remember that memory is addressable by the byte, so we're addressing four bytes and moving those bytes into edx.

Before I go through and talk about these, I suggest you look at these slides and, based on what we just talked about, go through and think about each of these
examples: okay, what is going on here? Where is the data flowing? The key question is: is it coming from memory into a register, or from a register into memory? And then, what's the exact memory address?

Here we have the base, ebp, and because we only have the base, with no index or scale, those are essentially zero and we can ignore them. The syntax helps here. We're taking ebp and subtracting eight from it, then moving from that memory location into eax. So: copy the contents of the memory pointed to by ebp minus eight into eax.

Now here is a simple memory reference. One interesting thing to think about is what the C code that would generate an instruction like this looks like. Another thing to note: do we have a problem here? We're using eax and then we're writing to it. We're fetching whatever's at the memory location given by eax, and we're storing it back into eax, effectively clobbering whatever was in eax. But that's okay; maybe we don't need that value again. So we have our register eax, on chip; we look at the address inside that register, which points us somewhere in memory; we grab four bytes from there; and we put them back into eax. This is exactly like C code that does a dereference. When you do a pointer dereference, you're accessing the value in memory that the pointer points to. So remember: with a move instruction, the parentheses indicate a dereference. We go get the memory that eax is pointing to and put it into eax.

Now here we have the other way: we're moving whatever's inside eax to somewhere else. And what is that somewhere else? This instruction means: take whatever's in eax
and put it in memory. What location in memory? Whatever edx is, plus ecx times two.

Now, with this dollar sign in front, this is the syntax of a hard-coded memory address, so this is: fetch whatever is at memory address 0x0804a0e4 and put it into ebx. So this is... oh, I was wrong. Okay, it's a good thing we're going over this. Scratch what I just said. This is actually copying the value: this is a hard-coded value of 0x0804a0e4, and we're copying that into the ebx register. This is how we're able to load a fixed value, one that never changes, into ebx. Where I got messed up is that the dereference occurs with the parentheses; if we have the parentheses, we get a dereference. So we know that after this instruction executes, the value inside ebx must be 0x0804a0e4. But in this next line, the value inside eax is going to be whatever is at the memory location 0x0804a0e4.

Cool. So these are all the ways we can access memory. Now, what do we do with that data? We looked at how we move information into registers, and out of registers back into memory. There are a number of different instruction classes, some of which we saw. There are some that transfer data: mov, xchg, push, and pop. We'll look more closely at push and pop later, so we don't need to go into them now. We have ways of doing binary arithmetic: we can add, subtract, multiply, divide, increment, and decrement. We have logical operators to do logical ands, ors, xors, and nots. We can transfer control, which is how we get the power of programming, for instance if conditions. We have jump, conditional jump, call, and return. We'll see how those work later, but essentially this is what lets us make function calls, because when we call a function we need to know how to go back. So call gets us into a function, we do some computation, and then we go back
to whoever called us. Then int and iret: int is an interrupt, so it triggers an interrupt. This is one way to make system calls from our user-space program into the kernel. We can compare values using the cmp instruction. The way this is done is that the eflags register we talked about earlier has a whole bunch of flags set after a compare instruction, like the zero flag, the carry flag, all kinds of crazy flags. Then we use different types of jump operations: jne, jump if not equal, jumps only if the previous compare was not equal; je jumps only if the previous compare was actually equal. Where to jump to can be direct, so it's a constant address like we saw, or it can be indirect: you can say jump to whatever address is in eax, which means you calculated the address to jump to dynamically. There's input/output, a way to talk to peripherals, all kinds of stuff. And nop: nop just means no operation, do nothing.

I mentioned it a bit earlier, but how do we actually invoke system calls? We need to be able to call into the operating system, because remember, the operating system does many, many things for us. As a user-space program we can't actually read and write files ourselves. We need to say, hey operating system, please open this file for me. We talked about why this is: the operating system does all kinds of permission checks, so we can't do it ourselves; we need the operating system to do it for us. This also applies to writing out to standard out. There's a write system call to write to a file descriptor; we write to standard out, and that's how we actually get output on the console. Usually, though, we use libraries: we use the libc library, which has functions like printf that are much nicer to use than just the
write function. We'll see why this is important, because as we write an x86 assembly language program, if we can't invoke a system call, we can't see the output. So we're going to actually see the output here.

On Linux x86, the way to invoke a system call is with int. The int instruction is an interrupt, and the interrupt number you use is 0x80. When you do it, you need a calling convention: how do you call a specific system call in the operating system? How do you tell it you're reading from a file descriptor, versus writing to a file descriptor, versus any of the other 256 system calls? What you do is specify it in eax. eax contains the system call number, and there's just a table you can look up, for Linux or whatever operating system, that says: eax is this, which means this system call.

So we can write a hello world x86 assembly program. In our data section we have a label called hw. "hw:" is a label, just like labels in C. We're telling the assembler: hey, in data we have a string that is "hello world\n". Now, in .text, which means our code segment, we tell the linker that we have a function called main that we want other people to be able to call. This is, again, how everything ties together: when you write a main function in C, it gets compiled down to this, so that the operating system, specifically Linux, knows to execute this main function as the entry point.

Let's walk through our main function and see what's going on. We're going to move four into eax. Important things here: first, the dollar sign means a constant value, as we saw. So movl $4, %eax means move the constant value four into eax. Now, as I told you before, there's a cheat to figure out exactly which syntax you're looking at. One of the key differences between Intel and AT&T syntax is the order of the operands.
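The idea that output ultimately comes from a write system call on a file descriptor can also be seen from a high-level language. This is a minimal sketch, assuming Python on a Unix-like system, where `os.write` is a thin wrapper around that same system call. The function name `say_hello` and the exact message bytes are assumptions for illustration, not the lecture's assembly program.

```python
import os

MESSAGE = b"hello world\n"  # assumed message text; 12 bytes including the newline


def say_hello(fd=1):
    """Write MESSAGE to the given file descriptor (1 is standard output).

    Returns the number of bytes written, as reported by the write system call.
    """
    return os.write(fd, MESSAGE)


if __name__ == "__main__":
    say_hello()  # goes straight to file descriptor 1, bypassing print()
```

Unlike `print`, nothing here is buffered or formatted by a library: the bytes, the descriptor, and the length are exactly what gets handed to the operating system.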
If you had no idea which syntax it was, looking at this instruction actually tells you everything you need to know, at least in terms of where the destination operand is. How? There are only two possibilities: either we're moving the constant value four into the register eax, or we're moving whatever is inside the register eax into the constant value four. We know that this operand is not in parentheses, so it is not a dereference. There's no way we can move eax into four; that's essentially a meaningless operation. We can only move the constant value four into eax. So we know that the destination register must be the last operand, which is AT&T syntax.

Okay, that aside: we move four into eax, and we move one into ebx. Then we move the address of hw. This is one nice reason why we're not coding directly in machine code: the toolchain will substitute in the address of hw. Wherever it puts this string in memory, it will put that address here and say, okay, move that value into ecx. Then we move the constant value 12 into edx, and then call int 0x80.

So, whoa, what system call is being executed? Let's figure this out. Let's look up an x86 Linux syscall table. There are great resources on the web. I did not preload this, so I have no idea if this will work. The 32-bit one says it's coming soon; that doesn't seem promising. So let's look at this Linux system call table. This is for the Linux 2.2 kernel. We want to make sure we're in the right place: "this number will be put into the register eax." Great. So we put four into eax, which means that it is a write system call.

How do I know what the parameters are? It's the write system call, so I can do man 2 write; section two is specifically system calls. Here I can read this: write, "write to a file descriptor." What are the arguments to this function? A file descriptor, a pointer to a buffer, and a count. What's the description? write writes up to count bytes
from the buffer starting at buf to the file referred to by the file descriptor fd. Great. So we know we're calling write, which means ebx holds the first argument, the file descriptor; ecx holds the buffer that we're writing from; and edx holds the size.

We can now go back and look at this. We're moving four into eax, so we are calling the write system call, w-r-i-t-e. Then we're moving one into ebx, which means that we're writing to file descriptor one. What is file descriptor one? It's really important to remember your file descriptors; this has got to be kind of in your core. Zero is standard input, so reading things from the user is file descriptor zero. File descriptor one is standard output, which is for your output. And file descriptor two is standard error, which is where you usually print error messages. So we move one into ebx, and we're going to write out to standard output.

Where's the buffer that we're going to write out? It's at the address hw, so this will be a pointer to the string h-e-l-l-o, space, w-o-r-l-d. The last parameter is how many bytes we're writing out, the count. The count is 12, so counting: one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve. That covers everything: hello, space, world, and a newline. That happens when we do the int 0x80.

What happens afterwards? The system call returns, and we can see that it returns some size value. What's the return value? On success, the number of bytes written is returned. It's returned in the eax register, but let's ignore that; we don't really care. Then we're moving zero into eax and returning, so we're returning zero from this function, which is essentially main.

So what does this look like? Let's write this up and see what happens. I'm going to call this hello dot
S; capital S is assembly. Now you should be able to just run gcc and tell it to compile hello.S. I'm going to pass -m32. Of course, I didn't try this out in advance before making this video, so I apologize for that. I wonder if this will work. Awesome. Let's try looking for the syntax... yeah, I don't think that's going to work. Okay, the problem is the loader. I need to compile 32-bit x86 on 64-bit Ubuntu, and the problem is I don't know the right libraries to compile 32-bit apps. There we go: multilib. This should install a glibc for all of the things; it needs glibc to load the program. Okay, let's see, it's installing. Just like in class, I'm not going to pause the lecture.

All right, there we go. Well, we don't know that it actually worked, so let's run it. We see that it says hello world with a newline. Great.

Now let's look at what actually happened. I'm going to open this up in gdb. First I'm going to examine the instructions at main. So, mov... ah, the syntax changed. Let me set the disassembly flavor. You can see I sometimes use this, and now it changed. See, I told you it was very easy to switch back and forth. We can now look at our instructions and compare them to what we had here. Our main function moves four into eax, one into ebx, 0x2008 into ecx, 0xc, which is 12, into edx, does int 0x80, moves zero into eax, and returns. The key difference is this value, 0x2008, which we should be able to examine. Here we're asking gdb: hey, examine what's at this memory location, 0x2008, and interpret it as a string. And here we have the string hello world, newline. So the toolchain decided to put the string there, and then it substituted that address in here, and that's why everything works.

Let's even take it for a quick spin and show you a bit of the disassembly. We can break on
main and restart it. I'll actually talk a bit about this environment: this is the GEF version of gdb, which is very, very useful for doing things like what we're doing here, because I can see that we're at this instruction, about to move four into eax, and I can look at all of the registers that I currently have. I can say next instruction, and we just moved that, so four is now in eax. I'll do step instruction, so one is now in ebx.

Okay, a question would be: why is this value not exactly what we saw, the 0x2008? The reason is address space layout randomization. Basically, when Linux loads our program, it chooses a random memory location to load it into, so that's why it wasn't exactly the same. But that's okay. That goes into ecx, and GEF does something cool here: it shows me that four doesn't point to anything, there's nothing at memory address four; there's nothing at memory address one; but this value is a pointer to the string hello world. So it's able to do this interpretation and figure out roughly what things are.

Step instruction again: it moves 0xc into edx, and it's about to do an int 0x80. So I know right now it's going to make system call four, with ebx holding one, so it's going to write to file descriptor one the string hello world, 12 characters. I can do one more step, and I can see that it output that, and that the return value changed to 12, because that's how much it wrote. It's going to move zero into eax, and we're going to return and then exit. So there we go, that was our hello world.

Okay, but what actually happened? We had our program execute. If we look here, we have this kind of insane thing that happens. We compiled our hello.S, the assembly language, and what that produced is a file, a.out, that is an ELF 32-bit executable that runs. This is, again, breaking things down. We looked at, okay, we have C code that gets compiled down
to assembly code, which is what the CPU actually executes. But how does that program actually run? It's an interesting thing to think about: a C file is just a file. It's just a collection of bytes. You have your compiler that translates that into an executable file of machine code, but that again is just a file on the computer. I can see here, if I do ls -la, I'm looking at just files: I have hello.S and a.out. So a really interesting question is: how does your operating system take that file, this a.out, and actually turn it into a running process?

This is where the ELF file format comes in. One of the things the ELF file format describes is an executable binary. When I ran file a.out, I could see it's an ELF file. Essentially, the operating system needs to be able to take a file on disk that represents a program and turn it into a running process. That's the basic idea. This happens through the execve system call, but we'll get into that later. When a program is invoked, when we run ./a.out, our shell is actually asking the operating system: hey, please execute this program. So what does the operating system do? It parses that ELF file according to the file format, and it copies parts of it into memory.

You can actually look at this. Let's go back here. On Linux there's the /proc file system, which, if you're trying this out on our shared server, will not show you anything, because I've disabled it, I believe. There's /proc/self, where self is your own process. So if I run cat on /proc/self/maps, I can see the memory layout of cat. I can look at all of this and see where all the regions are laid out in cat's memory space.

So the loader figures out where to copy everything. Remember, the code needs to go somewhere and the data needs to go somewhere. And, as we saw in that example, there may be references that need to be changed based on where things are laid out. So this relocation happens, and then
finally the instruction pointer is set to the location that the ELF header specifies as the entry point address. Then execution begins. And at this point, it's important to remember, and this is kind of the key to assembly language, that the CPU is very dumb. All it does is start executing wherever the program counter, the instruction pointer, says it should. It fetches that instruction, decodes it, figures out what it's supposed to do, and then goes to the next one and keeps executing. If there's a jump, it will jump and go somewhere else in memory to execute more things.

On 32-bit x86, processes think they have up to four gigabytes of memory. This is a representation of the process memory layout, specifically for 32-bit systems running 32-bit applications. There's an interesting performance benefit here in that the kernel took up the top one gigabyte of the address space, but we can ignore that for now. Typically, if you're on a 32-bit system and you're running a program, memory starts at the high end, which is 0xbfffffff, and goes all the way down to zero at the other end. So a program effectively had about three gigabytes of memory. You can see why there was the jump to 64-bit. If you think about why we're limited to four gigabytes, it's because addresses and pointers could only address two to the 32 bytes of memory, which is four gigabytes. One of the big benefits of moving to 64-bit processors is that we can now address two to the 64 bytes of memory, which is a lot.

So then, what does this process look like when we actually start digging into it? We need everything that gets passed to our program. If we write a typical C program, we have our main function, where we take an argc, the number of arguments, and argv, and you can actually take another argument, which is the environment pointer, so
all the environment variables that get passed into your program. All of that actually lives at the very top of your memory: your environment and your argument variables, which include the environment data and command-line data. All of the actual data is placed there, and then these arguments are passed in. We'll dig into this.

Then we have the part of memory that's known as the stack. Essentially, it really comes down to function calls. You have one function that calls another function, which calls a third function, which calls a fourth function. If you think about that, all of those functions need to use the registers that are on the processor. And specifically, when one function calls another function, it wants the callee to return to exactly the same location, in exactly the same state that it was in, apart from whatever computation that function performed. But how can we do this if we only have a fixed number of registers? How do we make it so that we can have function calls that are a hundred deep, literally deeper than the number of registers we have? The idea that came out of this was essentially a stack, in the data-structure sense. We'll look into this, but for now you can think of the stack as a region of memory that starts at high memory and grows down.

Then we have our shared libraries: libc or any other libraries that we're using get loaded into the shared library region.

So the stack grows downwards. But those of you who did the secure house in C++, or who remember reading those examples in C++ code, saw that those programs were able to read in as much data as possible. Oftentimes our programs don't know in advance how much data they're going to get. This is actually a pretty fundamental concept and notion inside of
computing: you don't know the size of the input that the user is going to give. The user can give arbitrarily large input, so there's no possible way for you to allocate enough memory in advance. So what do you do? You need some mechanism to dynamically allocate memory. In C you use malloc and free; in C++ you use new and delete. Similar concepts exist in Java, but we'll ignore them for now. So you need another place in memory that grows depending on how much data you're using, and that's what we call the heap.

If you think about it, there's this really nice symmetry. You have this stack that grows as you do more and more function calls, so you need some part of memory that grows and shrinks, and grows and shrinks. Similarly, we need some region of memory that also grows and shrinks based on how much data we're using. So what we do is situate these against each other, so that each of them can grow towards the other.

Finally, we have a section that has all of our global and static variables: whenever you have a global variable, it's in either the bss or the data section. And then we have our code section, which is towards the end here. The code section is usually marked as read-only so that you can't modify the code at runtime, although you can get around that a bit.

Okay, cool. Now that we've dug in a little bit, we need to step back and think about how we go about analyzing these programs. I showed you a little bit here. In terms of security, we've been operating a lot of the time on source code, but you may not have source code. In fact, you may just have this a.out binary. So the question is: how do you know what it actually does? The key thing, again: remember what we see if we just do a hex dump of a.out.
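Since a.out is an ELF file, the first bytes of its hex dump are an identification block rather than code. The following is a minimal sketch of decoding that block; the function name `describe_elf_ident` and the sample bytes are made up for illustration, and only the magic number, class, and data-encoding bytes are interpreted.

```python
ELF_MAGIC = b"\x7fELF"  # the four bytes every ELF file starts with


def describe_elf_ident(header):
    """Decode the first bytes of a file as an ELF identification block.

    Returns None if the bytes do not start with the ELF magic number.
    """
    if len(header) < 6 or header[:4] != ELF_MAGIC:
        return None
    bits = {1: 32, 2: 64}.get(header[4])            # class byte: 32- or 64-bit
    order = {1: "little", 2: "big"}.get(header[5])  # data byte: byte order
    return {"bits": bits, "endianness": order}


if __name__ == "__main__":
    # Hand-built sample resembling the start of a 32-bit little-endian ELF file.
    sample = b"\x7fELF\x01\x01\x01" + bytes(9)
    print(describe_elf_ident(sample))
```

To try it on a real binary, one could pass in the first 16 bytes read from the file in binary mode instead of the hand-built sample.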
so this is an ELF file. The ELF file has some metadata, but eventually it's going to get into, you know, here's just some code that the CPU executes. So how do we actually make sense of that? We can use different tools. This is objdump, which tries to disassemble everything, and if I look for main, here's my main. So objdump says, ah, these bytes b8 04 00 00 00 are actually interpreted as an x86 instruction that moves the constant four into eax, and bb 01 00 00 00 is move one into ebx. Cool.

So how does this actually work? We looked at compilation, which is how C gets compiled into x86 assembly, and then the x86 assembly is assembled into those bytes. We need a way to go backwards, and that's the process of disassembly. Disassembling gets you from basically raw bytes back to the assembly language representation of the binary.

There are a lot of tools. I'm going to go over many of them here, and you can explore them on your own as you tackle these challenges. Radare is a program analysis tool, and you can go check it out there. It does reversing and vulnerability analysis, you can disassemble binaries, and it has a lot of scripting capabilities, which is really cool, and people use it to automate disassembly of binaries. It's free, which is cool.

IDA Pro is the state-of-the-art tool for reverse engineering, although that's changing a little bit. You can go to this website and check out IDA Pro. It supports disassembling binary programs, and it also supports decompilation. Decompilation is that next step, right? You have your C code, you compile it to assembly, which gets assembled down into the binary code. Disassembly is taking binary code back into assembly language, but what would be really nice is operating not on assembly language but on C code. So decompilation is the process of how you smartly go from that assembly code back to C code. So IDA Pro has currently
I think one of the best decompilers available in the Hex-Rays decompiler. The Hex-Rays decompiler is very expensive: it costs, I think, about five to ten thousand dollars per architecture or something crazy like that. I'll say this for those of you who really want to get into security: as of now IDA Pro is the de facto standard tool that everyone uses, but there is a version that's available for free that's, I think, a few versions behind, so you can actually check it out on your own.

Other tools: Hopper is a tool that I use that's kind of native to Mac. It actually has a decompiler in it. It's kind of crappy, but it sort of works, it's pretty good. It can be used for free with a time limit, and the paid version is around 90 dollars, which I wouldn't recommend unless you're really trying to get into this area.

The other thing that came out recently is Ghidra. Ghidra is actually an open source reverse engineering suite from the NSA. This happened earlier this year, when the NSA released their internal reverse engineering suite of tools called Ghidra, G-H-I-D-R-A. You can go there and check it out. It's actually very interesting and cool to see the NSA contribute back to the community. Of all these tools it's the only open source tool, except, I mean, Radare is open source, I believe, but I'm not 100% certain. But anyways, Ghidra is very cool.

The interesting thing is that Ghidra is putting a lot of pressure on IDA Pro. With all these tools you have the ability to add comments and rename variables in the binary code, so you can get more semantic meaning about what's going on. For the longest time it was crazy, because IDA Pro didn't have an undo button. If you think about trying to write code in early versions of Notepad, which only allowed you to undo once, think about undoing zero times. So
yeah, it was a thing that people complained about a lot and that IDA Pro never fixed, until magically, when Ghidra was released with the ability to undo, IDA Pro now also has that ability.

But I don't want you to get hyper-focused on these tools. If you're really into this, you can definitely use them, but I want to make it 100% clear: you do not need to use one of these tools in order to do this assignment. Actually, I think it's preferable to start with low-technology tools so you understand what's going on. The tool I recommend you start with is objdump, which is what you saw me just use on the command line. objdump is the standard GNU development tool. It's a linear disassembler: it just tries to disassemble bytes into instructions. You run objdump with -D (capital D), which means try to disassemble everything in the binary, give it a binary, and then pipe the output through less. What we're looking at here is just that output. It's nothing fancy, but being able to understand what's going on at this level is super important.

So disassembly is important for trying to understand what's going on at the binary level, because you need to understand the code to see what's going on. You can think of that as a static technique: you don't even need to be able to run the binary. As long as you can look at that binary, analyze it, and understand what it does, maybe you can find vulnerabilities in it.

The other way to tackle binaries, to try to understand what's going on and identify bugs, is debugging: actually debugging the program at runtime to understand what it's doing. I think debugging is a super important skill that will benefit you beyond this course, so in your other
courses, you know, going beyond using print statements for debugging and using an actual debugger. I will say print statements can be great and can really help you figure out what's going on, but sometimes, and especially for low-level things like this, being able to really dig in and understand what is going on in a binary is a superpower that will set you above your peers on the job market, in your jobs, everything.

And you don't need source code to be able to debug, which is really interesting. I have stories of folks who are so good at debugging and reverse engineering that they actually debug and find the bugs in, like, Windows that cause their computer to slow down. So you do not need source code to debug a program; you'll debug it at exactly the assembly language level. And you get runtime introspection of what the program is doing, which is super important to reverse engineering. You can think of disassembling as looking at the binary code statically to understand what it's doing, whereas debugging is analyzing runtime behavior to see what's going on.

One of the easiest and best tools, which I use all the time for debugging, is gdb, the GNU debugger. If you've ever had a program crash and create a core dump, a core file, you can actually debug it, analyze the memory, and look around from there. You can start a program under the debugger, which I did earlier, or you can attach to a running process. Super important. I will say gdb has a pretty difficult learning curve, and it's hard to get started, but I guarantee you that if you put effort into learning how to debug with gdb, it will save you way more time in the long run. It also has really good documentation. You can set breakpoints to halt your program, you can set breakpoints on certain conditions, you can examine the memory, the stack layout, everything. It's an incredibly powerful tool, and even better
is you can script your debugging. You can set commands to execute when a breakpoint is hit, so you can do things like: when this instruction executes, print out the value at this memory location. It is incredibly useful, and I find many, many instances where I need to do this. I believe Facebook had a post a long while ago about using a poor man's logging or profiling to see what code is executing often, and it used gdb scripting for this. So it's insanely powerful. You can change things at runtime, you can do all kinds of cool stuff.

You can also extend gdb. As I understand it, there are a couple of different extensions to gdb that folks use. One is GEF, which you saw me use and which I really enjoy using. pwndbg is another one, and there's a whole bunch of others. The idea is that they turn the basic gdb experience into something slightly better, like we saw, which gives you much more information. I will say you do need to take your time to understand this: you're typing commands in here, this is the stack layout, these are all your registers, all this kind of stuff. And so as not to freak you out, this is an x86-64 program, so we see different registers here, not the eax and ebx that we've been looking at.

Okay, cool. So that's the crash course in what's going on. Now what we're going to be looking at is how we think about attacking Unix systems. This is very timely, since you just got a homework assignment that is explicitly about attacking Unix systems. Conceptually, and this is hopefully what Yan talked about at the beginning of this section in the last lecture, you think: okay, here's a system, how do I exploit it, or how do I trigger a vulnerability and make it do something fundamentally that's
not supposed to do, right? So, conceptually thinking about this: here's this system. Well, one option would be, what if I just change the code? I change the code and I can make it do what I want, right? Well, that's kind of cheating. If you can change the code, you can already make it do anything. So that's either a crazy bug, and that's really bad, or it's not really something you should expect to have happen. So if I want to influence the behavior of this system and I can't change the code, what can I change? The important thing to remember is that the only thing you have control over is the data that flows into that system.

So think about where we are in relation to this system. If we have some remote server somewhere, then maybe the only thing we can do is attack network services that are on that machine. This ties in with when we talked about network security and the importance of reconnaissance: being able to understand what applications are running on a remote system gives us targets. Okay, now I know I need to find a vulnerability in one of those remote services in order to get access to that machine. We can also try remote attacks against the operating system, so this is bugs at the TCP/IP stack level.

We can maybe do remote attacks against a browser. This is how a lot of real-world compromises actually happen against companies. Essentially, the attackers figure out who the employees of the company are through something like LinkedIn, then they figure out who those people's friends are on Facebook. They'll of course try to break into the Facebook accounts of those people, and if they can't do that, they will try to break into your friends' Facebook accounts and then send you a link on Facebook Messenger to say, hey, go look at this thing. Then when you click on that link, they use an exploit against your browser to get access now
on your computer. And of course you're on Facebook during work hours, so you're on your work computer, and now they have access to you. They have a foothold in the organization, and they'll propagate from there.

Other ways of targeting Unix systems: we have local and remote here. Remote means we do not have access to the system and cannot run commands. But if we're local users, that means we have a user account on the system and can run commands as our user. So we have permissions that limit what we can do, but we want to escalate our privileges. We want more, right? We want to be able to get root access on the system. So our target there is specifically setuid applications. We could also try local attacks against the operating system, because if that's successful then we're able to execute with the permissions of the operating system.

Here we're going to focus specifically on local attacks. When we're attacking Unix applications, basically 99 percent of local vulnerabilities exploit setuid root programs to get root privileges, and very few of them target the operating system kernel itself. If you think about why this is, it kind of makes sense. The operating system kernel gets hardened as more and more people find bugs in it, because any bug in the Linux kernel is likely applicable to almost all Linux systems, and therefore you need to target different types of vulnerabilities.

So the question is, what things influence the running of our application? Again, thinking about the data that we as attackers control, how does that influence the application? That can be based on inputs, so what kinds of inputs get into a program? It could be the command-line arguments. We know the command-line arguments get passed in as part of argv, so in any application the arguments come from the attacker. Also the environment, so all of the environment variables, and the environment could be
pretty large here, right? Here we're thinking specifically about environment variables, so the environment that the program executes in influences its operation. Then there are things that happen during execution: for instance, dynamic linking of objects, file input, socket input. All of these are inputs from us that we can potentially control.

Then again, as we saw, applications do not execute in isolation. They need to be able to interact with various things, for instance the file system. If we're running a setuid application, we control at least part of the file system: we control everything that we have access to. We can't just change things that are owned by root that we don't have permissions over, which is why we went over the Unix permission model so much. We can create other processes, we can invoke other commands, and there's signaling.

So sometimes defining exactly what constitutes the boundaries of an application is not just important but very difficult. What is the actual input? You can think about some system where maybe you can't send it input directly, but you see, oh, it's actually reading tweets that use some hashtag. Well, that's a way you can get input into that application, and maybe it's possible to trigger a vulnerability that way.

All right, cool. I think we have made actually fantastic progress here, probably because I didn't have... and we maybe went a little fast. This was a pretty dense lecture, but I think we have a really good starting-off point. So what I'm going to do is stop here, and we will pick back up on attack classes. Next class will be super cool: we're going to go through all the things that are in dark here, so path attacks, command injection, and stack corruption. We're going to really dig into the
fundamentals of how computers work and see how we can subvert them for the attacker's benefit. So thanks for watching, and I will hopefully see you next time. Bye.
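
As a follow-up to the objdump example from the lecture, here is a minimal sketch of what a linear disassembler does with the ten bytes that were read out of main. It is only an illustration in Python, not how objdump is implemented. It handles a single instruction form, `mov r32, imm32` (opcodes 0xb8 through 0xbf), it prints Intel-style syntax rather than whatever syntax your objdump is configured for, and the function name `disassemble_mov_imm32` is made up for this example. The four bytes after each opcode are read as a little-endian 32-bit value, which is where the endianness discussion from the start of the lecture shows up.

```python
import struct

# Register order used by the x86 "B8+r" encoding of `mov r32, imm32`:
# the low three bits of the opcode select the destination register.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def disassemble_mov_imm32(code):
    """Linearly decode a run of `mov r32, imm32` instructions (hypothetical helper)."""
    lines = []
    offset = 0
    while offset < len(code):
        opcode = code[offset]
        if not 0xB8 <= opcode <= 0xBF:
            raise ValueError(f"unsupported opcode 0x{opcode:02x} at offset {offset}")
        if offset + 5 > len(code):
            raise ValueError(f"truncated instruction at offset {offset}")
        # The immediate is stored least-significant byte first (little-endian).
        (imm,) = struct.unpack_from("<I", code, offset + 1)
        lines.append(f"mov {REG32[opcode - 0xB8]}, 0x{imm:x}")
        offset += 5  # one opcode byte plus four immediate bytes
    return lines


if __name__ == "__main__":
    # The two instructions from the lecture: b8 04 00 00 00 and bb 01 00 00 00.
    for line in disassemble_mov_imm32(bytes.fromhex("b804000000bb01000000")):
        print(line)
    # prints:
    # mov eax, 0x4
    # mov ebx, 0x1
```

A real disassembler has to handle hundreds of instruction forms with variable lengths, which is why a linear sweep can get confused when data is mixed in with code. The basic loop, though, is the same idea: look at the opcode, work out how long the instruction is, decode its operands, and move on to the next byte.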