Alright folks, new homework assignment. You can get it at this link, and it'll be posted on the website right after class. You've got a week, no, two weeks and a day, I believe. Is that correct? Big math is hard. On the 20th. There are two parts. The first part continues where we left off with networking. The idea of the first part is you're going to be impersonating two hosts on the network. What you're essentially doing is creating two fake hosts. Whenever one of them gets scanned by an attacker, you respond to whatever TCP packet got sent in by taking that packet and sending it from the second host that you're spoofing to the attacker who's scanning you, whatever their IP address is. And then if the attacker responds to that, you send that response back to them from the original host, from what we call the victim address. So there are two things: victims and reflectors. To impersonate these you'll have to handle ARP on both of them. You'll be given IP addresses and MAC addresses as command line arguments. So you'll need to handle ARP, and you'll need to handle any IP or TCP packets that come in to one and send them out from the other. The idea is, let's say the attacker is port scanning your victim. When they port scan, they'll send a TCP SYN packet to a specific port. So what you'll do is you'll get that packet, and you'll send that same SYN packet from your reflector IP address to the attacker. If the port's closed, you'll get a reset back, which you will then take and send from the victim back to the attacker. If the port is open you'll get a SYN-ACK, which you will then send back from the victim. So you're going to read this and it's going to seem very complicated. It's actually not very complicated at all, because you're just sending back whatever you get. If they don't reply at all, you don't reply. You only reply when they do.
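The address-swapping rule described above can be written down without any packet library. This is only a rough sketch of the idea, not the assignment's reference solution: the `Packet` class, the `reflect` function, and the IP addresses are all made up for illustration, and it ignores ARP, MAC addresses, and checksums entirely.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Packet:
    """Hypothetical stand-in for the few TCP/IP fields the rule cares about."""
    src: str
    dst: str
    sport: int
    dport: int
    flags: str  # e.g. "S" for SYN, "SA" for SYN-ACK, "R" for reset


def reflect(pkt: Packet, victim_ip: str, reflector_ip: str) -> Optional[Packet]:
    """Return the packet to send out, or None if pkt is not for a fake host.

    In both directions the new destination is whoever sent the packet
    (the attacker); only the spoofed source address changes.
    """
    if pkt.dst == victim_ip:
        # Attacker probed the victim: replay the probe at them from the reflector.
        return replace(pkt, src=reflector_ip, dst=pkt.src)
    if pkt.dst == reflector_ip:
        # Attacker answered the reflector: relay the answer from the victim.
        return replace(pkt, src=victim_ip, dst=pkt.src)
    return None


VICTIM, REFLECTOR, ATTACKER = "10.0.0.3", "10.0.0.4", "10.0.0.5"

syn = Packet(ATTACKER, VICTIM, 40000, 22, "S")
probe = reflect(syn, VICTIM, REFLECTOR)

syn_ack = Packet(ATTACKER, REFLECTOR, 22, 40000, "SA")
relay = reflect(syn_ack, VICTIM, REFLECTOR)
```

Note that the ports and flags pass through untouched, which matches the point that you only ever send back whatever you got.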
So the really cool way you can tell that this is working is when you can SSH into a fake IP address that doesn't exist and you're actually SSH-ing back into yourself. So think about the attacker: if they're SSH-ing into some IP address, they're really SSH-ing into themselves, and then they think they're on a remote machine and they run something like rm -rf. Then the joke's on them. You can write this like the other assignments; there's a test script here that should make your life a little easier. That sets everything up. You can dig through that to see exactly what's going on. I'm not going to take up a lot of time on this. You can do this in whatever language you want. I highly recommend you use Python and the Scapy library. It makes this assignment a lot easier, but if you're someone who likes pain and punishment, feel free to do it in whatever crazy language you want. Sorry, I mean if you want to understand and learn things. The second part is going to shift focus more to binary, so this assignment's about getting to understand and analyze binary code. You're given access to a binary, it has some password in it, give us the password. That's it. That's it. Cool, should be easy. Alright, let's get started so you can actually do these assignments. Alright. A friendly reminder: these are individual assignments. Work on your own stuff. Okay, we talked about the application lifecycle. We first write our applications in a high level language. We then use some tool to either interpret that high level language and execute it for us, or we use a tool that will compile it down to an executable format. At the end of the day that's still just bytes on disk that we actually compiled. Those bytes need to be somehow loaded into memory and executed. And then our application executes and we terminate. So the ELF file format. ELF is the Executable and Linkable Format. This is the main binary format on Linux.
And the idea is it essentially describes some kind of binary or executable. It's nice in that it's architecture independent. It doesn't specify x86 or ARM or anything else; all of these ELF can handle. Let's see. Interesting things. So readelf and file. These are your nice tools and interfaces to learn more about a given binary. readelf is the main tool that you want to use. And remember I'm only briefly giving you overviews of these tools. You should read the man pages of these to familiarize yourself with all of their fancy command line parameters. ELF covers all kinds of things: executables, shared libraries. So you can run file on a .so file. It's an ELF file. It describes exactly how to use this library. And it's a pretty simple-ish format. You have a header. The header has lots of nice things. What's a magic number? Just magic. Casts spells. It's there in order to identify that this is an ELF file, so you can distinguish it from other file formats. Yeah, so it's actually something that's optional, right? You don't need magic numbers in any file type. But it's nice because you can tell, hey, ELF files always start with, I think it's ELF, as the magic number. Yeah, it is: the byte 0x7F followed by the letters E, L, F. And so this is nice because when you just get some bytes in a file, you can tell: is this an ELF file? Is this a .exe file? Is this a whatever, without using file extensions? So this is why, as you've noticed, with a lot of our stuff when we deal with Linux, we don't care about file extensions at all. Because whatever the file tells us it is, that's what it is. So I mentioned this other handy command here, the file command. That's what this does. It just has a big database of all the possible magic numbers. You give it a file and it will figure out what kind of file it is and any other important information from there. So those are the important things here for our purposes.
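To make the magic-number idea concrete, here is a small sketch of what a tool like file does at its core: compare the first few bytes against a table of known signatures. The table below is a tiny hand-picked sample, and the `identify` function name is made up for illustration; the real file command uses a far larger database and more elaborate rules.

```python
# A few well-known file signatures (leading bytes of the file).
MAGIC_NUMBERS = {
    b"\x7fELF": "ELF",
    b"MZ": "PE/DOS executable",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF": "PDF document",
}


def identify(data: bytes) -> str:
    """Guess a file type from its leading bytes, ignoring any file extension."""
    for magic, name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return "unknown"


def identify_path(path: str) -> str:
    """Same check, reading only the first 16 bytes of a file on disk."""
    with open(path, "rb") as f:
        return identify(f.read(16))
```

On a Linux machine, something like `identify_path("/bin/ls")` would be expected to report ELF, assuming that path exists and is a native binary.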
So essentially this ELF file format is going to describe how to get the program into memory, because remember these are bytes on disk, right? Each of these segments says: okay, map this segment into this memory location at runtime, and set these permissions on this memory region. We'll get to those in a second. Another important thing is the entry point. This specifies what memory location we should start executing at once everything is loaded. So this is kind of the start of your breadcrumbs to understand a program. I want to understand exactly what happens, right? The operating system, when it execs one of these things, parses this ELF file and loads everything into memory where it's supposed to go. And then it starts executing right at this entry point. The sections are where the main meat is, because these define all the sections in the file and how they map onto memory. There are a lot of flags here. I'm not going to go into all of them. We'll go through an example real quick. So there are different sections here. .text is traditionally your code. So this is an executable section. The important thing here is the flags. The allocatable flag means, I believe, that the section actually occupies space in memory at runtime. That's separate from whether it takes up space in the file on disk. Let's say you have a memory region of all zeros: that doesn't have to be stored on disk, but it still has to be allocated in memory when the program runs. The interesting flags are writable and executable. Why would you not want a memory region to be executable? Where are you storing global variables? Why would global variables need to be executable? Global variables need to have a fixed memory location so that everywhere in the program can access the same memory locations. So why would those need to be executable? What about writable? Why would you want memory that's not writable? Constants.
We want to enforce the fact that we have constants in our program that definitely should not change during program execution. The operating system will help us enforce that. What would happen if we tried to write to a memory region that's not writable? Segfaults. Yes. This is actually the answer to almost everything. What happens when you try to write to a memory region that's not writable? A segfault. What happens when you try to execute a region that's not executable? A segfault. What happens when you try to access memory that's not allocated to your program? A segfault. Exactly. So the text section makes sense. It's executable, it has to be executable, and it's not writable. Why is it not writable? You don't want to change your code at runtime. You can write insane programs like that, because you can always change these bits if you want. You can go to a super low level and write an insane assembly program that does this. But when we write our C programs, normally the code does not change at runtime. Similar things for the .data section. This is the initialized data, so any global variables that are already pre-initialized. It's writable because we want to change it. .rodata is read-only data. So this makes sense. .bss is for global data that isn't pre-initialized. And we'll look at maybe .init and .fini later. These are hooks on program start and program termination. You can have some code that executes when the program loads and right before it ends. So it's actually, yes. What extension are these? .bss? So it's not a file, it's a section. These are just sections inside of an ELF file. And these are the typical names. The names don't intrinsically mean anything, except for the fact that most compilers will generate ELF sections that look like this. So it's not an extension. You won't see a .bss file. You'll run readelf on an ELF file and you'll see all these different sections inside of it.
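The flag letters that readelf prints next to each section come from bits in the section header's flags field. Here is a minimal sketch of decoding the three bits discussed above. The bit values are the standard ELF constants (SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR); the per-section flag values in `TYPICAL` are what compilers commonly emit and are an assumption about any particular binary, and `describe_flags` is a made-up helper name.

```python
# Standard ELF section flag bits.
SHF_WRITE = 0x1      # writable at runtime
SHF_ALLOC = 0x2      # occupies memory at runtime
SHF_EXECINSTR = 0x4  # contains executable instructions


def describe_flags(flags: int) -> str:
    """Render flag bits as letters in the order readelf uses (W, A, X)."""
    letters = ""
    if flags & SHF_WRITE:
        letters += "W"
    if flags & SHF_ALLOC:
        letters += "A"
    if flags & SHF_EXECINSTR:
        letters += "X"
    return letters


# Typical flag values for the sections named in the lecture (assumed, not
# read from a real binary).
TYPICAL = {
    ".text": SHF_ALLOC | SHF_EXECINSTR,  # code: executable, not writable
    ".data": SHF_ALLOC | SHF_WRITE,      # initialized globals: writable
    ".rodata": SHF_ALLOC,                # constants: neither W nor X
    ".bss": SHF_ALLOC | SHF_WRITE,       # uninitialized globals: writable
}

summary = {name: describe_flags(flags) for name, flags in TYPICAL.items()}
```

Reading the summary back, .text is the only section with X and none of the data sections have it, which is the permission split described above.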
For completeness, in the Windows world there's the PE file format, the Portable Executable file format. Also the .exe format, which is mainly how I think of it. It's pretty simple. We're not really going to go into much detail here, but you can read more about that. I just wanted you to be aware that there are differences between the operating systems, but really the concepts are still the same. You need some way to describe how to take these bytes, load them into memory addresses, and get them to start executing. Alright. Now we're going to need a primer on assembly language. Because let's say we want to analyze a binary application for vulnerabilities. Do we have the source code, the C code? If we have the executable, we have an ELF that we can look at. So if we have, say again? If we have the executable, then we can use readelf to look at the source code. Okay. What do I mean by source code? I mean, what does that take? What is the program typed in? Yeah, so there are two different levels, right? There's the C code, the high level language, or C++ code, that the developer wrote the application in. And then that gets compiled down to assembly code. Or rather, it's compiled down to machine code. That's what we get in a binary, right? So let's say, does anyone here have the source code to Microsoft Word? No, why not? Proprietary, right? Proprietary, right? Microsoft wants to keep that source code. So they're not going to give you access to it, but you can still run and execute Microsoft Word, right? There are cases, like open source software, where we will have the source code. But we have to kind of assume, if we want to try to find vulnerabilities in some kind of binary, that we're not going to have the source code. And so all we have to work with is the actual binary code itself.
So this is why we have to look at x86 code, at assembly language, and understand exactly how it works, so we can reason about what the code does. Brief history of x86. But we won't necessarily have the assembly either, right? Say it again? We won't necessarily have the assembly either, right? We have the binary. We have the binary and we can use tools. We'll look at objdump in a second, which will analyze the ELF file, figure out all the executable sections, and then start disassembling, because there's a very easy one-to-one mapping between a byte sequence and an assembly instruction. The trick comes because x86 is not a fixed-width instruction set. So let's say you have a sequence of five bytes: you may decode different instructions depending on which byte you start on. That makes things slightly tricky, but in practice it's not a real concern, and so you pretty much always have the assembly available. You just need a disassembler to look at it in a nice way. So that's called disassembly, sorry, wait, decompilers, there we go, yes. Disassembly is the process of, like objdump does, taking the object file and turning it into assembly language. Going back up to source is a harder, more difficult process. There are tools that do that. Some of the better tools, like IDA Pro, have a decompiler, called the Hex-Rays decompiler. That's, I believe, like five to ten thousand dollars per assembly language that you want to use, or something like that. It's pretty expensive. And oftentimes, you know, you don't have variable names. It has to guess a lot, because loops are just backwards jumps. And so honestly, a lot of the really good reverse engineers that I know will use the decompiler for kind of a high level view, but really they want to see the assembly code, because you look at it enough and you know the patterns and you can start reasoning about it that way.
So a lot of them use the IDA Pro disassembly view rather than the decompiler. What do you use? I use kind of a combination of both. It kind of depends. I use Hopper, I'll talk about it in a second, as a disassembler. It's a lot cheaper. It's like a hundred bucks for a license and it works fairly well, but you know, there are always problems. Is that license on an annual basis or is it a one-time purchase? I don't know, that's a good question. The IDA one is on an annual basis, I believe, with cheaper rates for renewing. The Hopper one, I think, is just a one-time thing. That's a deal then. It's a pretty good deal. There are other tools. We'll talk about them in a second. But first we have to understand x86. Again, like I talked about, x86 is a 32-bit architecture. One of the interesting things when you look at this history is that x86 as an assembly language goes back much farther. Actually, it's kind of a lie to even say that modern CPUs execute x86 instructions directly. If you take the architecture course, you learn about microcode. What actually happens is the CPU takes in and decodes x86 instructions and translates them to microcode that it actually executes. Which is nice, because then you can update your CPU if you ever need to. Let's say there's a bug in how you handle a certain instruction. You can just upgrade this microcode and change how you're doing that. Specifically, 32 bit. So what does it mean when we say that a CPU is a 32-bit architecture? I just said 32 bit. Yeah, what does that mean practically? So I just said 32 bits. Like, again I just- Just a second, let's go over it. Width of the instructions. Width of the instructions? Not quite, not always. Word boundaries. What was that? The boundaries, I think that's more CPU specific, whether they care about word boundaries or not. Some of them really care and some don't care at all. Like you can address single bytes.
So it goes back to the bus. You talk to memory with 32-bit addresses, which means you can only reference up to basically two to the 32 memory locations, where each location is a byte. ARM, though, has 32-bit instructions, I believe. ARM32, I think, has 32-bit-long instructions, except in Thumb mode, where it's like 16 bit. So the idea is you can really only address up to about 4 gigabytes of memory, unless you use various tricks. You can chunk the memory into different segments and use multiple registers, one register to reference a segment and the other register to reference inside there. So you can get around this, but this is why there was a big push to 64-bit architectures, because we quickly hit really hard limits in terms of how much memory you can use in a system. And memory is really cheap. We want to use 256 gigabytes if we can, and we want to actually take advantage of all of that. An important thing to remember about modern computer architectures: when we're talking about them at this level, there's no such thing as variables. There are no variables. What are there? Registers. You're only saying that since it's on the screen. What else is there? I mean local registers. Yeah, registers. Indirect and, yeah. What was that? What's an address? What was that? And memory, yeah. So we have two fundamental things. We have registers, which are local on the CPU, on the chip, and we have memory. That's essentially it. Every byte in memory has a specific address, and that's how we can refer to it. But fundamentally, all computation happens in registers on the chip. So if I want to add two numbers together, I need to pull the one number from memory into one register, pull the other number from memory into another register, depending on the architecture, add them together, and then copy the result back to memory.
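The load, add, store pattern just described can be mimicked with a toy model. This is purely illustrative: the `memory` bytearray, the `registers` dictionary, and the `load`, `add`, and `store` helpers are invented here and only loosely correspond to real instructions.

```python
import struct

# Toy machine state: 16 bytes of memory and two 32-bit registers.
memory = bytearray(16)
registers = {"eax": 0, "ebx": 0}

# Two 32-bit numbers already sitting in memory at addresses 0 and 4.
struct.pack_into("<I", memory, 0, 7)
struct.pack_into("<I", memory, 4, 35)


def load(reg: str, addr: int) -> None:
    """Copy four bytes from memory into a register."""
    registers[reg] = struct.unpack_from("<I", memory, addr)[0]


def add(src: str, dst: str) -> None:
    """dst = dst + src, wrapping at 32 bits like the hardware would."""
    registers[dst] = (registers[dst] + registers[src]) & 0xFFFFFFFF


def store(reg: str, addr: int) -> None:
    """Copy a register's four bytes back out to memory."""
    struct.pack_into("<I", memory, addr, registers[reg])


load("eax", 0)      # pull the first number into a register
load("ebx", 4)      # pull the second number into another register
add("ebx", "eax")   # the arithmetic happens entirely in registers
store("eax", 8)     # copy the result back to memory

result = struct.unpack_from("<I", memory, 8)[0]
```

The memory locations never get added together directly; every step goes through a register, which is why the same register names keep reappearing in real assembly.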
So this is the fundamental thing to understand, and this is what's tricky when you're looking at assembly code, because a single register will be reused multiple times. So you need to get good at understanding when things are used and how they're being used. And the other important thing to remember: if you're just looking at a value in memory, how do you know if it's an address or a number? A ratio number. Maybe. What if I add it to something? Does that tell you what it is? The dollar sign. What was that? The dollar sign? Doesn't that have to be in front of the address? Let's say we're using a debugger and we're looking and asking, what's the value in the EAX register at this specific line? That's a trick question. Fundamentally, you don't know. You have to infer based on usage. Sometimes you can tell. We'll talk about dereferences, pointer dereferences, right? If the code dereferences something, then clearly there has to be something there, or else it's going to fault. A segfault. You're going to try to access memory that isn't allocated. But it's really important to understand this: the distinction between what is a negative number, signed, unsigned, all of this, the computer doesn't care about any of that. At this point it just cares about bytes. And it's going to operate over those bytes. So there are four general purpose registers. We're going to talk about EAX, EBX, ECX, EDX. The naming convention is probably not worth memorizing. The important thing, when you're looking at instructions: whenever you refer to the EAX register, it's always the full 32 bits of that register. So if you move zero into EAX, the constant value zero, that is going to literally wipe out every single bit in the EAX register; they will all be zero. But we can also do these tricky things of referring to half of the register. So you refer to the AX register as just the lower 16 bits of the EAX register.
And then within there, we can use the AH register for the upper eight bits of the lower half of the EAX register, and the AL register for the lower eight bits of the EAX register. Which means you can do crazy things: if I move, let's say, all Fs, so all ones, into EAX, and then I move the constant zero into AL, I'll still have all those one bits in all the other bits. So there are other registers we'll talk about in a second. But this is one of the important things to keep in mind when you really try to think about it: there are a certain number of registers, but fundamentally an EAX operation is touching the same thing as AL, AH, all of these. All of these references are referencing the same register, just different parts of that register. Cool. ESI and EDI are two other registers. They're used in memory operations, but again, they can still be used pretty much for anything. There are other registers that come up very frequently, and so they gave them special names and designated them for certain things. One is the stack pointer, and another is the frame pointer. There's a third one, EIP, which is the instruction pointer, but I believe on x86 you can't access that directly. So how do you change the instruction pointer register? Jumps. Through jumps, or even through normal execution. Because the instruction pointer will point to the memory address of the next instruction to be executed. And so as a by-product of executing instructions, that register is constantly changing. And a jump to, let's say, a fixed memory location essentially says: set the instruction pointer equal to this value. And that's how the processor does jumps. Alright.
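The way EAX, AX, AH, and AL alias the same storage, as described a moment ago, comes down to bit masks. Here is a small sketch; the `Reg32` class and its method names are invented for illustration, and it only models the 32-bit behavior discussed in this lecture.

```python
class Reg32:
    """One 32-bit register with views onto its lower 16 and 8 bits."""

    def __init__(self, value: int = 0) -> None:
        self.value = value & 0xFFFFFFFF  # the full register (EAX)

    @property
    def ax(self) -> int:
        return self.value & 0xFFFF          # lower 16 bits

    @property
    def ah(self) -> int:
        return (self.value >> 8) & 0xFF     # upper 8 bits of the lower half

    @property
    def al(self) -> int:
        return self.value & 0xFF            # lowest 8 bits

    def set_ax(self, v: int) -> None:
        self.value = (self.value & 0xFFFF0000) | (v & 0xFFFF)

    def set_ah(self, v: int) -> None:
        self.value = (self.value & 0xFFFF00FF) | ((v & 0xFF) << 8)

    def set_al(self, v: int) -> None:
        self.value = (self.value & 0xFFFFFF00) | (v & 0xFF)


# Move all ones into EAX, then move the constant zero into AL.
eax = Reg32(0xFFFFFFFF)
eax.set_al(0)
after_al_write = eax.value  # only the low byte was cleared
```

After the AL write, the upper 24 bits are still all ones, which is the "crazy thing" from the lecture: a write to a sub-register leaves the rest of the 32-bit register alone.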
We have a series of registers that we don't really use much: these are the segment registers. They very rarely come up, but it's important to remember that we have memory that we can access based on different segments. These allow you to access different segments of memory. The EFLAGS register we'll talk about in a second, but it's not super important, let's say. Okay, there's also a whole lot of floating point operations. I believe the MMX and the XMM instructions are all about doing vector arithmetic, I want to say, but don't quote me on that. It often happens that you'll be reading some code and be like, what the heck is this instruction? And so you'll copy the instruction and look it up and figure out what it does. This is literally the main thing I do when I come across things that I have no idea about. This is my process. Oh, it's a weird instruction, I have no idea what that is. Look it up, and the reference will tell you what it does. Okay, so the EFLAGS register. It's not important to memorize all these flags, but essentially, every time there's an instruction that compares, like the cmp instruction, which compares two registers together, different flags will get set depending on that comparison: whether the result is zero, whether they're equal, less than, greater than. And then a jump-if-greater-than will test these bits inside this register to determine whether to jump. Alright, it's important to remember the difference between a byte, a word, a double word, a quad word, a double quad word. A byte is eight bits. These terms matter more in assembly. I personally don't use them too much, but if you're ever confused, look it up. Alright, and before we dive in too deeply, I mentioned this when we talked about network security, but be hyper aware of endianness. Specifically, what is the order of bytes, most significant versus least significant?
So if we, for instance, have the value 0x03020100 starting at some address, let's say 0x00F67B40, that means at 40 there'll be the byte 00, at 41 will be 01, at 42 will be 02, and at 43 will be 03. As we'll see, this actually comes up a lot, especially when you're crafting exploits that you copy into memory. If you're copying a value in and it seems like it's messed up, it's probably because your bytes are in the wrong order. Signed integers, so two's complement. What's two's complement? Yeah, flip the bits and add 1. But again, the CPU doesn't care that this means negative 1. All it sees is that this is all ones. So this is up to you when you're reading and you see something like, why is this register being compared with all Fs? It's likely a signed operation that's asking, is it greater than negative 1? So it's something to be aware of. Let's say the tools I use the most whenever I'm doing binary things, actually I guess it's three things. One is a disassembler that you're proficient in and know how to work. The second thing is a calculator with 16-bit and hexadecimal input, so you can do hexadecimal calculations and not have to think about it. And the third one is actually a piece of paper and a pencil, so I can draw things out, as we'll see in a bit. I very frequently do that to keep notes and see what's going on. So x86 assembly is a very slightly higher level language than machine language. You basically have, let's see, okay. The really annoying thing about assembly language is you think it's very simple, it's the thing closest to the CPU, and yet there are two different styles of syntax to use. One is AT&T syntax and one is Intel syntax. And you think, oh, this is just a minor thing. There's an incredibly big difference. Usually the way most instructions go, you have some mnemonic like add, compare, subtract, whatever the actual instruction is.
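The byte-order example and the two's complement rule from earlier in this stretch of the lecture can both be checked in a few lines. This is just a sketch using the standard struct module; the helper names `to_signed32` and `negate32` are made up for illustration.

```python
import struct

# The lecture's example: the value 0x03020100 stored at address 0x00F67B40.
value = 0x03020100
base = 0x00F67B40

little = struct.pack("<I", value)  # byte order in memory on x86
big = struct.pack(">I", value)     # network byte order, for contrast

# Map each address to the byte that lives there on a little-endian machine.
layout = {base + offset: byte for offset, byte in enumerate(little)}


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def negate32(x: int) -> int:
    """Two's complement negation: flip the bits and add 1."""
    return ((x ^ 0xFFFFFFFF) + 1) & 0xFFFFFFFF


all_ones = negate32(1)  # the bit pattern a signed compare treats as -1
```

The least significant byte lands at the lowest address, so a value copied in with the bytes in the "obvious" left-to-right order comes out reversed.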
After the mnemonic you have the source operand and the destination operand. And that order is completely swapped depending on which syntax you look at. I'm going to have to cheat to figure out which one we use because I don't actually remember. I think, yeah, we're going to use AT&T syntax. And the way you can always tell, this is the trick I use, is you look and see where the constant values are. When you look at the assembly, a constant value will start with a dollar sign. So if it's saying move $0 into EAX, you know there is no $0 register. It's not possible to move something into zero. So this means that the destination has to be the last operand. If it's flipped the other way, then you know the destination comes first. So you just have to kind of switch your brain, I don't know. It's less of an issue than it sounds, and every tool you use has a way to flip between the two. So you just have to learn one and stick with it. I use AT&T syntax, so that's what I use by default. All right, so hexadecimal numbers will start with 0x when we want to use numbers. We can define data objects as different types. And remember, we're not talking about high level types, oh this is an integer or this is a float or this is a string. It's: is it a byte, a word, a double word, a quad word, right? Those are the sizes that assembly cares about. So we can define a variable; here we define two 32-bit values. And I will say assembly is nice-ish in that we can define these global variables, and the assembler will figure out a memory location, and we can then refer to those in our assembly code. Because we will need to write our own assembly instructions for this class as well as read assembly. All right, so addressing memory. This is one of the more tricky parts of x86, especially when you're new to this. It seems very simple. You want to address a specific byte in memory. Every byte has a memory address.
You use that memory address. The catch is, well, I guess the important thing to remember is that x86 comes from this very long history of CPUs. I don't know the original start date. But you mainly aren't programming in x86 directly; you're programming in C or something else and then just compiling. So when you're accessing memory, you're not just accessing a specific byte. How do you access, let's say, a list of integers? Take the base address and add 4 bytes to it. Take the base address and then loop over some counter i. From that base address you go i times 4 to get to the next number, and that would iterate you through. So that's exactly what we have in x86, essentially as a primitive for addressing memory. It's actually much easier with an example. This is the way it's represented. You have a constant displacement from the address that you're using, so that's like a fixed offset. Then you have your base, which is your base reference. And then the index times the scale. So this basically says: move whatever EAX points to. So there must be a memory address in EAX. Take EAX, subtract 20 from that, 20 in hex, so 32, and then add whatever is in ECX times 4. In this case ECX would be like our iterator counter, our loop variable i. And we're looping over four-byte values, probably integers. The minus 0x20 is there as an example; I don't know that it comes up often. But the important thing here, the key, is to map this to your understanding of a language like C. When you look at this, you should think dereferences. You should think pointer dereferences, right? Because whatever is inside the EAX register is going to be used as an address, and we're going to look that up, fetch those four bytes, and move them into the EDX register. So for instance, if EAX was zero or something, this would probably throw a segfault, because that memory address is not mapped. So this example is moving what?
Base pointer minus 8. What was that? Base pointer minus 8. EBP minus 8, where? Is it into or out of? Yeah, so it's EBP minus 8 into EAX, right? And it's important: it's the contents of the memory pointed to by EBP minus 8. So we're referencing memory, right? It's the pointer dereferencing. We're dereferencing memory. Here's a much easier one. We take wherever EAX points to, those four bytes, and we move that into EAX. How do we know we're moving four bytes and not 16 bytes or 64 bytes or 128? Or I said bytes, bits, whatever. Yeah, it's the size of EAX. There's literally no place to put any extra bits, right? It doesn't even make sense to talk about 64 bits here. You're literally just copying 32 bits. The destination register is 32 bits wide. What about this one? Take EAX and move it there. Eax plus eax. Into the memory location pointed to by eax plus eax times 2. I think we're, okay. So this is a good one. So this is what? This is a dollar sign. So we know that the destination has to be EBX. We're moving the constant value 0x080480e4 into EBX, which means after this instruction executes, the EBX register must have that value. That's what that means. So what's the flip side now? We introduce parentheses around the constant value. Now it's that memory location, right? So that's a fixed memory location. We try to access that memory location, and depending on the permissions of the segment, if it's not readable or if it's not allocated to us, this will cause a segmentation fault, right? So depending on the program, this could work or this could not work. But the important thing is that this is the point of dereference, right? This one instruction is dereferencing that location. Cool. Addressing memory. It's tricky. You just have to piece it out. You'll really only see the full syntax with the scale and displacement when you're looking at array loops.
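The displacement, base, index, and scale form walked through above boils down to one arithmetic expression. Here is a sketch that just computes the address an operand like -0x20(%eax,%ecx,4) would refer to. The `effective_address` helper and the starting value 0x1000 for EAX are assumptions chosen for illustration.

```python
def effective_address(disp: int, base: int, index: int = 0, scale: int = 1) -> int:
    """Compute disp + base + index * scale, wrapped to 32 bits.

    x86 only allows a scale of 1, 2, 4, or 8.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (disp + base + index * scale) & 0xFFFFFFFF


# Hypothetical register contents for the operand -0x20(%eax,%ecx,4).
eax = 0x1000

# ECX plays the role of the loop counter i over four-byte values.
addresses = [effective_address(-0x20, eax, ecx, 4) for ecx in range(4)]
```

Each step of the counter advances the address by four bytes, which is what indexing into an array of 32-bit integers looks like in C.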
So it's fairly clear based on the pattern of the code. The first form you'll see a lot, in terms of local variables and arguments passed into a function. They'll be at fixed offsets from the base pointer. So yeah, I'm trying to think, I don't know that I've actually seen the full form before, but if you do, you just write it out. Other instruction classes. There are instructions to move, so mov is going to move from memory to a register or register to register. Exchange is going to, I believe, swap two registers. Then push and pop. So what are push and pop? Stack. Yeah, onto the stack. This is an important thing: the stack abstraction is actually built into x86 at the instruction level. As we'll see, you can only push and pop registers. You take whatever is in that register and you push it onto the stack, which means a very specific thing. So that's exactly what happens there. And a pop is going to be the opposite instruction. We'll look at that in a second. There are lots of binary instructions. Add, subtract, multiply, divide, increment, decrement, and the logical operators: and, or, xor, not. All of your favorite operators are here. Jumps, calls: we'll look at the difference between a call and return and a jump. Those are super important. Int is going to be interrupt. Yeah. As we'll see, this is how a program will make a system call into the kernel, via certain interrupts. It's used for other things as well. IRET, I believe, is an interrupt return. And they have, yeah. So there are various jumps depending on various bits inside the EFLAGS register. This is super annoying, I will definitely say. And this is one of the nice things: probably the best thing about a good decompiler is it will be able to understand all of these crazy control transfer jump instructions, like the difference between a JAE and a JGE. I don't know, I always have to look these things up.
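For the JAE versus JGE question specifically, the difference is which flags each jump tests: JAE is the unsigned comparison (carry flag clear) and JGE is the signed one (sign flag equals overflow flag). The sketch below models the flags a compare would set when computing a minus b on 32-bit values. The function names are invented here, and operand order in real assembly depends on the syntax, so treat `cmp_flags(a, b)` simply as "flags of a - b".

```python
MASK = 0xFFFFFFFF
SIGN = 0x80000000


def cmp_flags(a: int, b: int) -> dict:
    """Flags produced by computing a - b on 32-bit values (result discarded)."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    return {
        "ZF": result == 0,                          # operands were equal
        "CF": a < b,                                # unsigned borrow
        "SF": bool(result & SIGN),                  # top bit of the result
        "OF": bool((a ^ b) & (a ^ result) & SIGN),  # signed overflow
    }


def jae(flags: dict) -> bool:
    """Jump if above or equal: unsigned a >= b."""
    return not flags["CF"]


def jge(flags: dict) -> bool:
    """Jump if greater or equal: signed a >= b."""
    return flags["SF"] == flags["OF"]


# The same bit pattern, all Fs, is a huge unsigned number but -1 when signed.
flags = cmp_flags(0xFFFFFFFF, 1)
unsigned_taken = jae(flags)  # 4294967295 >= 1
signed_taken = jge(flags)    # -1 >= 1
```

So the two jumps disagree exactly when the operands' top bits make the signed and unsigned readings differ, which is a useful hint about whether the original C variable was signed.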
So that's just the way it goes. So, important things about jumps: they can be direct, where we give an offset. So from this instruction, jump, let's say, 10 bytes back and start executing from there. There are also indirect jumps: jump to whatever address is in EAX. This actually makes the job of a decompiler or disassembler much more difficult, because you can essentially jump to anywhere in the program's memory. You can do all kinds of really cool tricks with this. And then our friend NOP; we will come back to it in a bit. Okay, before I go on though, I don't know why I don't have it in here, there's a super important detail. So let's see if this will work. What I wanted to show was, well, this is insanely small. All right, I'm going to draw. It's been a long time. Okay, so there's one important instruction that's going to come up a lot. That's LEA. It's going to look exactly like a move instruction. Load effective address is the way to think about this. So I slightly lied to you: everywhere you see parentheses is a dereference, except in a load effective address. The idea is that a load effective address essentially computes the location that you would dereference, but doesn't actually dereference it, and copies that location, that address, into the register. So the difference is this. With LEA of negative eight, parentheses EBP, into EAX, the value inside EAX will be EBP minus eight. Whereas when you have move of negative eight, parentheses EBP, into EAX, this is a dereference of memory: get the memory location EBP minus eight, and copy those four bytes, those 32 bits, into EAX, so whatever's at that memory location.
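The LEA versus move difference can be sketched with a toy model in Python. The register values, the toy memory, and the helper names `lea` and `mov_load` are assumptions for illustration only; the behavior shown is the one described above, where both instructions compute EBP minus eight but only the move goes on to read memory at that address.

```python
MASK32 = 0xFFFFFFFF


def load32(memory, addr):
    """Dereference: read four little-endian bytes starting at addr."""
    return int.from_bytes(bytes(memory[addr + i] for i in range(4)), "little")


def lea(regs, disp, base, dest):
    """lea disp(%base), %dest: compute the address, do not touch memory."""
    regs[dest] = (regs[base] + disp) & MASK32


def mov_load(regs, memory, disp, base, dest):
    """movl disp(%base), %dest: compute the address, then dereference it."""
    regs[dest] = load32(memory, (regs[base] + disp) & MASK32)


# Hypothetical state: EBP points into a stack, and EBP - 8 holds 1234.
regs = {"eax": 0, "ebp": 0xBFFFF000}
memory = {}
for i, byte in enumerate((1234).to_bytes(4, "little")):
    memory[0xBFFFEFF8 + i] = byte

lea(regs, -8, "ebp", "eax")
after_lea = regs["eax"]  # the address itself: EBP - 8

mov_load(regs, memory, -8, "ebp", "eax")
after_mov = regs["eax"]  # the four bytes stored at EBP - 8

# A compiler using LEA as plain arithmetic: lea 10(%eax), %eax adds 10.
regs["eax"] = 10
lea(regs, 10, "eax", "eax")
after_add = regs["eax"]
```

Here `after_lea` is 0xBFFFEFF8, `after_mov` is 1234, and `after_add` is 20. The last case is the confusing one: nothing in it is an address, the compiler is simply borrowing the address arithmetic to do an add.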
So this comes up because sometimes compilers will use a load effective address instead of an add instruction, I've seen them do it, and it's really confusing because it's not an address inside there. It's like 10, and they're trying to add 10 to 10 or something. Anyways, I wanted to point that out now before we get into it too much. Alright, so what are system calls used for? Let's think about it in a different way. What permissions does our program have? We're writing some application, some program. What can we do? Of the two to the 32 locations, the four gigs of memory, what can we access? Three gigabytes, the user space memory. Well, okay, so let's say part of it up here is for the kernel, and it doesn't even have to be there. What was that? Program address space. I can't hear you. Program memory? What does that mean? So can I access all of it? Can I start reading it out? Like, start at zero: dereference zero, copy it, read it out. Dereference four, copy it, read it out. Dereference eight? Only where we have permission for it. You need to get permission from the operating system. Otherwise the access is rejected. Alright, so this actually all links back to the ELF file. In the ELF file we specify this. You can say I want specific segments of this file, and I want them mapped into my program's memory at various addresses. And the way I can get that memory, or try to get more memory, is by asking the operating system. What other things can I or can't I do as a program, so just as a binary executing without talking to the operating system? Accessing files. Accessing files? Can I access files? It ultimately makes a system call. Yeah, you have to make a system call, right? You can't actually even talk to a file, right? Why is that? It's actually in the hardware. It's going across the bus to your PCI device or wherever the destination is.
The translation of that physical address to whatever. Yeah, not only that. So you want to open some file. What is a file except a name? The operating system has to then figure out where those bytes are on a physical hard drive. They may not even be contiguous bytes. They may be disjoint. And depending on what file system you're using, ext2 or whatever, there's metadata. So if you just give a program raw access to the hard drive, it can fundamentally change or edit any file on the system, right? So the operating system not only mediates who can access what files, but it also provides nice abstractions. Otherwise you'd have to figure out, oh, this is ext2, that means that this sector on the hard disk and this sector are in the same file, so I need to read this and then read this, right? Instead the program just says, hey, I want to open this file, right? We have to ask the operating system to do that. What other things? Yeah? Yeah, so input and output. So, right, output to standard output and input from standard input. These are all system calls. What else? What was that? Yeah, if I want to create a new process, right? So I have my process. So maybe, like when you wrote your web server, you don't want it to hang when processing a request. So you want to fork another process and have that new process handle the request while the old process continues listening for more requests, right? We can't just magically, poof, create a new process. Right, can you access other processes that are running on the operating system? Can you, just as a running process, go out and start messing with the memory of the other applications that are running? Because it's a different PCB. Can you explain? Each process gets a process control block, which is that sort of playground, like sandboxing. Yeah, so there's other magic that's happening, right?
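To make the file discussion concrete, here is a small Python sketch that touches a file only through the `os` module's low-level functions. `os.open`, `os.write`, `os.lseek`, `os.read`, and `os.close` are thin wrappers over the corresponding system calls on Linux, so every line that touches the file is a request to the operating system. The helper name `write_then_read`, the file name, and the payload are made up for this example.

```python
import os
import tempfile


def write_then_read(path, payload):
    """Create a file, write payload, seek back, and read it again."""
    # open: ask the kernel for a file descriptor for this name
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = os.write(fd, payload)    # write: hand the kernel a buffer
        os.lseek(fd, 0, os.SEEK_SET)       # lseek: move back to the start
        data = os.read(fd, len(payload))   # read: ask for the bytes back
    finally:
        os.close(fd)                       # close: give the descriptor back
    return written, data


with tempfile.TemporaryDirectory() as tmp:
    written, data = write_then_read(os.path.join(tmp, "demo.txt"), b"hello from a file\n")
    print(written, data)
```

The program never learns which disk sectors hold the bytes or how the file system lays out its metadata. It only ever holds a small integer, the file descriptor, and the kernel does the rest.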
The address that you have, let's say your application is running at, I don't know, address 8,000, right? That address 8,000 doesn't necessarily map to physical memory location 8,000, right? So if you're running two applications and they're both accessing that memory address, they're actually accessing different physical memory locations, right? That's virtual memory mapping and all the things that the operating system provides. But you can actually get around that. You can ask the operating system, hey, give this other application access to this chunk of my memory, so that you're actually in the same physical memory. So all of these things, you as an application, and we didn't even talk about sockets, right? If you want to listen on a socket, if you want to do anything like that, you have to ask the operating system for permission, right? But it's an interesting process, because you're an application, you're executing, right? You have access to the CPU. So we need some way to do this. System calls you can think of as a standardized API that any application, no matter what language it's written in, can use to call into the kernel, into the operating system, so it can provide whatever service the operating system needs to provide. And on Linux x86, it's an interrupt of, I don't know if the correct term is type 80 or, I think it's interrupt number 80. Hex 80, I should say. There's a large number of interrupts, and that's the one the OS is expecting. Is that called a number or a type, though? That's what I thought. I know it's this specific interrupt number, but I don't know. So for a system call, the hardware will then trigger an interrupt. An interrupt handler will go through and say, depending on the interrupt number, do this, do this. If it's interrupt hex 80, then I know it's a system call. So the question is, how does the application tell it which system call it wants to make?
There's over 250, I think it's like 260, different system calls. You can go look up the list of all of them for the latest Linux. So the EAX register will contain the system call number. This is how the application tells the operating system which system call to actually make. And so we'll look at a brief Hello World program example. So here I have a data segment, .data. I have a label, hw, for hello world. I'm going to say it's a string, hello world, backslash n. Then in my text segment, .text, I'm saying this is code. The .globl directive is telling the linker I want to expose main as a symbol that other people can call. This is needed, I believe, when you want to use GCC to build it, because it expects the object file to have a main function. So in my main, I move 4 into EAX. So 4 is going to be the, well, let's look at the code. We're going to move 1 into EBX. We're going to move the address of hw into ECX. So this dollar sign hw, as we'll see, will be replaced at build time with wherever the assembler and linker chose to put hw. And then move 12 into EDX, and then call int hex 80. What we didn't talk about, in terms of system calls, is where to put all of the rest of the arguments, right? EAX is going to contain the number of the system call we want to make. In this case, if you look it up in the system call table, EAX of 4 would be the write system call, W-R-I-T-E. So we want to write out. So what are the arguments to the write system call? Where to write, so what file descriptor? Standard output. What else? What are the arguments? The buffer of what we want to write out, and the size of how much we want to write out. And so we use the registers EBX, ECX, and EDX for those values. So EBX is going to contain 1, which is which file descriptor? Standard output. Standard input is zero, standard output is one; those are things you should memorize.
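The register convention for the write call can be sketched as a toy dispatcher in Python. This is a simulation, not how the kernel is written: the `interrupt` function, the `SYSCALL_TABLE` dictionary, the toy memory, and the address `HW_ADDR` are all assumptions for illustration. What it takes from the lecture is that interrupt hex 80 means "system call", EAX selects which one (4 is write on 32-bit Linux), and EBX, ECX, and EDX carry the file descriptor, buffer address, and length. The result is placed back in EAX, as Linux does.

```python
import os

SYS_WRITE = 4          # write's number in the 32-bit Linux system call table
HW_ADDR = 0x0804A018   # hypothetical address where the string was placed
MESSAGE = b"Hello World\n"


def sys_write(regs, memory):
    """write(fd=EBX, buf=ECX, count=EDX): copy bytes from toy memory to fd."""
    fd, addr, length = regs["ebx"], regs["ecx"], regs["edx"]
    data = bytes(memory[addr + i] for i in range(length))
    return os.write(fd, data)


SYSCALL_TABLE = {SYS_WRITE: sys_write}


def interrupt(vector, regs, memory):
    """Toy interrupt handler: vector 0x80 dispatches on the number in EAX."""
    if vector != 0x80:
        raise NotImplementedError("only the system call interrupt is modeled")
    handler = SYSCALL_TABLE[regs["eax"]]
    regs["eax"] = handler(regs, memory)  # the result comes back in EAX


def run_hello(fd):
    """Set up the registers the way the hello world program does."""
    memory = {HW_ADDR + i: byte for i, byte in enumerate(MESSAGE)}
    regs = {"eax": SYS_WRITE, "ebx": fd, "ecx": HW_ADDR, "edx": len(MESSAGE)}
    interrupt(0x80, regs, memory)
    return regs["eax"]
```

Calling `run_hello(1)` would send the twelve bytes to standard output, since file descriptor 1 is standard output, and return 12, the number of bytes written.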
There are a few things you should memorize, and that's one of them. ECX is going to contain the address of our buffer. In this case hw, so it's going to contain the address that points to the very first byte of our string. And then the length, EDX, is how many characters to output. So this should be the length of hello world plus a newline. And then we're calling int hex 80, so that's going to trigger an interrupt, and the kernel handles the writing to that file descriptor for us. And then we want to clean up, we want to call exit zero. So we're going to move zero into EAX because this corresponds with the system call exit. Oh, shoot. All right, we're not going to do that. We haven't gotten to the cdecl calling convention yet, but in x86, with the cdecl calling convention, when a function finishes executing, the EAX register will contain the return value of that function. So this is basically the same as return 0 in the main function of our C program, and exit will then get called with this value. So we can take this, we can compile it, we can run it. It'll work, I think, although I'm not 100% sure the version in the slides works as-is. So let's try doing this, running it, executing it. All right. So everyone here can witness whether this actually works, even though it's only being recorded now. Compile it. We can look at a.out. We can run readelf. I think capital S gives all of the sections, like all the section information that readelf has. So we can look at all of these sections. So here's how to read this. Let's look at the .text section. Its type is PROGBITS, which basically means it's program data. So I want this section, I'm calling it .text, loaded at memory location 0x080482e0. It's at file offset 0x2e0, so starting at byte 0x2e0 in the file. That gets loaded at 0x080482e0, and that's where my code will be. And it's really this code.
It's allocatable and executable. So it's not writable. You can see other sections down here that are writable, like the .data section and the .bss section. And if I run file on it, it'll tell me it's an ELF 32-bit executable. It's dynamically linked. If I execute it, it should say hello world. And then I can use objdump, dash capital D, to disassemble everything. I pipe it through less. I'm looking for the main function that I wrote, because there's a lot of stuff in here. And I can see it. So this is at 0x080483db. It's going to move 4 into EAX, 1 into EBX, and into ECX this 0x804a018. If we look, this is probably in our data section. That's going to be the bytes of hello world. It's going to move hex C, which is 12, into EDX, call int hex 80, move 0 into EAX, and return. There we go. Maybe it's your first assembly program. Questions? Other super important concepts. So does the operating system provide everything we'd ever want? What are some nice functions you'd like to use when dealing with strings? Or how do you print out strings in your application? Do you normally call write with 1, the buffer, and the size? How do you write out strings? Printf. Like a printf function, to print in a certain format. You may use puts, which just automatically does it. Is the operating system providing all those functions? No. Who's providing that? Who's coding it or who's running it? A library. A library, yeah. So that's in libc. So libc is the main C library. There's a really good book. I should find the name of it, I don't remember it.
There's a really good book, though, that's all about this. It's like a reimplementation of libc in a way that's really easy to understand, so you can see all the things that libc actually provides you with. So in your application, this way you can call a standard library function to, let's say, print out a string, which will then call something like strlen to figure out the length of the string, and then ultimately call down to the operating system to actually print it out. So it gives you a nice abstraction layer above these kernel-level functions, and the kernel can be simple because it doesn't have to provide you with all of this crazy functionality; that can go in your program. The question is, how does this actually get called by your program? So in your program, let's say it makes a call to puts or to gets or to printf, right? The C standard library is fairly large. How does that actually happen? And this is actually really important in terms of vulnerabilities, as we'll see. But the basic idea is there are two mechanisms, the PLT and the GOT, so that's the procedure linkage table and the global offset table. Essentially, what you'll see is that for every library function that your application uses, it calls into a specific location. All right, another trick, if you don't remember what library header you need. So it includes stdio.h. So it should be an incredibly easy, short function, right? Essentially equivalent to what we had before. We can see it's actually a little more complicated. We have some stuff in the preamble: a load effective address, and we're zeroing out bits of the stack pointer. This stuff isn't super important. We have our other preamble here, we're pushing things. The important thing is we now have puts. So even though we used printf, the compiler knew that we only have one argument, so it's just calling puts on our string. So it's going to call into there.
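The idea that every library call goes to one fixed location, which then jumps through a table slot that is filled in on first use, can be sketched as a toy model in Python. Nothing here is the real dynamic linker: `got`, `make_resolver_stub`, `call_puts_via_plt`, `LIBC_SYMBOLS`, and `resolver_calls` are all hypothetical names for illustration, and `libc_puts` is a stand-in that returns the byte count from `os.write` rather than matching the real return value of puts. It also shows the layering described above, where the library routine measures the string and ends in a single write system call.

```python
import os


def libc_puts(text, fd=1):
    """Stand-in for libc's puts: build the bytes, then make one write call."""
    data = text.encode() + b"\n"
    return os.write(fd, data)  # os.write wraps the write system call


LIBC_SYMBOLS = {"puts": libc_puts}  # pretend this is the libc .so file
resolver_calls = []                 # records every slow-path lookup
got = {}                            # toy global offset table: name -> target


def make_resolver_stub(name):
    """What a table slot holds before the first call to that function."""
    def stub(*args, **kwargs):
        resolver_calls.append(name)
        target = LIBC_SYMBOLS[name]     # find where the symbol really lives
        got[name] = target              # patch the slot for next time
        return target(*args, **kwargs)  # then continue into the real code
    return stub


got["puts"] = make_resolver_stub("puts")


def call_puts_via_plt(*args, **kwargs):
    """The fixed location the program calls: an indirect jump via the slot."""
    return got["puts"](*args, **kwargs)
```

Calling `call_puts_via_plt("Hello World")` twice would run the lookup only on the first call. After that, the slot holds `libc_puts` directly. This is also why the table is an interesting target: whatever address sits in the slot is where the next call will go.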
So it's going to call 0x80482e0, and it's going to pass in 0x80484c0, which is our string. And if we look at this handy function... that's not right. All right, so we look at puts. Right. This is the code that's in our program, right? So this isn't actually the puts code. What do we want to have happen here? We want the libc binary to be brought in, and we want to actually jump to the code that's in libc, like the .so file. But our program doesn't want to deal with that; our program needs to jump to a fixed location. So what happens here is we jump, and this star syntax means an indirect jump. So we're jumping to whatever address is stored at memory location 0x804a00c. And essentially, what's at that memory location at the start is not the address of puts. It points to code that brings in the dynamic linker, which finds where the symbol puts is located once libc is in our address space, and then puts that memory address at 0x804a00c. So the first time it's called, the symbol gets resolved and then it jumps there, and that way all the other times you don't have to do this lookup procedure; it just starts executing there. Okay, yeah. And so this will be really important later: this is one of the regions you can corrupt to try to change the program's code execution. Let's see, is there anything else interesting? Oh, other interesting programs. So if you ever want to see what system calls your program is making, there's the strace program. strace will trace all the system calls your application is making. So this shows you that it ends up calling write with 1, hello world, 12.
If I do ltrace, ltrace hooks in and shows you all the libc calls the application's making. These are other ways to learn more about what a program is doing. So with ltrace it's puts of hello world, and with strace you can see that actually all of that ends up as a write. All right, cool, so when we get back we will dive into