Assignment 1 was due yesterday. I don't have any stats yet, so I need to wait until everyone's submitted: they can submit today for 80%, tomorrow for 60%. So we'll hold off on any burning questions about the assignment until then, and then I'll tell you as much as I can. Assignment 2 will be out on Wednesday. We're going to keep the same topic; you're going to work on the same program again. Do you think it'll take an equivalent amount of time? It's a longer assignment, but hopefully you've gotten better, so hopefully it'll take less time. I don't know. Any other questions? Yes? You'll have two weeks for the second assignment as well, to keep the pace going. There'll be another assignment after that, with, I think, three weeks on that one and three weeks on the last one. So four assignments in total? Yes, four homework assignments. Why that number? It could be three really difficult ones or four medium-sized ones, right? And this one wasn't difficult; it was the warm-up. All right, let's get into it. Remember where we are: we've done networks, and now we want to focus on applications, and binaries specifically. So we really need to study how a binary is made. Folks, if you want to talk, go outside. We're talking about how we actually get binaries, and there are two different ways: interpretation and compilation. We're still in the reviewing phase. So what are we reviewing? When we're compiling, the main thing we want to think about is C code. We have C code, and a preprocessor first runs over it: it handles all the #defines and #includes and expands all the macros out, and then the compiler turns that code into architecture-specific assembly code. Right?
So what are some of the architectures we've talked about? MIPS. I'll throw out MIPS since you guys like to look at it. x86. ARM. AMD64. SPARC. And all kinds of other stuff. One of the interesting things that I didn't really learn until well into my career: you mainly think of GCC as just taking in a C file and outputting a binary that you can run and execute. But with various flags, GCC can give you the output of the different stages. This is going to be very handy when we look at shellcode: gcc -S (capital S) outputs the assembly code for what GCC would eventually assemble. So it's a nice debugging step you can run on a C program. You could do this on your backdoor web server to see what assembly code it actually becomes. The other flag that matters for this class, -m32, is going to be all-important, because we are going to focus exclusively on x86; we're going to ignore 64-bit and AMD64. If you're on a 64-bit system, you can still compile a 32-bit application using this flag. So the high-level goal of the compiler pipeline is to turn source code into some kind of executable binary. We saw that GCC will output assembly, and there's another step, the assembler, which on GNU systems is the as command. It takes in a .s file, that is, assembly language, and turns it into an object file. Here's what's really important to think about. Remember, we're trying to dispel the myths and get rid of the magic at each layer. So what is the actual output of these tools? It's not just the machine code itself. Why not? Yeah, maybe because you have 32-bit and 64-bit, right?
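As a concrete sketch of those stages (the flags are standard GCC options; the file and function names here are made up for illustration):

```c
/* stages.c: run it through GCC's individual stages.
 *
 *   gcc -E stages.c        # preprocessor only: #defines and #includes expanded
 *   gcc -S stages.c        # stop after compiling: writes stages.s (assembly)
 *   gcc -c stages.c        # stop after assembling: writes stages.o (object file)
 *   gcc -m32 -c stages.c   # the same, but forcing 32-bit x86 code
 *
 * (Linking the final executable would also need a file that provides main.)
 */
#define ANSWER 42          /* expanded away by the preprocessor */

/* After gcc -S, this appears in stages.s as a labeled routine. */
int answer_plus(int x) {
    return x + ANSWER;
}
```

Comparing the stages.s produced with and without -m32 is a quick way to see 32-bit versus 64-bit code generation side by side.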
So you may need some metadata to tell the operating system what kind of file this is, what kind of code this is. What other metadata might we want? A location where the program can write into memory; the OS will probably lock it down to one specific range of RAM. Yeah, so we may want to specify metadata about where the program should expect memory to be: where can it write to memory, where can it read from memory? Additional types of things? Yes, go ahead. Permissions? At what level, like which files it's allowed to change? Depending on the operating system; in Linux that lives outside the executable itself, but there are some permission-type things in there. What about which code to start executing? What instruction should be the very first instruction of this program? Yeah, but where exactly? You have a whole chunk of code; what do you want to be the absolute first thing that gets executed? These are all metadata that help the operating system load and execute the code. We'll also see relocation information: some binaries are written so that they can be shifted around and executed at different locations in the address space, so they're not always at the same place. We'll study this more when we look at ASLR and those kinds of techniques. Then symbols, meaning debugging information: there may be symbol information, like which assembly instruction corresponds to the start of the main function in the C program, or where the variable foo lives. That information can be in there, and in fact, when you compile your program with gcc -g, it adds a bunch of metadata so a debugger can do this mapping back to the source. So now that we've assembled our program, can we run it? 50-50 shot. Why?
If it matches the architecture of the machine, shouldn't it just be able to run? Yeah, but the other consideration is libraries. Do you call into libc, any libc functions? Translating from assembly to a binary object is what the assembler does, but that's still not enough to actually execute, because you need to link up with the other libraries you're going to use. This is why you have to specify all the object files (and any .so libraries) on the command line when you're compiling everything together. If you've done this with C code: you can individually compile a .c file with the -c flag, which outputs just an object file, and then if you want to build the whole thing into an actual executable, you specify all the object files and it will link everything together. The assembler basically leaves a placeholder that says: you're calling function foo, which is somewhere else; great, I'll just put a placeholder here. Then, when the linker actually knows where function foo is, it can link that up and make the call go to the actual memory address. So the linker is the very last stage; it resolves all these references to libraries or other object files we're using. The other important thing, which is going to come up as really important later, is that there are two different types of linking. One is static linking, where your library is included in your executable. If you statically link libc, you will generate about a 20-to-30-megabyte file even for a simple hello world, because the entire code of libc is included in your binary. So why is that good? Why do that? Yes, it'll still run on a computer that doesn't have the library. Everything has libc, but maybe it's a special library. What else? In what way?
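As an aside, the compile-then-link flow just described can be sketched with two hypothetical files (the file and function names are made up):

```c
/* math_util.c: compiled on its own with   gcc -c math_util.c   -> math_util.o */
int triple(int x) {
    return 3 * x;
}

/* A separate main.c would declare and call it:
 *
 *     extern int triple(int x);   // placeholder: "triple lives somewhere else"
 *     int main(void) { return triple(2); }
 *
 * gcc -c main.c produces main.o with an unresolved reference to triple;
 * the final step,   gcc main.o math_util.o -o prog,   is the linker
 * patching that placeholder with triple's real address.
 */
```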
So maybe we're using a special version, maybe we've added some security features. It could also be faster: if we do some kind of whole-program optimization, we might be able to optimize across the library boundary. But one of the main reasons is portability; that's one of the huge ones. I can ship this binary to any system, and it doesn't matter what libraries they've installed, or even worse, what versions of which libraries they've installed. Rather than using whatever library they happen to have, if I include the latest, greatest version, I'm good. What's the downside? Huge file size. What else? Yeah, the flip side of the security coin: what if there's a security vulnerability in the library that I've just baked into the executable I've given you? If it's a shared library, you can just update the library on that system, and now every program that uses it gets the fix. In the static case, I have to recompile the application against the new, updated library to get those security benefits. Any other pros or cons? So the other type is dynamic linking. Essentially, when the process starts, or the first time you call a libc function, the dynamic linker looks for where libc is on this system, loads it into your process's address space, and then does all the linking at runtime. We'll see exactly how that works at a low level, but these are the main differences between the two cases. So, the most common executable file formats are ELF and, on Windows, PE. The important thing about executable file formats: remember, we're looking at a program that just exists as a file, bytes on disk. Yet somehow the operating system needs to load that code to the proper locations, set up memory in the proper way, and execute the program so a new process can be started. These file formats, ELF and PE, must contain all the information needed to do that.
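The static-versus-dynamic difference is easy to see on disk with a GNU toolchain (the commands in the comments are standard gcc and ldd usage; exact sizes vary by system):

```c
/* hello.c
 *
 *   gcc hello.c -o hello_dyn            # dynamic: a small binary; running
 *                                       #   ldd hello_dyn
 *                                       # lists libc.so to be loaded at runtime
 *
 *   gcc -static hello.c -o hello_sta    # static: much larger; ldd reports
 *                                       # "not a dynamic executable"
 *
 * The C source is identical either way; only the link step differs.
 */
#include <string.h>

/* Stand-in for "calling into libc": with -static, the code behind strlen
 * is copied into the binary; dynamically, it is resolved at load time. */
size_t greeting_len(const char *name) {
    return strlen("hello, ") + strlen(name);
}
```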
And the ELF file format is the main one, the most commonly used binary object format. It's architecture-independent, which is nice; you can use it on MIPS, x86, x86-64. There are four file types. It can be relocatable, which means the linker has to be involved before it can actually be used. It can be executable, which means it's ready to go; we can execute the binary right out of the box. Shared means it's an ELF shared library, so it's not executable on its own; it's a library that can be used by other files. And finally we have core dumps. When a program crashes you get a file like core.<some number>; that is actually a specific ELF file type. What's really cool is that you can use GDB to debug your program against that core dump, with memory exactly as it was at that moment: the core dump holds the entire layout of your program's memory at the point it crashed. Very handy tool. Actually, the tool I use most when doing any kind of security work, especially for capture-the-flag hacking competitions, is the file command. What does the file command do? It shows you the type of the file. How does it work? It checks the file's header. Yes; it's a very simple program. All it does is look at the first couple of bytes of the file, and it has a huge database saying: if it starts with these characters, it's a JPEG file; if it starts with these other characters, it's a BMP file; if it starts with these, it's an ELF file. So the file command is really good to run on anything you don't recognize. It won't tell you exactly what the file is; it tells you what it thinks the file is, based on the magic numbers of the file format. Super handy. I run it on anything unknown.
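The magic-number idea is simple enough to sketch; \x7fELF really is the ELF signature, and a toy version of what file does looks like this:

```c
#include <stddef.h>

/* An ELF file's first four bytes are 0x7f 'E' 'L' 'F'. */
static const unsigned char ELF_MAGIC[4] = { 0x7f, 'E', 'L', 'F' };

/* Return 1 if the buffer starts with the ELF magic, 0 otherwise.
 * The real file utility does this same kind of prefix matching against
 * a large database of signatures (one entry per known format). */
int looks_like_elf(const unsigned char *buf, size_t len) {
    if (len < sizeof ELF_MAGIC)
        return 0;
    for (size_t i = 0; i < sizeof ELF_MAGIC; i++)
        if (buf[i] != ELF_MAGIC[i])
            return 0;
    return 1;
}
```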
The other program that's really good is readelf. Look at the documentation for it. It lets you query a lot of information about an ELF file, whichever of those four types it is: what the symbols are, where the code is going to be loaded, what the permissions are on all the different regions of memory, all really important things about your binary. That will be especially helpful when we get into exploitation. So, the ELF file format: first there's a header, and at a high level we separate the program into a series of sections and segments. We have the special ELF header, which I'll detail in a second, and then header tables which say, for all these sections, where they are in the binary, what the offsets are, and any other important information. The ELF header has a magic number; addressing info, which says exactly how the addresses work, that is, 64-bit versus 32-bit; the file type, which tells you which of the four types this is; the architecture type, which architecture it's for; and so on and so forth. The other important field is the entry point: what address, once the program is loaded in memory, should we start executing from? This is an incredibly important one. Then the program header table position, the section header table position, their sizes and entry counts. Is there any information you need in order to parse the rest of this file format? I thought you said it was architecture-specific? It is; it tells you which architecture it is. There are two different things in here. The addressing info tells you whether addresses are 32 or 64 bits in size, which you need to know to read the rest of the addresses.
The file type tells you which of the four types it is, and the architecture type tells you which architecture it is, SPARC or x86 or whatever. That's just a convention defined in the ELF standard: some number maps to each of those. Then each section entry defines what the type of that section is and the permissions on the section. Think about it: this is part of the binary file, some region of the binary file, and there are bits telling you, is this data or is this code? What is this section? And some sections actually take up no space in the file. Say we want to pre-allocate a gig of memory in our program space, because we know we're going to use at least a gig. We don't want to put an actual gig of zeroes in the file and say "load this at a certain memory address." Instead we can say: at this memory location, allocate a gig of memory, and those bytes don't exist in the file at all. We have symbol tables for static linking and dynamic linking, string tables which hold the strings, and relocation tables which help when the file gets moved around. And then there are the important permission bits on memory. Allocate: whether this section actually gets placed in memory. Write: whether we can write to this memory location. Why would we want sections of memory that the program cannot write to? A defensive programming measure. Do you ever use anything in your programming languages where you can't change the value? Yeah, constants, or a const structure. There are a lot of cases where you want read-only memory, and this is a way to actually enforce it. So think about what happens when you execute a program: you type ./whatever, your normal web server.
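The header fields above can be written out as a C struct; this mirrors the 32-bit ELF header from the ELF specification (the e_* field names are the standard ones), though real code should use the definitions in <elf.h>:

```c
#include <stddef.h>
#include <stdint.h>

/* The 32-bit ELF header, per the ELF specification: 52 bytes total. */
struct elf32_header {
    uint8_t  e_ident[16];  /* magic \x7fELF, 32/64-bit class, endianness, ... */
    uint16_t e_type;       /* 1=relocatable, 2=executable, 3=shared, 4=core */
    uint16_t e_machine;    /* architecture: 3=x86, 8=MIPS, 40=ARM, ... */
    uint32_t e_version;
    uint32_t e_entry;      /* entry point: address of the first instruction */
    uint32_t e_phoff;      /* program header table offset within the file */
    uint32_t e_shoff;      /* section header table offset within the file */
    uint32_t e_flags;
    uint16_t e_ehsize;     /* size of this header */
    uint16_t e_phentsize;  /* size of one program header entry */
    uint16_t e_phnum;      /* number of program header entries */
    uint16_t e_shentsize;  /* size of one section header entry */
    uint16_t e_shnum;      /* number of section header entries */
    uint16_t e_shstrndx;   /* which section holds the section-name strings */
};
```

A loader reads e_ident first (to learn the address size and endianness), and only then knows how to parse everything after it.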
What the operating system is going to do is look at this and put each segment where the program says: this segment goes at this memory address. So it puts it there and gives it the listed permissions. There's an executable bit for whether the section can be executed, a write bit, and a read bit as well. This way you can have different sections with more fine-grained permissions, rather than the entire memory being readable and writable by anyone; it gives the program a bit more control. Here are some of the typical sections you'll see. The .text section is the program's code; this is typically where your code goes. It is allocatable, or alloc, which means its contents come from the file and get placed in memory, and it's executable, which makes sense: we want to actually execute our code. If this executable bit were not set, we would get a segmentation fault whenever we tried to run the program. The .data section is for initialized data; it's allocated from the file and it's writable. Then there's .rodata. What do you think the ro stands for? Read-only. So it comes from the file, but it doesn't have the writable flag set. This is important: we cannot write to this memory; the operating system itself will stop you from writing to it. The .bss section is data that does not exist in the file itself: uninitialized data that just gets allocated, zero-filled, at load time. The .init and .fini sections hold code executed before and after main for setup and cleanup, which we won't really go into here. The PE file format is the .exe file format on Windows, the standard .exe format. There are a lot of differences from the ELF file format; one is that PE programs are laid out as if they will be loaded at one particular base address, call it address zero.
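Backing up to those section types for a moment: here is where typical C definitions land (this placement is standard GCC/Linux behavior, shown as a sketch):

```c
/* Each kind of definition ends up in a different ELF section: */

const int limit = 100;   /* .rodata: alloc, read-only; writing to it faults  */
int counter = 5;         /* .data:   alloc + write; bytes come from the file */
int scratch[1024];       /* .bss:    alloc + write; no bytes in the file,
                                     zero-filled at load time               */

int bump(void) {         /* .text:   alloc + execute; the code itself       */
    return counter + limit;
}
```

Running readelf -S on the compiled object shows each of these sections with the alloc, write, and execute flags described above.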
So address zero, that preferred base, is where the first instruction would be. However, programs are not all actually loaded at that address; they get moved around, and the OS knows to fix this up: relocation will patch up all the addresses. That's all we're going to say about PE; we won't touch it much. Now we need to dig into assembly, a primer on x86, so that we can understand and read actual binary code, and then actually exploit binary code. The x86 family starts with the 8086, an old-school CPU with 16-bit registers, and the idea was: this chip was so popular that new CPUs, rather than defining a new instruction set architecture, said, we'll just use whatever the 8086 used. So what's the benefit there? Standardization, backwards compatibility. How? Well, what happens when you buy a new CPU? I have an 8086 and I buy a new CPU that's three times faster. Without compatibility, you'd have to rewrite all your software. And not only my own software: what if somebody else gave me a binary? I paid money for this system and now I can't execute that program on the new hardware. It would need to be recompiled by whoever created it, and they'd have to give me a new binary specifically for this system. So this is kind of a historical accident, but it became the de facto standard, and most desktop CPUs support x86. Eventually, though: 16 bits is not a lot. What's the main limitation? Yeah, one of the big ones, even though there are ways around it, is memory addressing: it's difficult to address more than 2^16 locations. So eventually we got 32-bit registers, and there have been a lot of further extensions to x86, all kinds of speed improvements.
Intel and other companies are creating new instructions every year to do things faster: crypto natively in hardware, all kinds of cool stuff, hyperthreading, multi-core, the 64-bit architecture. Anyway, what do those model numbers actually stand for? It started with the 8086, and I don't know where that name came from. From there it's probably marketing: the 80286, the 80386, then the Pentiums, P3, P4, and now they don't even use numbers, right? It's Skylake versus whatever the current Intel chip is. Okay. So in x86 we want to address memory, and the way we're going to think about it is the flat memory model, which gives us addresses 0 through 2^32 - 1 that we can possibly address in our system. There is another way, segmented memory, where you specify a segment first and then an offset within that segment, and that can actually let you access more memory than 2^32 bytes. So what's the core limitation of a 32-bit flat model? Yeah, we can only address up to 4 gigs of memory. So how does memory actually work? Let's take a step back and think about it. We have our program, our executable on disk; it's just a file at this point. Which system call do we use to start executing that program? At this point it's already an executable file; it's done all the linking and everything, it's ready to go. Almost: not exec, it's actually execve. So that's a system call. What is a system call, specifically, or generally? Yeah. It's a way for user-space programs, programs that are not executing in the kernel,
to call some functionality of the kernel. This is why I distinguish libc functions, which come from a library you link against, from system calls. The socket call is a system call: it calls into the kernel to say, hey, I want a socket. That's what those library wrappers ultimately do; the idea is that you want some functionality of the operating system. So when we want to execute a new process, we call the execve system call (we'll look at the parameters later), which basically tells the operating system: here's the executable I want. You pass it, essentially, the path to a file. So what does the OS have to do? It has to load the program into memory. It's going to read the ELF header, and it's going to copy each part of the file to wherever the ELF headers say: this piece to this memory address, this piece to that memory address, and so on. But how does that work? How does the operating system execute multiple programs at once? How come all of your backdoor servers didn't crash my operating system, even though, technically, some of you did? Time sharing: they're not all executing at the same time, and a context switch is how you switch between them. But why does that suffice? Why don't I have to worry about you? If you wrote a program that scanned all of its memory, would you find the other programs running on my system? No. Why not? This is a key point about operating systems: one of the key abstractions they provide is that every executing process thinks it has free rein over all of memory.
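On that libc-versus-system-call point: on Linux you can bypass the libc wrapper and make the call yourself through the raw syscall(2) interface (Linux-specific; SYS_getpid is the kernel's real syscall-number constant):

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid: the kernel's syscall number */
#include <unistd.h>        /* getpid(): libc's friendly wrapper */

/* Ask the kernel for our PID directly, without the libc wrapper.
 * Both paths execute the same kernel code; the libc function is just
 * a thin, portable wrapper around the trap into the kernel. */
long raw_getpid(void) {
    return syscall(SYS_getpid);
}
```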
Each process can access everything from zero up to however much memory there is. Which is a long way to get back to this 4-gigabyte limitation: it means one program alone can't use more than 4 gigabytes of memory. You could, and for a while there were extensions for this, have a 32-bit system with more than four gigs of physical memory: the operating system is able to handle it, and each program, or each process technically, gets up to four gigs of memory that it can use, but no more than that. Yes, you're saying, what about two processes? This comes down to virtual memory management; you should look up the operating system's virtual memory management. The idea is that you physically have some memory, but there's a level of indirection: when your program says, hey, give me address zero, you don't actually get physical address zero. The operating system has set up hardware so that, for this process, address zero is actually at, say, physical address 20. Yes? What about two programs talking to each other? If you want two different processes to be able to talk to each other, you can use, I believe, mmap to map memory into both processes. Then you have a shared memory region where you write from one process and read from the other. You have to set that up programmatically when the programs first run, and that's one way to get communication between two processes. What was that? You can also use sockets, or files; there are plenty of ways to talk to each other. But you've got to lock that memory every time. Yes, it's tricky; any kind of concurrent or distributed programming is always tricky. Cool. Okay. Four gigs. So it all starts with registers, as with learning any hardware system.
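A minimal sketch of that mmap-based sharing (POSIX mmap with MAP_SHARED and MAP_ANONYMOUS, shared across a fork; real code would need proper synchronization rather than just waiting for the child to exit):

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child writes `value` into a shared page; parent reads it back.
 * MAP_SHARED keeps the mapping visible across fork(); with MAP_PRIVATE
 * the child's write would be copy-on-write and invisible to the parent. */
int shared_roundtrip(int value) {
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid == 0) {            /* child: write through the shared mapping */
        *shared = value;
        _exit(0);
    }
    waitpid(pid, NULL, 0);     /* crude sync: wait for the child to finish */

    int seen = *shared;
    munmap(shared, sizeof *shared);
    return seen;
}
```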
If you've never done any computer architecture before, this is going to go quickly, so you'll have to study it on your own if you're not familiar. Registers are essentially the local variables of the processor. Why do I say that? They're physically on the chip, so computing with them, adding them together, multiplying them, doing whatever we want to these registers, is incredibly fast. And they're essentially the only place we can actually perform computations: all computation really happens on the registers. On x86 there are four general-purpose registers: A, B, C, and D. Super easy. The way they're written, though, is EAX, EBX, ECX, and EDX. We'll look at exactly what the naming means, but for now, the E stands for extended; the X, to be honest, I don't know what it stands for, it came along with the extension of the original registers. So, four registers: EAX, EBX, ECX, EDX. The idea is that when you're writing assembly code and you refer to EAX, for instance in a mov of EAX into EBX, you are referring to the entire 32 bits inside the EAX register. Remember, we're talking about a 32-bit architecture, so there are 2^32 memory addresses and the registers are 32 bits as well. If you only want to refer to the lower half, how many bits is that? 16. That's AX. This is why it can be confusing when you're reading code that says copy EBX into EAX and then copy AX into ECX: even though it looks like different registers, AX is the same register as the low half of EAX, so you have to keep this mapping in your head.
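That aliasing can be mimicked in C with masks and shifts (a model, not real registers; AX, AH, and AL are the real x86 sub-register names, where AH is bits 8-15 and AL is bits 0-7):

```c
#include <stdint.h>

/* Model of how x86 sub-registers alias one 32-bit register. */
uint16_t get_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }       /* low 16 bits */
uint8_t  get_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }          /* low 8 bits  */
uint8_t  get_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }   /* bits 8-15   */

/* Writing AX changes EAX too: they are the same storage. */
uint32_t set_ax(uint32_t eax, uint16_t ax) {
    return (eax & 0xFFFF0000u) | ax;
}
```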
You can even split further and access the upper part of AX: the upper byte is the AH register, the high byte of the A register, and AL accesses the lower 8 bits. You can do this for every one of these four registers, so you will often see variations of these when looking at binary code. There are other registers too: ESI and EDI are also usable much like the general-purpose ones, with SI and DI variants and the same kinds of operations. Then there are special-purpose registers that are also really important. We have ESP, the stack pointer. So why is the stack important; what is the stack, in the context we need here? Yes, the stack is incredibly important for calling functions: we need some way, in memory, to keep track of the call stack, of which functions we've called as the program executes, and of where functions return to. You can think of it as scratch memory; we're going to get into it more. The really important thing is that the top of the stack is the memory address held in ESP. Some instructions reference ESP explicitly; you can say mov ESP into EAX. There is also another set of instructions that implicitly modify ESP, so it's incredibly important to understand this, and we'll turn to those as well. EBP is the frame pointer, which we'll look at more later. The stack pointer points to the current top of the stack, while the frame pointer remains constant throughout a function's execution and is used to reference local variables and parameters, independent of the moving ESP. We'll look at this more in depth. We also have segment registers; as mentioned, there are ways to access different segments. The
EFLAGS register has a bunch of different flags that change as instructions execute. This is where comparisons land: there's an x86 instruction to test whether, say, EAX equals EBX, and if so, it sets one of the bits in the EFLAGS register. You almost never access this register directly. Another key one is EIP, the instruction pointer: the address of the next instruction that will be executed. When the program starts, the operating system loads everything into memory, and the very last thing it does is set the EIP register to the program's entry point, and the processor starts executing from there. Yes, there is no separate program counter; it's the same thing under different names in different architectures. MIPS has PC, ARM has a program counter register; x86 has EIP. You also can't read or set EIP explicitly, so how does its value change? Function calls. What else? Jumps, conditional branches, all kinds of stuff; all of these implicitly change it. We'll get to that later. There's also a whole bunch of floating-point operations; depending on how you're doing them, they may be offloaded to a floating-point co-processor. Anyway, EFLAGS and EIP are the basics you need. Data sizes: if you're not familiar with the different data sizes, a byte is 8 bits, 2 bytes is a word, 2 words is a double word, and 2 double words is a quad word, and so on and so forth. To be honest, I usually think in terms of bytes rather than words, but you do have to be careful when moving things around, so keep that in mind. Now, here is another incredibly important concept: endianness. When you lay out a number in memory, does the high side start on the left-hand
side, or does the high side start on the right-hand side? The high what? Not the most significant bit; the most significant byte. So this specifies the byte order, and Intel uses little-endian ordering, which can seem counter-intuitive. It means this: suppose at memory address 0x00f67b40 you have the byte 0x00; at the next address up you have 0x01; above that, 0x02; and above that, 0x03. If you ask the processor to print the 4-byte, 32-bit number at address 0x00f67b40, which number does it output? In little-endian, the most significant byte comes from the highest address, so the number output is 0x03020100. But if you took the same sequence of bytes in memory on a big-endian system and asked what number that was, it would say 0x00010203. This is why, in the C code you wrote for that server, you had to call htons, host to network short (there's also htonl, host to network long). When you want to listen on port 80, say: the host is little-endian format, but network byte order is big-endian format, so the bytes actually have to be swapped before they go out on the network. It's super annoying and confusing that the network format is different from most hosts, but as long as everybody uses those functions correctly, it's fine. So this is really, really important, and it can be incredibly frustrating when you're trying to overwrite a buffer; we'll see that when we get to it. If it's been a while, refresh yourself on endianness so that you'll understand when an exploit you think should be working ends up crashing instead.
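That layout is easy to poke at in C (a sketch; swap32 is a hypothetical helper doing the work htonl performs on a little-endian host):

```c
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value: the work htonl() performs
 * on a little-endian host such as x86. */
uint32_t swap32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Store the bytes 0x00,0x01,0x02,0x03 at increasing addresses, then read
 * them back as one 32-bit integer the way the CPU would. A little-endian
 * CPU reads 0x03020100; a big-endian CPU reads the same bytes as 0x00010203. */
uint32_t read_native(void) {
    unsigned char mem[4] = { 0x00, 0x01, 0x02, 0x03 };
    uint32_t v;
    memcpy(&v, mem, sizeof v);
    return v;
}
```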
This is also the first place you want to think about signed integers. If you put negative 1 into the program, how is that represented? All Fs — every bit set. And this is the other thing: all you can really look at is which bits are in the registers and which bits are in memory. The interpretation of what those bits mean depends on you. If it's an unsigned number, it's a really large number; if it's a signed number, it's negative 1. That's really important to keep in mind, and the same goes for things like characters — it's all just bytes, all just 1s and 0s. Especially when you're debugging at the binary level, without any source code, you don't know whether something is an int or an unsigned int; you have to infer it from meaning and usage. So yes, negative 1 is all Fs; negative 2 is one less than that.

The other thing that's super handy, which I always have open when I'm doing this kind of work, is a programmer's calculator. I actually get by pretty well with the Calculator app on the Mac: there's a programmer mode you can switch it to, which gives you hex input and shows you the exact bit string of whatever you're typing. Really handy — I use it a lot. Make sure you have something like it.

OK, so what does x86 look like as a language? You're coding in assembly language, so you're basically right above the 1s and 0s — almost at the 1s and 0s, really, because all the assembler does at this point is translate what you wrote into the equivalent 1s and 0s, and that translation isn't very complicated. Who has experience with assembly? It should be fairly familiar. Your assembly code has directives, which we'll look at in a second; similar to the sections in the ELF file, you can have a data section, which defines essentially variables that will be placed at certain memory locations, and you have instructions.

To make it super confusing: you'd think that for something as important and as low-level as x86 there wouldn't be so many choices. How many ways are there to call a function in C? One. How many different syntaxes are there to add a number? One. But in x86, unfortunately, there are two different syntaxes, and they are completely opposite in operand order. In AT&T syntax, which I'm using throughout my slides, you have your operator — add, mov, sub, jmp, whatever — always on the left, and then source, then destination. So with eax, ebx as operands, this means move the value of EAX into the EBX register. Intel syntax is completely opposite: the same operands mean move EBX into EAX. Two instructions that look identical but read completely backwards from each other. There are more differences, but I suggest you pick one and stick with it in every tool that touches this; I believe most tools have a way to change from one to the other in the settings — you just need to know it's there.

Here's my main tip on how I always figure out which syntax I'm looking at: I look for an instruction that has a constant operand, because can a constant ever be the destination? No. And that tells me how to read the rest of the file. If I see the constant 0 in the source position being moved into EAX, I know I'm in AT&T syntax, because it can't possibly be the other way around; if it looks like EAX is being moved into 0, I know I'm in Intel syntax, because it can't possibly be doing that.

We can define constants, starting them with 0x; we can define labels; and we can define different types of data — bytes, double words, quad words. For instance, we can define a variable that is a double word, 32 bits, and so this would be two 32-
bit values in the data section of your assembly code. We can also define all kinds of nice strings, and often these details depend on the assembler you're using — GNU as, NASM, whatever — so the syntax may vary slightly.

When you say constants, don't you sometimes just specify the actual location, like move 10 to 7000? Yes — essentially, this is a nice extra layer the assembler gives us. We can say "move bar into EAX," and when it assembles, the assembler puts bar at a specific memory location. If you look at the assembly code it generated and bar landed at 7000, it will say: dereference 7000, take whatever's there, and move it into EAX. The assembler takes care of all that — some niceties for us when we're programming in assembly.

Then we can address memory and move things between registers and memory: we need to be able to say "access the thing at this memory location and copy it into a certain register." And surprisingly — again, for such a simple, low-level language — it's actually quite complicated how these memory accesses work. We have the width: the size of the memory we're trying to access. The base: the starting address. The index: which element within it we're getting. The scale: a constant multiplier on the index. And the displacement: a constant offset from the base address. The width tells us whether we're doing bytes, words, longs, or quads, and that's encoded in the instruction itself. The formula is essentially: address = base + index × scale + displacement. I always think of this like arrays. We have an array of characters and we're going to index it: the base is the address of our array; the zeroth element is at index zero times the scale, which for characters is one byte; if the index is one, it accesses the next one. And if we're talking about integers, those are 32 bits, so the scale will be larger.

The way it's written is like this: the displacement on the left, then base, index, scale in parentheses. If the index is zero it won't be there, and if the scale is absent it also won't be there, so many times you'll see it abbreviated.

So what's this one doing? First, the l on movl: we're moving a long, 32 bits — a long is two words, a word is two bytes. The suffix gives the width: b means we're copying one byte, w a word, l a long, q a quad. So this means: start at EAX. What is inside EAX? The address of whatever we're trying to dereference — it will be some memory address. This is important: think of it like a pointer. What's inside EAX is just an address, and this operation is essentially dereferencing that pointer — following it. But it's not following it directly: it follows that address plus ECX times 4 (so if ECX is 0 we just get EAX itself), minus 0x20, and copies what's there into EDX. Source, destination? Destination — yes, copy that into EDX.

Now a more simple one. What's this going to do? Take EBP, subtract 8, and whatever is located there at that memory address, take it and move it into EAX. Here we're addressing fixed offsets from the base pointer, EBP. As we'll see, EBP gets set up when a function starts executing, so we go down 8 bytes, access whatever integer is there, and copy it into EAX. If the displacement is zero, it's the same as a plain mov with no displacement. And if there's no 0x? If it has 0x it's hex, and if not it'll be decimal. So in
this case it's 0x20, so minus 32; the other one is just minus 8. And here the one we're doing is a move into a destination: move from EBP minus 8 — whatever is at that memory location — and copy it into EAX. Into EAX, or into the address that EAX — ? No, this moves it into the register EAX itself. Let's just draw a picture of what this instruction is doing. Yes: move whatever is in EAX — we just put 0x20 in there — to wherever EBX points, plus ECX times 2. Is that the value that's in ECX? Yes, the value inside ECX. And we only dereference when we see the parentheses — this is the important part.

Now, in our specific syntax, a dollar sign before a number means a constant. So this is: move the constant value 0x804a0e4 straight into EBX. That's all it's doing — no memory manipulation, no nothing; just copy that value into the EBX register.

Are registers some special kind of memory, or are they just locations? The registers are on the chip itself — now you're really stretching my hardware knowledge — flip-flops on the chip, part of the architecture. They do not exist in RAM at all. This is why, when you get into the habit of looking at compiled code, what you'll see is: copy a value from a local variable into a register, copy another local variable into a register, add the two, then copy the result back into a third variable in memory. Because unless it gets back to memory, it might as well not have happened — it's still just in a register.

So what does it mean to not have a memory address? Here's the important difference: the difference between this move and this move is what? Yes — a dereference. This one says: access whatever is at memory address 0x804a0e4 and copy whatever is there — the 4 bytes, because it's an l — into EAX.

What is the percent sign in the second-to-last one versus the last one? The percent sign means a register; that's this syntax's way of writing registers. And the dollar sign? The dollar sign means a constant. For the memory operand, for whatever reason, we don't need a dollar sign at all — and I think the assembler would throw an error if you tried to mix those up. Then how does it know the 0x804a0e4 isn't a register? Well, it's not a register name; it's a value, so it says: copy the contents of memory at 0x804a0e4 into EAX. So the difference between the two: in the first, we are copying the constant value 0x804a0e4 into EBX — after this instruction executes, EBX had better hold exactly that value — whereas the second one says: look up whatever is at memory address 0x804a0e4, and whatever is in there, copy it into EAX.

You say movl — what if I said movb? Would it only look at the first byte? It would only copy that one byte, whatever is at that byte's location in memory. movl is just extending the size: movb says "this is the address, copy that byte"; movl says "copy that byte and the next 3 bytes and interpret them as a little-endian number" — well, the interpretation is up to you, but take those 4 bytes and copy them into EAX. And the memory address is always referencing RAM, not some cache? There may be caches in the way, but that's more advanced stuff. A couple more questions: the value after the dollar sign — is that the value itself? Yes, that is a constant value: whatever you put there, that will be the value in the register afterwards.

Cool. All right, when we're back we'll learn more types of instructions.