Good morning — I guess it's still kind of morning, right? For some of you, 10:30 is probably very early morning. All right, before we start: Eric pointed out that I made a bit of a mistake in the type description for part one. I'm gonna update it so the types are zero, one, and two. That way we don't have to change anything about the grading infrastructure, so it should be a small, easy change. Cool. We're gonna get into new material today, so we're gonna save any assignment two questions — those will be answered by Si on Wednesday. I'm not gonna be here on Wednesday; I'm gonna record a lecture for you so we can continue with this awesome application security topic, and Si will hold the in-class discussion section. So bring all your questions on the assignments on Wednesday, talk it through, and you can have a good discussion. You can also tell him what you want covered if there's anything specific you wanna see — I think he's totally willing to do that. All right, then let's get into it. Okay, so we are talking about applications, and specifically we were talking about compilation. We're trying to understand the process by which our application actually gets executed by the CPU. So we've seen that we have a program written in a high-level language like C or whatever; the compiler will first translate that program to assembly, the assembler will assemble that into a binary object, and the linker's job is then to take all the binary objects, link up all the references, and create an executable file. And we talked about the differences between static and dynamic linking. Static linking is done at compile time.
So any libraries that are statically linked are included in your binary, whereas with dynamic linking, flags are set on your executable that say: hey, at runtime I'm gonna use the libc library and I'm gonna call the printf function and these other functions, so load those up at runtime. And we talked about the common executable formats, ELF and PE. So now we're gonna dig into what these files actually are. What do I mean by an executable format? Exactly, yeah. When you compile a program using GCC without giving it an output name, you'll get a.out. So the question is: what is the structure of that file? We already talked about, I believe on Friday, that it's not just x86 binary — the operating system doesn't just start executing it. There's additional metadata in that file that the operating system needs. That's what ELF is: a file format, and PE is the file format for Windows. So this is an important distinction to keep in your mind when you see an executable on Windows or on Linux: this isn't just binary code, it's actually a file format that tells the operating system how to load and execute that binary code. So ELF is a great name — I actually love it; there are a lot of cool puns you can do with it in paper titles. It stands for the Executable and Linkable Format, which is honestly kind of a bland name, but a great acronym. This is how you should name things, by the way. It's incredibly widely used. One of the important things is that the file format itself is architecture independent. What does this mean, architecture independent? x86, x64 — yeah, doesn't matter. ARM, x86, x64 — the file structure is the same. The binary code inside may be different for different processors, but the format is architecture independent. So at a high level, there are four types of ELF files.
They can be relocatable files, which means the linker hasn't done its job on this yet, so this code is able to be relocated to any address so that it can be linked together with other object files. When you compile a .c file into a .o file, that binary object is gonna be a relocatable ELF file. Is this executable? No, right? The key is that nothing is fixed yet: this code doesn't live anywhere, it doesn't have a specific address that it's gonna be executing at, but it has all the information needed for a linker to put it wherever it wants to go. Executable files are the ones we're probably most familiar with: all symbols have been resolved, except for symbols that come from shared libraries — the dynamic linking process. Then there are shared object files: a .so file is a shared library. This means it has information about what symbols it exports and how to load it and call into it at runtime. So that's another type of ELF file. And finally, the fourth one is kind of weird, huh? A core dump. So what's a core dump for? What was that? Segfault — yeah, so when your program terminates or segfaults, that's probably when you've seen it: it creates a core dump, which is a dump of the code and memory of your program at that point. That way you can debug it — you can see where your code was executing and what happened. Very handy tool. So readelf is a very cool tool that parses an ELF file and tells you information about the structure of that file. What about the file command — the command called file on your Linux machine? Yeah, it can still tell you the type. How does it determine the file type? It reads the first few bytes and determines the type. Yeah, maybe not necessarily exactly the first four bytes, but the point is: all the data in your programs, including ELF files and every other type of file, is just bits on your hard drive, right?
So oftentimes programs wanna know: hey, is this a JPEG? Is this an ELF file? Is this a whatever? In Windows, you often have the suffix that tells you what the file should be, but on Linux there's no restriction like that. So oftentimes file formats include magic bytes at the beginning — say, every JPEG file will start with some fixed bytes. I can't remember exactly what they are, but if it's a valid JPEG file, it will start with those three or four bytes. And the file command basically has a whole database of all of these file types and their magic numbers. So it can look at a file and say: okay, based on the magic number, this looks like an ELF file, or this looks like a zip file, or whatever that file format is. Okay, now we can dig more into the ELF file format. You can see I messed up a little bit here, that's fine. First, at a high level, we have the header, and the header specifies general information about the ELF file. After that, we have what's known as the program header table, which is basically a table that tells us where to find more information about the parts of the program. We have the actual segments and sections, and then we have a section header table, which describes all the sections in our ELF file. And the idea here is that these segments and sections depend on what type of file it is: the loader works with the collection of segments, the linker with the collection of sections, depending on what's needed — segments are made up of multiple sections. And the idea is that these header tables describe the structure. Important information is also in the header itself. There's the ELF magic number — it's not exactly the string "ELF", it's four bytes: the first byte is 0x7F, followed by the characters 'E', 'L', 'F'. You can look it up; it's very easy to tell. After that, there's the addressing info, which specifies the size of addresses in the ELF file.
Are they 32-bit addresses or 64-bit addresses? So why is this information important? It's architecture dependent, yes, but how does this help us? Why is it one of the first bytes in the header? What's the big difference between 32-bit and 64-bit? What about memory — what's the size of addresses in memory? Yeah, 32 bits versus 64 bits, right? Pointers are a different size. So if I'm using addresses in my ELF file — and part of what ELF says is, hey, put this segment at this specific memory location — I need to know if addresses are 32 bits or 64 bits. Because if I think I'm parsing a 32-bit address but really it's 64, I'm gonna mess everything up. And that's why it's one of the first header fields, so that when you see an address later, you know exactly what size it's going to be. We have a byte here which, I believe, specifies the endianness of the file — whether it's little endian or big endian. Then there's more detail about the specific architecture type: is it x86? Is it MIPS? Is it ARM? You've got ARM that's 32 bits, in which case the addressing info will still say 32 bits, but the architecture is ARM and not x86. Next, the entry point. What does the entry point of a program mean? Yeah, where it starts, right? Now we get to a very critical component of the binary: this is the important metadata that tells the operating system where to start executing this binary from. It's an address that specifies the entry point. Then we have an offset that says where to read the program header table, an offset that says where to read the section header table, and the size of this header. And then entries that give the size and number of program headers so you can read the program header table — if you don't know the size, that'll be a problem.
And then the size and number of entries in the section header table. So the idea is each of these segments is either going to be code or data. We don't know in advance, but the program header table specifies all of that: okay, segment zero should be put at address 0x8001 and it's executable, and segment two should be located at 0x10 and it's read-only data. I'll show you an example in a second. So each section header specifies the type of the section and the permissions. What do I mean by permissions? Like read-only or read-write? Yeah, right. This is actually not strictly necessary, but it's a way for the program to tell the operating system — the operating system has to support it, right — hey, this memory is read-only memory, it should never be written to. When would that be useful in a day-to-day program? For the scope of variables? Which scope specifically? Global — yes, exactly. So global constant variables you can put into memory that's read-only and can never be written to, which helps guarantee that your code is never gonna alter any of that memory. It's all just bytes, okay? So some of the flags you can have are bits that say, hey, this section is part of the program. You can also say: okay, this section takes no space in the file — there's nothing in the file, so you don't have to load anything from it — but this memory should still be mapped: it's some global memory, but it's uninitialized. We have sections for the symbol table and the dynamic symbols. The symbol table specifies strings and says what kind of symbols we have in our code.
The dynamic symbol table represents how to link up the dynamic linking at runtime. We have a string table that specifies how to match identifiers with symbols, and relocation information that contains everything the linker needs to handle relocation. And the flags specify the permissions. So we have alloc — the section is actually allocated in memory. Write — the section is writable. Execute — the section is executable. And there should be a flag that says something is readable, too. You can actually use the readelf program to read out all these sections, so it's really interesting to look at this. This is just a generic, typical layout of an ELF file. Usually the .text section will be the program's code, and the type is PROGBITS — program bits, actual code or data of the program. It has alloc and execute: alloc because that memory should be created for the program, and it's gonna be executed. .data is initialized data, which is gonna be allocatable and writable. .rodata is read-only data, so a read-only section which is just allocatable. And .bss is uninitialized data. So what's the difference here between the read-only data and the BSS? One says NOBITS and the other PROGBITS — so what does that mean? BSS, like blank space? Exactly, yeah. The idea is, we don't know the contents: this would be global data that's uninitialized at the start. Or — I think the heap may be another case, but anyway — as opposed to read-only data. Like all the string constants in our program: the compiler can take those — well, those would probably go into read-only data. If we have constants, those go into read-only data, and so we know exactly what that data should be.
It has a compile-time value, an initialized value, whereas the BSS doesn't have any initialized data. If we did have a read flag, would you have it on for the read-only data and for the uninitialized data? You would need it for both, because your program will need to be able to read the contents there. Actually, maybe there isn't a read flag — maybe I'm getting confused with approaches that try to add read flags; I'd have to look it up. We'll get to this much later, but it may seem like a pretty crazy use case: there are some cases where, for security purposes, you actually don't even want to be able to read, say, your code segment. You just want to be able to execute it; you never actually read from it. So, just for comparison, the PE file format. It was introduced because the limit on .COM programs in Windows was a small 64K, and they were like, okay, 64K is not big enough for programs. So when you have a Windows program that's an EXE, it's not just raw binary — it's the PE file format. Just like ELF, it contains all the information necessary for the operating system to load that program into memory and execute it. PE is different because programs in the PE format are written as if they were always loaded at address zero. So the entry point is known relative to zero. But because of this, there's extra information so that the program can actually be loaded at different points in memory. This is why the header has to contain information so that the code can be relocated. So why do you have to do this?
If the code's written as if everything's executing from address zero, why do you have to have additional information to change it? What was that? Jumps? You need to access a different location — yeah. So it depends on how we do jumps: some jumps can be offset-based and just say "jump four instructions down"; those wouldn't be too big of a problem. But if we wanna jump farther than that, we may wanna jump to a specific address. So if my jump says, hey, jump to 10, because I know absolutely from zero that that's where I'm gonna go, that doesn't work anymore if we relocate from zero to somewhere else — we need to know to update those jumps. Also calls: if we're gonna call a function and we use its absolute address in this zero-based scheme, we're gonna need to know to change that. Don't we use the stack pointer for that? The short answer: for some things we do. For local variables we use the stack, but for global variables and calling functions, we use more or less fixed addresses. Okay. So who has experience coding x86 assembly? Cool. All right, the rest of this is gonna be a crash course in x86 assembly, so it's fun. Okay, so x86 has a long and complicated history, which kind of manifests itself in the architecture itself. It started off as a 16-bit processor — the 86 comes from the 8086, which I can't say I've actually programmed on. Then additional modes were added to it — we'll see what protected mode and real mode are in a second. Then they upgraded to 32 bits. We talked about the size of addresses, but why is moving from 16-bit to 32-bit such a big deal? Instead of addressing two to the 16 bytes of memory, we can address two to the 32 bytes of memory. Same reason going to 64-bit is so nice: we get access to a wider range of memory. Further revisions added new features and faster speeds.
There are multimedia extensions that were added, right? x86 didn't really remain constant; it's continually being added to. Why do they add things? Shouldn't we just say: bang, stick with x86, never change it, so that all programs are gonna be backwards compatible? Third party? Third party in what sense? Different vendor devices? Close... Ah, peripherals. Yeah, part of the new instructions could be because of that: we wanna support different devices. But what's the one thing we're always concerned about? Performance — speed. We want our programs to execute faster. So one way to execute something faster is to code a better algorithm, make it more efficient. Another way is to just shove the operation into the processor — make the processor do it fast. It can do things in parallel a lot better than a program can; complex things happening in hardware are gonna be a lot faster. So this is why we can put multimedia operations in there, or we can put security and encryption into the chip, so hopefully it'll speed up our programs. Anyways, there are all kinds of additions. SSE allows you to compute on multiple sets of data at once. And that's the other trade-off: you can make your program execute faster, or you can shrink down the size of your program, which also makes it faster — even if the instructions still take the same time, because of caching and everything in the system, smaller code that does the same thing is probably gonna be faster.
All kinds of stuff — we have hyper-threading, and now we've hit the point where we've actually kind of hit the limit on how fast we can make the actual cores. How many people have like 3.5 gigahertz CPUs? Not that many anymore, right? I kind of miss those — though I don't know, maybe they're actually faster now, I'm not an expert. The problem is we hit a point where our chips are generating so much heat that we can't just make them faster and make them do more things per second, but we can do multiple things at once. So with continually decreasing transistor size, the idea was: let's throw more cores on the die, so I have independent processors on one CPU. So what does Moore's law state? It's not about the size of the CPU. Eighteen months — yeah, I can never remember the exact number; eighteen months, two years. The key thing, which I guess a lot of you probably know, is that it's not what people used to think of it as — the speed of CPUs doubling every 18 or 24 months or whatever it is. It's actually about the number of transistors you can fit into the same space. It used to be that that caused a speed increase, because you could just do more things faster when the distance between the transistors was smaller — I don't know, I'm not a hardware person. But at a certain point you can't make things faster, because you hit heat: every time you try to go faster, you generate more heat. So this is why now they still make the transistors smaller and smaller, but they can't really make the chip any faster, so they add more cores. Basically they've given up and said: okay, software people, you've got to code your applications so that they take advantage of this parallelism.
It used to be you'd just buy a new chip, go from a P2 to a P4, and it feels like everything's screaming because it's executing so much faster. Now you get a new chip and it's like: oh, it uses less power, so my battery lasts longer. Which is good — I'm not going to complain — but it's not quite the same. All right, and then we got to the 64-bit architecture, which AMD first released and Intel then adopted, so they actually use the same architecture there. Okay, so now: memory addressing. What is memory addressing? Page tables? Kind of — actually, yeah, that gets into it. What's that? A lookup table? Say it louder — a lookup table. So yeah, how do we access memory? It's one of the fundamental parts of our program: our programs do computation over data, essentially. The computation part is pretty much in the code, but then how do we get to that data? We want it in memory, we want it on disk, whatever — but we need to access memory. So because it's 32-bit, we can access two to the 32 bytes of memory, all the way from zero to 0xFFFFFFFF — eight f's, right? And so the question is: what does it mean when I access this memory? With a flat memory model, it basically just means our program sees memory as a single contiguous thing from zero to two to the 32 minus one — four gigabytes — and we can access whatever the heck we want anywhere in there. What's the problem with that? Yeah, right — processes interfering with other processes. It depends on whether there's any process protection or memory protection, but if we say everybody has access to raw physical memory, what happens when processes are actually executing over and on top of each other?
It also gets a bit more difficult: how do we implement things like writable, not writable, or executable, when anyone can access everything? The OS would have to check basically every single memory access to understand whether it has the proper settings on it — or maybe the hardware has to support that — so it makes things more complicated. So the idea is, they said: okay, it's actually kind of crazy to do this, so let's segment the memory. Instead of treating memory as one huge contiguous, flat memory space, we're gonna separate it into different segments; then we can address each of these segments separately, and the operating system can use these segments to know exactly what's being accessed. Yeah? What if you have a 32-bit machine, but you don't have four gigs of physical memory? Yeah, exactly — that would be a problem. Every program would have to handle that correctly; they would have to have a handler for when they tried to go beyond it. And then you'd have to establish a standard: does memory start at zero, or does it start at 0xFFFFFFFF and work its way down? Obviously starting at zero and working up would probably make the most sense, but yeah, it'd be something that needs to be specified somewhere. Okay, and the other memory model is this real address mode model, which is actually how things used to be. It's kept around for compatibility purposes: modern x86 processors, when they boot up, boot into this real address mode. That way you can actually still run DOS and older operating systems that use real address mode on your modern CPUs. It's all crazy backwards compatible, all the way back to the original 8086.
Okay, so this is how the different models come together. In a flat model, we just have one address space. But in a segmented model, something tells me which segment I'm talking about — say, this register tells me which segment I'm trying to address — and then this other register tells me the offset. That maps into a segment, and then we get into page tables, which I don't think we'll talk about too much, but the segment may then be mapped to a different part of actual physical memory. So the program actually has a different view of memory than the physical memory the kernel sees. What's the benefit of this segmented model versus the flat model? As you said, giving permissions at the segment level takes much less space. What about memory allocation? Yeah — if I can allocate at the segment level instead of giving the program the whole four gigs, then as the operating system I can actually have a handle on how much memory a program's using. And how much memory can I access in the flat model? Two to the 32, right. What about in the segmented model? Yeah — if the offset is 32 bits, and the segment selector gives us some number of segments on top of that, you can actually allow the program to access more memory than just two to the 32. Which is actually how this originally came about, with 16-bit applications: in a 16-bit application with a linear model, you can only access two to the 16 bytes. If you wanted more, the segmented model let your program access two to the 16 times however many segment selectors there are — more segments, more memory. What's the downside of this segmented model?
It's hard to manage all the segments — and from whose perspective? You have to determine what the offsets are, have the OS figure out which segment you're accessing. Yeah — so as the compiler writer, or if you're writing assembly by hand, the developer has to keep track of both of these things: the segments and the offsets. You have to keep track of which segments you're using. It also can make this code harder to analyze, because you're dynamically calculating offsets and selectors. You're also doing extra processing: for every memory access, somebody has to do this addition of segment plus offset to get the actual memory address you want, and that's gonna add some overhead. So, hey — just showing that you don't get anything for free; everything has some kind of cost. Okay, what are registers, in general, on a CPU? Yeah — think of them as bits of memory, places of memory, that live actually on the chip and are incredibly fast to access. And in most assembly languages, you can actually only do computation on registers: you can't say, okay, add that memory location to that memory location. You have to first bring the value from memory into a register, do the computation, and then save it back into memory — because if something stops, that value is just in your register on your CPU and it goes away. In some sense, you can think of registers as the local variables of the processor: the processor has its variables that it's executing on, and this is how it's done. So on x86, how big are the registers? It depends? Yeah, it depends on the architecture. Okay, so we're gonna talk about only 32-bit from here on out, pretty much. So pretty much everything's gonna be 32 bits — the registers are 32 bits.
Because they have to hold addresses, which are 32 bits, right? Everything kind of follows from there. We'll see some backwards compatibility stuff with 16 bits, and for 64-bit, obviously the registers are gonna be 64-bit registers. So on x86, there are four general purpose registers, which means you can kind of use them for whatever: EAX, EBX, ECX, and EDX. A couple of things to note here — we'll get there in a second. AX is the 16-bit version: in the 16-bit days, there were the AX, BX, CX, and DX registers. The E stands for extended — extended from 16 bits to 32 bits. So that's how you know you're referencing a 32-bit register. Conventionally — which doesn't mean it has to be this way; the machine will absolutely execute no matter how you use these registers — A is the accumulator, the one you're adding things to and changing. EBX is usually a pointer to your data. What's a pointer at this level? The address, yeah, exactly — inside EBX would be the address of whatever memory or data you're computing on. ECX would be your loop counter, and I/O operations conventionally happen with the EDX register. So the EAX register is 32 bits wide — it contains 32 bits. An important thing to know while you're writing or reading x86 assembly: if you see a reference to the EAX register, it means all 32 bits of EAX. If you see a reference to the AX register, it's just the lower 16 bits of the EAX register. So things can get tricky: if you're comparing AX with zero and there's something in the higher bits, those are ignored, because AX is only these lower 16 bits. This is something to keep in mind while you read code.
AX is further split into AH — the A high byte — and AL, the A low byte. And it's the same with all the other lettered registers. Then there are other registers like ESI. What does the SI stand for? Segment index? Close — it's source index. ESI and EDI are used for high-speed memory transfers: ESI is the source index and EDI is the destination index. Just the same way, you can reference the lower 16 bits with SI. And there are also special purpose registers. Like we said, SP stands for stack pointer, so ESP is the extended, 32-bit stack pointer. It points to the top of the stack in our program. We're gonna go really in depth about how the stack works in x86 assembly, but it's important to note that this, once again, is convention — you actually don't have to use it like this — but we'll see there are instructions that automatically, implicitly use this stack pointer register. And EBP is the base pointer, also known as the frame pointer. It points to the currently executing function's frame, which is gonna be on the stack, and we'll see how this is done. The frame pointer is how all of the local variables and parameters of a function are accessed: they're all at different offsets from the base pointer, which allows multiple invocations to have their local variables on the stack at once — each function frame has its own local variables and parameters, but they're all at the same offsets from EBP. We'll see that too. Okay, so these are the segmentation registers.
So these registers we can use to select different segments: CS is the code segment, DS is the data segment, SS is the stack segment, and ES is the extra segment — there are a couple more I'm not gonna get into. The EFLAGS register is constantly changing based on the instructions that we execute. Things will be updated in here, as we'll see: if we do a compare instruction and compare two registers together, then a specific bit in EFLAGS gets set that says, hey, they're equal — zero — or hey, one's less than or greater than the other. The instruction pointer: incredibly important. What's the instruction pointer? Yeah — it actually points not to the currently executing instruction, but to the next instruction in memory. And another important thing: we can't read it or set it explicitly. So how is it normally modified? Who modifies it? Let's say there are no jump instructions and I'm just executing straight-line code — is it changing? Yeah, it better be, otherwise we're not executing code, right? So who changes it? EIP is essentially the program counter. It's not us, and the OS actually isn't executing at this point — it's just us on the CPU. The CPU is changing it. So, going back maybe to your architecture days: you have the instruction fetch and instruction decode cycle, right? As part of that, the CPU fetches the next instruction from memory and decodes it. Part of that decoding process tells it how many bytes that instruction is, and by the time it executes, it updates EIP to EIP plus the size of the instruction it just decoded. This is why — so, x86 is not a fixed-length architecture; it's not a RISC-style thing like ARM. The size of an instruction can vary from one byte up to quite a lot of bytes (15, on modern x86).
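A minimal sketch of that fetch-decode cycle — the instructions and their byte sizes here are invented, not real x86 encodings — showing how the CPU advances EIP by the size of each variable-length instruction it decodes:

```python
# Each entry maps an "address" to (mnemonic, size_in_bytes).
# Real x86 instructions range from 1 to 15 bytes; these sizes are made up.
code = {
    0: ("inc eax", 1),
    1: ("mov eax, 5", 5),
    6: ("add eax, ebx", 2),
    8: ("nop", 1),
}

eip = 0
trace = []
while eip in code:
    mnemonic, size = code[eip]  # fetch + decode: decoding yields the size
    trace.append((eip, mnemonic))
    eip += size                 # the CPU steps EIP past the decoded bytes

print(trace)  # [(0, 'inc eax'), (1, 'mov eax, 5'), (6, 'add eax, ebx'), (8, 'nop')]
```

Notice the addresses aren't evenly spaced — that's the variable-length part, and it's why EIP can't just be incremented by a constant like on a fixed-width RISC machine.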
But you, the programmer, can implicitly control it, right? By executing jump instructions, or call or return instructions. So that's the way you get to manipulate the instruction pointer. A little trick, which may come in handy later: you can actually read EIP by executing a call instruction. A call instruction, as we'll see — the semantics are: jump to this address, and push onto the stack what would have been the next instruction executed after the call. So after the call, that next-instruction address is sitting on the stack, and you can read it there. And there's a whole bunch of other registers for floating point, the MMX and SSE stuff — all kinds of crazy registers in there. But the main ones we'll use are the normal EAX-style registers. Questions on registers? All right, so let's look at EFLAGS. This is that register I mentioned: it's 32 bits, we know that, and it holds the current program status. Each of these bits means something. So the zero flag says whether the result — when you do a compare, say — was zero, and then the branch or jump-if-zero instructions will check the value of this bit: if it's one, they jump; if it's zero, they don't. Tons of stuff like this. Flags for overflow: if you've done an addition and the value overflowed a 32-bit number, that flag gets set. All kinds of cool stuff. [Question: some bits are documented as always set to fixed values — why are they there?] Future-proofing, probably, just in case. There's only so many control bits you actually need for your programs, and I'd assume this is a way for them to reserve things in case they need them later. I'm not a hundred percent sure, though — it could also be that they're used in a different context.
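A quick sketch — Python standing in for the ALU, not real hardware — of what "a specific bit gets set" means for two of those flags after a 32-bit add:

```python
MASK32 = 0xFFFFFFFF

def add32(a, b):
    """Add two 32-bit values; return (result, zero_flag, carry_flag)."""
    full = a + b
    result = full & MASK32
    zf = 1 if result == 0 else 0    # zero flag: the 32-bit result was zero
    cf = 1 if full > MASK32 else 0  # carry flag: the add overflowed 32 bits
    return result, zf, cf

print(add32(1, 2))           # (3, 0, 0)
print(add32(0xFFFFFFFF, 1))  # (0, 1, 1) — wraps to zero, setting ZF and CF
```

A jump-if-zero after that second add would be taken, because the zero flag is one — even though, to us, we "added something to something".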
Maybe in the OS they get more bits because it's operating at a more privileged level. Could be. But yeah, the exact bits here don't really matter — what matters is knowing that they're there and how they're used. All right. So, data sizes. What do we mean by data sizes, and why do we need to talk about them? Is it obvious? So, we like to think about things in terms of the architecture, right — words, bytes, half-words, double-words. We can talk about those for pretty much any architecture, but how many physical bytes and bits each of those is can actually vary between architectures. So for x86 we define: a byte is eight bits, a word is two bytes, a double word is 32 bits, and then you have quad words and double quad words and it gets crazy. Last thing I'm gonna end on right now: you must be aware of endianness. Intel uses little-endian ordering. So what does that mean — what is endianness? Are you asleep? Yeah, so it goes back to the data sizes: for a word or a double word, where is the most significant byte stored, and where is the least significant byte stored? For instance, say we wanted to store the double word 0x03020100 at address 0x00F67B40. What is at 0x00F67B40 — is it 0x00 or 0x03? It's actually 0x00. So at 0x...40 goes 0x00, at 0x...41 goes the byte 0x01, at 0x...42 the byte 0x02, and at 0x...43 the byte 0x03. If it was big-endian, these would be flipped around. Which actually makes it crazy when you try to look at these things, and we'll see exactly why when we get closer to some exploitation stuff. If you're not careful and you're not paying attention, this will come back to bite you: you'll try something and be like, this doesn't make sense — why doesn't it work? Then you look at it and see all your bytes are swapped from how they should be. It's an endianness problem.
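You can see that exact byte layout with Python's struct module, packing the same double word from the example both ways:

```python
import struct

# Store the double word 0x03020100 little-endian (as x86 does) and big-endian.
little = struct.pack("<I", 0x03020100)
big = struct.pack(">I", 0x03020100)

print(little)  # b'\x00\x01\x02\x03' — lowest address holds the least significant byte
print(big)     # b'\x03\x02\x01\x00' — flipped around
```

The little-endian bytes are what you'd actually see at addresses 0x00F67B40 through 0x00F67B43 in a debugger's memory view — which is why a dump of memory looks "backwards" at first.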
How are signed integers stored? What does two's complement notation mean? It's like a compliment: you're looking good today. ...That joke. So what's the complement? Yeah — you flip all the bits and you add 1. That's how they're stored. So negative 1 is stored as all F's. Negative 2 is 0xFFFFFFFE. What is this number in two's complement? You should have a calculator — figure out how to use programmer mode on the calculator on your laptop; it comes in really handy. All right, and that'll do it for today.
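The flip-the-bits-and-add-one rule, sketched in Python with 32-bit values:

```python
MASK32 = 0xFFFFFFFF

def twos_complement(n):
    # Negate n: flip all the bits, then add 1 (truncated to 32 bits).
    return ((~n) + 1) & MASK32

print(hex(twos_complement(1)))  # 0xffffffff — how -1 is stored: all F's
print(hex(twos_complement(2)))  # 0xfffffffe — how -2 is stored
```

Programmer mode on your calculator does the same masking for you — which is exactly why it comes in handy.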