All right, I set it up to automatically record while it's streaming. Let's see how we're doing. Can somebody let us know if we're actually live on Twitch, and then we'll continue. Yep. Okay, ah, nobody's there. Cool, I'm there. That's the only thing that matters. Okay, sweet. All right, so as you all know, of course, because you're following along with the course, the assembly crash course is due Friday at midnight. We'll launch another module after that. The due date is TBD; I'm going to see how this goes, because I want you to have enough time. So you've been learning assembly. After this, you're going to put that knowledge into action and actually write programs that do things instead of just moving data around, right? We talked about this: we want our assembly programs to actually do things. Before we get there, let's do what people were asking for before class and spend maybe 20 minutes talking through one of the levels. I don't want to solve it all the way like I did last time, but I do want to show how to approach some of them. Anybody have a favorite level, maybe something in the teens, that they'd like to look at? Level six? Level six, all right: register sizes. What are the sizes of registers? 16, 32, 64. There's one more. Eight. Eight, yeah, eight-bit registers. Before we even tackle that, we should remember these slides on registers. Let's queue them up, oh, there we go. Cool, so if we're talking about register sizes, before even looking at the level or what it's asking for, we have the slides on partial access to all of the different registers, right? Cool, let's keep that in our back pocket. Actually, let's go there. Kimberly, can you read that in the back? The text, at least on the left? Cool. Run the challenge, let me wait. Okay, the interaction instructions, which you already know about because we talked about them. For this level, we'll be working with registers, yay.
You will now set some values in memory dynamically. So the challenge is going to set values before our code runs, and on each run the values will change. Why does it do this? You should be asking why as you read these. Why is it changing the values in the registers before it runs our code? Yeah, so we can't hard-code the answer and say, well, the thing told me it's expecting 10 here, so I'm just going to move 10 in there, rather than actually doing what the assignment says. Cool. We will tell you which registers are set beforehand and where you should put the results. In most cases, it's going to be our friend RAX. Okay, there's some background text, but we've already covered all of it in the slides. We just talked about register sizes: RAX refers to all 64 bits of the register. The lower 32 bits are EAX. The lower 16 bits are AX. And within AX, the upper eight bits are AH and the lower eight bits are AL. Cool. So it says: using only the following instruction, mov. We only have one instruction to use, cool. Please compute the following: set RAX to RDI modulo 256, and RBX to RSI modulo 65536. And it tells you what values it will set. Send it some As. Error, it did not like all of my As; that's fine, because there's nothing in there. Cool. Okay, so what's it asking us to do? What's the modulo operation in assembly? Yeah. Remainder. Remainder, but can we use it? Yeah, we can only use mov. So it actually doesn't matter: even if there were an instruction that computed the modulo directly, we only have mov. Yeah? Well, I thought the idea was you use div, and with div you end up with a quotient and a remainder. Sure, but can we use div? Well, in this case, no. Correct, we can't use div. We can only use mov. So there must be some way to do this without div. Move the lower bits over.
Move the lower bits over, why? One thing to always do is look at this and do it by hand, right? If we can't do this math on our own without writing a program, it's going to be really difficult to write a program that does it. So RDI is hex D967, and the task is to compute RDI modulo 256. Someone remind us what the modulo operator means. Basically, it's the remainder left after division. Yeah, exactly, cool. So let's use our handy calculator. Let me quit out of all this stuff to reduce the chance of things popping up. Sure, that's good enough. Okay, how do I even do mod on this calculator? Great question. Let's use Python. IPython is interactive Python, so I can type Python expressions here and it will show me the output. I can do a classic hello world. This is just a nice way to interact with Python and do things without needing a calculator; there's a lot you can do in here. Okay, so in Python, and I believe in most programming languages, right, C, C++, and Java, what's the modulo operator? Percent. So I do percent, then what? 256, thank you. Okay, so the result should be 103. Yeah, perfect. Now look at that number: it's actually hex 67. How does that relate to our original number? It's the lowest byte, right? Two hexadecimal digits are one byte. So it's the last byte. Why? That seems like magic. Yeah, 256 is two to the eighth power. So we can just lop off everything else: the only digits we care about are the last two, this 67. Is this universal? How do we test our theory? Let's try some more values and see what the mod does. Each time, the result is the last two hex digits. I think we can keep going, right?
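To check the lower-byte shortcut outside IPython, here's a minimal Python sketch. For any power of two 2**k, x % 2**k keeps exactly the low k bits, which is the same as x & (2**k - 1). The helper name mod_via_mask is made up for illustration; in the level itself, the equivalent move is reading a smaller sub-register (for example DIL, the low 8 bits of RDI).

```python
def mod_via_mask(value: int, bits: int) -> int:
    """Return value modulo 2**bits by keeping only the low `bits` bits."""
    mask = (1 << bits) - 1  # bits=8 -> 0xFF, bits=16 -> 0xFFFF
    return value & mask


rdi = 0xD967
print(hex(rdi % 256), hex(mod_via_mask(rdi, 8)))     # both are 0x67 (103)
print(hex(rdi % 65536), hex(mod_via_mask(rdi, 16)))  # both are 0xd967

# Spot-check the identity on a spread of values, including full 64-bit ones.
for v in [0, 1, 255, 256, 0xD967, 0xDDAC, 0xFFFFFFFFFFFFFFFF, 123456789]:
    assert v % 256 == mod_via_mask(v, 8)
    assert v % 65536 == mod_via_mask(v, 16)
```

The same reasoning covers 65536, which is 2**16: keep the low 16 bits.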
So the modulo here is really a shortcut: modulo 256 keeps only the lower eight bits. If that's what we want to do, let's write it. I'm going to use Emacs now, cool. Okay, so what's the register, RDI? Let's just move hex 10 into RDI, boom, run it. I guess I should be showing you the cool things about Emacs while I'm doing this, huh? Let's make the Vim people mad. Okay, so we moved 10 into RDI. Is this going to give me the flag? No, why? Yeah, because I need to set RAX; I changed RDI, but I didn't change RAX. We have to keep the description in mind. But what this does show is how we fail: RAX was expected to be F9, and we can verify that F9 really is the lower eight bits of RDI, which double-checks our claim that modulo 256 is the last eight bits. If we pass this check, we shouldn't see this message again; we shouldn't see that it failed because of RAX. So what's our goal with the code here? We want to set RAX to what? To the remainder. Okay, so we know this shouldn't work yet. My target register is RAX, and from RDI I want to move the lower what? The lower byte, yeah, eight bits. We look at our handy table and say, okay, Mr. RDI, if I want to refer to your lowest byte, that's DIL. So move DIL into RAX, rerun it. Okay, great, why did this fail? Yeah, because RAX is 64 bits, and I'm trying to move eight bits into a 64-bit register; mov needs both operands to be the same size. If I write just the low byte instead, there could be junk in the rest of RAX, so how can I clear RAX first? Move zero into RAX. Now I know RAX is zero, and I'm going to move DIL into which byte, AH or AL? AL. What if I moved it into AH, why is that wrong?
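To make the "zero RAX first, then write AL" fix concrete, here's a sketch that models a register as a plain 64-bit integer (a simplifying assumption; the helper names write_al and write_ah are hypothetical). On x86-64, writes to 8-bit and 16-bit sub-registers leave the other bits untouched, while a write to a 32-bit register such as EAX zeroes the upper 32 bits, which is why leftover junk matters here. The RDI value 0xDDAC is the one that shows up in the debug output of this level.

```python
MASK64 = (1 << 64) - 1


def write_al(rax: int, byte: int) -> int:
    """Model `mov al, <8-bit value>`: replace bits 0-7, keep bits 8-63."""
    return (rax & ~0xFF & MASK64) | (byte & 0xFF)


def write_ah(rax: int, byte: int) -> int:
    """Model `mov ah, <8-bit value>`: replace bits 8-15, keep everything else."""
    return (rax & ~0xFF00 & MASK64) | ((byte & 0xFF) << 8)


rdi = 0xDDAC
dil = rdi & 0xFF                 # DIL is the low byte of RDI -> 0xAC
junk = 0x1122334455667788        # whatever happened to be in RAX beforehand

print(hex(write_al(junk, dil)))  # 0x11223344556677ac: junk survives above AL
rax = 0                          # mov rax, 0
print(hex(write_al(rax, dil)))   # 0xac: the value we want
print(hex(write_ah(rax, dil)))   # 0xac00: right digits, wrong place (times 256)
```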
It's the upper eight bits of AX. That's like asking: is 10 the same value as 1,000? It's just the 10 shifted over two decimal places, but no, it's a completely different number. The digits are the same; they're just in the wrong place, right? Exactly, so we want to move this into the lower bits. Let's try it. All right, is this good, do we get the flag? No. Do we expect to get the flag? Also no. But did we do what we wanted? Yeah. Why? We passed the RAX check. Now, how can we be absolutely confident that the right value is in RAX? With int3, we can add a debug statement, right? It shows us the values in the registers, and we can confirm with our own eyes that RAX holds the lower eight bits of RDI. All right, we've got the output. RAX is AC. We didn't change RDI, and RDI here is DDAC, so we've successfully set RAX to AC, the lower eight bits of RDI, and we've done our modulo operation. Going forward, we now need to set RBX to RSI modulo 65536. Do we do the exact same thing? No, I see some head shaking. Why not? Because it's a different modulus, so we'd do the same reasoning again, figure that out, and go forward, which is all I'm going to do here. Cool? Oh, shoot, I said 20 minutes, but I wasn't keeping track. Okay, a five-minute approach on one other level. 19? That was nearly unanimous. 19, oh, 19's a fun one. All right, we've got Emacs, and we want a shell to look at some stuff. Run the challenge, and read the challenge; it's not just about running it, right? Okay, normal instructions. In this level, we'll be working with control flow manipulation. Great, what should we be thinking about with control flow manipulation? What instruction, all together now? Jump. Thank you, yes, we should be thinking jumps.
This involves using instructions to both indirectly and directly control the special register RIP, the instruction pointer. We'll use instructions like jmp, call, and cmp to implement the requested behavior. All right, we're testing multiple times with dynamic values, okay, great. Wait, what just happened there? Oh, I didn't know you could do that. Okay, sorry, I'm getting familiar with this. Okay, "the last set of jump tables is the", okay, what's going on? Is this just cut off normally? Seems bad. Nobody's mentioned this? Yeah, okay, well, let me know and I can fix it; it shouldn't be like this. All right, maybe the text continues right after here. Okay: the last set of jump types is the indirect jump, which is often used for switch statements in the real world. Switch statements are a special case of if statements that only use numbers to determine where the control flow will go. Here is an example: switch on number. We've seen switch statements; C, C++, and Java all have them. Does Python have them? Python 3.10 added match statements, which are kind of the same idea. Okay, so switch on number: if it's zero, jump to do thing zero. If it's one, jump to do thing one. If it's two, jump to do thing two. Otherwise, jump to the default. The switch in this example works on number, which can be zero, one, or two. In the case that number is not one of those, the default happens. So it's kind of an if-else statement that only uses numbers. Because the cases are exact numbers, and we know the range of numbers, we can use a jump table. A jump table is a contiguous section of memory that holds addresses of places to jump. In the above example, the jump table could look like this.
So at this memory address is the address of do thing zero; at this address plus eight, the address of do thing one. How come this is eight bytes above the first address? What is eight bytes long? Memory is very large; memory can be terabytes, but you're very close. Yes, an address on x86-64 is eight bytes. So that's why: if it's an address, it has to be eight bytes. This is a double check, right? As you're reading this, ask yourself: does it make sense that this thing is eight bytes after that other thing? So what's the difference between these next two memory locations? Eight bytes again. How come? It looks like it should be two. It's in hex, so we've got to be careful, right? This looks like 10, but in hex 10 is what? 16, yeah, one of the easiest ones to memorize, right? Each digit position is a power of 16: a one in the lowest position is 16 to the zero, a one in the next position is 16 to the one, the next is 16 to the two, and so on. Anyway, that's just the easy thing to remember. Then at offset hex 18 is the address of do default thing. Using the jump table, we can greatly reduce the number of compares we need. Now all we need to check is whether number is greater than two. If it is, we jump to the last entry; that's the default case. Otherwise we do an indirect jump, to a location that isn't known until runtime: depending on what the number is, jump to the jump table address plus the number times eight. All right, so the goal is to implement the following logic. If RDI is zero, jump to this location. If it's one, jump here; if it's two, jump here; if it's three, jump here; otherwise, jump here. Okay: assume RDI will not be negative, so we don't have to worry about negative numbers, great. Only use one compare instruction. What's that preventing us from doing? Comparing everything and just writing a big if-else chain.
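The switch-plus-jump-table idea from the challenge text can be sketched in Python, with a list of functions standing in for the table of 8-byte addresses. This is a rough analogue under that assumption, not how a compiler actually lays out a jump table; the function names mirror the example's do thing zero/one/two and default.

```python
from typing import Callable, List


def do_thing_0() -> str:
    return "thing 0"


def do_thing_1() -> str:
    return "thing 1"


def do_thing_2() -> str:
    return "thing 2"


def do_default() -> str:
    return "default"


# Contiguous slots, one per case, with the default in the last slot.
# In real memory each slot is an 8-byte address; here each slot is a function.
JUMP_TABLE: List[Callable[[], str]] = [do_thing_0, do_thing_1, do_thing_2, do_default]


def switch(number: int) -> str:
    # A single comparison handles the default case (number assumed non-negative).
    if number > 2:
        return JUMP_TABLE[3]()
    # Otherwise index straight into the table: no per-case comparisons needed.
    return JUMP_TABLE[number]()


for n in range(5):
    print(n, switch(n))
```

The point of the design is that the cost stays at one comparison no matter how many cases there are.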
So it's actually forcing you to do what it's asking. No more than three jumps. We'll provide you with the number to switch on in RDI, so RDI is the number in this case, and we'll provide you with the jump table base address in RSI. The great thing is we don't even need to worry about building the jump table; it's given to us in RSI. So at the start, RSI would hold this memory location. And this next one should be, wow, okay, I don't know my hex that well, and it's super weird that that's not on an eight-byte boundary, but that's okay: the difference between hex C and hex 4 is eight. That's right, yeah, so it makes sense; these are consecutive entries. Again, we know each entry is a memory address, so it has to be eight bytes, and we can double-check that. And these specific addresses will change, so we should not hard-code them; we should be using the jump table. Okay, cool. Let me copy that whole thing. Why did you not copy? That's not fun; that's the problem when the keyboard shortcut is the same as the terminal's. Okay, I just want the command to run; it's waiting for my input, that's why I did that. Now I'm going to copy this onto my desktop so that I can paste it; it should be exactly the same. Okay, should this work? Why not?
Yeah, because it was literally the wrong thing: it was the solution to the other level. So let's look back at what we need to do: switch on number. One thing I'd actually do here, oh, I need to kill this, I usually don't use the terminal in here, anyway, is copy and paste the description of what we actually need to do, which is down here. Okay, so let's keep track: number is RDI, and the jump table is in RSI. Yep. And we need to implement this logic. So what's the first check I'm going to want to do? Whether it's greater than three. Why? Yeah, this is the default check; the description said to do the default check. Assuming it's not negative, the cases are: it's zero, one, two, three, or greater than three. If it's greater than three, we jump to the last entry in our jump table. So we compare RDI with three. Now this is where I'd go back, because I definitely don't remember all of the conditional jump instructions, but we have the handy slides. Which lecture was that? Oh yeah, the last lecture, on control flow. We could go looking around. Oh, there we go. We're comparing with three, and we want greater than, right? We don't want equal to. So jump if greater, JG. Where do we want to jump to? If I just did JG RSI, where would that go? To the first entry. Which one do we actually want? One, two, three, four, five: the fifth one. Can I do five times eight? "Invalid use of register", "operand type mismatch": I can't use that. Okay, there are a couple of possibilities; maybe I can't do the multiply. So what's five times eight? 40. Another "invalid use of register". Yep, okay, that's what I figured.
So I can't really do this, right? Can I not use offsets from registers here? No. So I need to compute something. Great, okay, so I need a temporary register: mov RAX, RSI, so I move the jump table address into RAX. Then I add to RAX what? 40. And then I jump to RAX. Also, why do I need to move my compare down? It needs to be right before the jump, specifically because the add changes the flags, and that would mess things up. Okay, wait, why is it still saying that? We'll learn something new together: "operand type mismatch". JG is jump if greater, signed. Ooh, that's a bad sign. So one thing is we're using a signed comparison. Why is that bad? Yeah, RDI is guaranteed not to be negative, although I guess does that mean it won't have the highest bit set? Okay, but apparently I can only use JG with a label. Let's look up x86 JG: it's a signed comparison, and the conditional jumps only take a target that gets encoded as a relative offset. So what I'm inferring from reading all this is that to jump to an address held in a register, we need the plain, unconditional jmp with a register operand, whereas a conditional jump to a label is encoded as an offset. So I'd go back here and say, okay, I can't jump to RAX conditionally; I need to jump to a label, something like default, and we can define that label. What should default do? Yeah, what I did here, so I move that code down under the label, since where it sits doesn't really matter. Now the conditional jump goes to a label. This one takes a while to execute, because it's doing something like 1,000 runs, is that right? Yeah, okay. Now, in the case that it's not greater than three, what do we want to have happen? Jump to what? Yeah, but where do we want to jump, right?
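Here's a small Python model of why JG versus JA matters after cmp rdi, 3. It's a sketch that compares the 64-bit bit patterns directly, as signed or unsigned numbers, rather than simulating the CPU flags; the helper names are hypothetical. The two only disagree when the top bit of RDI is set, which the challenge promises won't happen, so JA is the more natural fit for an unsigned index, but the signedness alone wouldn't break this level.

```python
MASK64 = (1 << 64) - 1


def as_signed64(value: int) -> int:
    """Reinterpret a 64-bit pattern as a two's-complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value & (1 << 63) else value


def jg_taken(rdi: int, imm: int) -> bool:
    """`cmp rdi, imm` followed by `jg`: compares the operands as signed values."""
    return as_signed64(rdi) > as_signed64(imm)


def ja_taken(rdi: int, imm: int) -> bool:
    """`cmp rdi, imm` followed by `ja`: compares the operands as unsigned values."""
    return (rdi & MASK64) > (imm & MASK64)


big = 0x8000000000000000  # top bit set: huge as unsigned, negative as signed
print(jg_taken(big, 3), ja_taken(big, 3))  # False True
print(jg_taken(7, 3), ja_taken(7, 3))      # True True
```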
If we look back at the example: if number is greater than two, it jumps to that fixed location; otherwise it jumps to the jump table address plus the number times eight. That's the case we're in now: we want to jump to the jump table address plus number times eight. How do we know the number? What was it? It's RDI. So mov RAX, RDI, and now RAX has it. What do we then need to do with the number? Yeah, multiply it by eight. You've studied the multiply instruction, and we could use it to multiply by eight, but there's actually a better way for powers of two, yeah. Bitwise shift. Because of the way binary works, if you shift the bits one position to the left, that multiplies by two; do it again, and that's another times two. Is eight a power of two? Which power? Three. So we can shift left by three, and that's exactly the same as multiplying by eight. And if you look at the disassembly of compiled C code that multiplies an integer by a power of two, the compiler will typically use this trick: a shift instead of an actual multiplication. Cool. So it's SHL; which way do we want to shift? Left. SHL RAX, 3. At this point we should have multiplied it by eight. Now we want to add RAX to what? RSI. RSI is the jump table address, so we take the jump table address, add the number times eight, store it in RAX, and then jump to it. Okay, let's see. This last one failed. Ah, an error. Okay, invalid memory access, all right. Cool, I was just talking to someone about this.
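The shl-then-add sequence can be checked arithmetically. This sketch uses the jump-table base 0x404398 that RSI held during the debugging session in this level; slot_address is a hypothetical helper. It also makes the slot indexing explicit: cases 0 through 3 sit in slots 0 through 3, so the default, the fifth entry, is index 4, at byte offset 4 * 8 = 32 (hex 20). Note that this computes where a slot is, not the address stored inside it.

```python
ENTRY_SIZE = 8  # each jump-table slot holds one 8-byte address


def slot_address(table_base: int, index: int) -> int:
    """Compute table_base + index * 8 the way the assembly does: shl by 3, then add."""
    scaled = index << 3          # shl rax, 3  (same as index * 8)
    return table_base + scaled   # add rax, rsi


# Shifting left by 3 really is multiplying by 8.
for i in range(5):
    assert i << 3 == i * ENTRY_SIZE

base = 0x404398
print(hex(slot_address(base, 1)))  # 0x4043a0
print(hex(slot_address(base, 4)))  # 0x4043b8: fifth slot (index 4) = base + 32
```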
Another thing I like to do is read the disassembled code here; it helps if there's any weirdness. Compare RDI to three; if it's greater than three... One thing we noticed as we were doing this was that JG may not be exactly what we want, since that's jump if greater, signed. Let's make sure we use an unsigned comparison: jump if above, JA. So let's change that to JA, just because we thought, hey, maybe that's a problem. Okay: compare RDI to three; if it's above, jump to hex 12, and at hex 12 we have mov RAX, RSI, moving the jump table address into RAX. Then add hex 28 to RAX, and jump to RAX, cool. So let's add an int3 here, before that jump; I want to see what value we're jumping to. It did not break there; let's put it here, cool. Now we know that value, so we can look in here. We're jumping to RAX, right? That's exactly what our code does, and this is the value in RAX: 0x4043a0. That is completely wrong, okay? So let's double-check everything. In this example, RDI is one. Let's go back up here: if RDI is one, where should we be jumping to? In the example table, 0x4030fd, which is stored at 0x404065, okay? So we want 0x4030fd, but we're going to the wrong place: we're jumping to 0x4043a0. The question, of course, is why? We actually have enough information here to debug this, so let's see. mov RAX, RDI: RDI is one, so after this RAX is one. Shift left three: let's use my calculator here. Start with one, shift left one, two, three, so it's eight, which it should be; that's one times eight. Everyone agree? It's eight, yes, cool. Okay, add RSI to RAX and store in RAX. RSI is this memory location here, 0x404398, okay? Great, so now we can add those. Paste did not work; that's because I copied it inside Emacs, but I'm pasting outside of it. Okay: 0x404398 plus eight is 0x4043a0. What's the value in RAX? 0x4043a0. So what's the problem?
RSI is what, say it again? The RSI register was given to us, so we have to assume it's correct. One thing to do is work backwards and ask: did we change it at all? We can see from here to here that we didn't, and with the debug output we can see RSI and we can see RAX. If we go up, here's the example table. How do I read this example table? Which value is given to us when it says it'll provide the jump table base address in RSI? There are five entries here, so ten different numbers that look like memory addresses. Which one is it going to give us? Which one? The first one, the very first one, right? So this is saying: in memory at that first address, the eight bytes there will be the address of where to go. Eight bytes after that is another address, and eight bytes after that another one, for 40 contiguous bytes in total. So when we say jmp RAX, what are we doing here? Let's walk through the code. This is our code. We all wrote this together. Notice how I'm distributing blame to you, not just me. Okay. Let's say the number is zero. Then RDI is zero, and RSI is this address right here, because that's the start address of the jump table we're given. We compare: is zero above three? No. So we go to the next line: move RDI, which is zero, into RAX. What happens when you shift zero left three bits? Still zero. Thank you. We add RSI to zero and store into RAX. Remember, RSI is this value here, 0x40405d. So what's RAX going to be? 0x40405d. And then we jump to 0x40405d. So what does the CPU start doing? The instruction pointer here was at, what's this, 40011?
What's the instruction pointer going to be after executing this? 0x40405d. What's at 0x40405d? Yeah, these bytes will be in memory there. And because we looked at endianness, we know exactly how they'll be laid out: right at the address ending in 5d is hex 2d, then 30, then 40, then 00, then 00, and so on. When you do a jump, it sets RIP to that address, and the CPU tries to execute whatever code is at that memory location. So it's treating what is actually an address as x86-64 code. What does that decode to as x86-64 code? I have absolutely no idea. Not what we want. Yes, exactly, not what we want. So what's the piece we're missing? We want to actually dereference RAX and jump to the memory address that's stored at the location in RAX, right? If you think about it at the C level, we missed a pointer dereference. We're saying jump to this memory location, but what we really want is to jump to wherever this memory location points. Everyone see the difference? Yeah. So let's see if we can do that. Hmm, we could do it two ways. I think we can do it like this; we'll test it. I don't know if you can do this on a jump; you definitely can with a mov. So let's try that. Hey, there's some output. Okay, something happened, but let's go back. There we go: keyboard interrupt, because I don't want all that debug output now; that's what's slowing everything down. What? Oh, there was an int3 left in from last time. That's right, get rid of it. So do I expect that this works? Oh, you guys are conning me out of, don't worry, we have 25 minutes, and it's all on assembly. So this is just hanging. Is it in an infinite loop? Yeah, I think it's in an infinite loop. If you notice, we changed this jump, but we did not change the other one, the default.
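The difference between jumping to RAX and jumping to the address stored at RAX can be modeled with a byte-addressed memory. This sketch uses the example table from the challenge text read above (base 0x40405d, slot 0 holding 0x40302d, slot 1 holding 0x4030fd); the dict-based memory and the helper names are illustrative assumptions. In Intel-syntax assembly, the dereferencing form of the jump is jmp qword ptr [rax].

```python
import struct
from typing import Dict

# Hypothetical flat memory: address -> byte value.
memory: Dict[int, int] = {}


def store_qword(addr: int, value: int) -> None:
    """Write 8 bytes little-endian, like `mov qword ptr [addr], value`."""
    for offset, byte in enumerate(struct.pack("<Q", value)):
        memory[addr + offset] = byte


def load_qword(addr: int) -> int:
    """Read 8 bytes little-endian, like `mov rax, qword ptr [addr]`."""
    raw = bytes(memory[addr + offset] for offset in range(8))
    return struct.unpack("<Q", raw)[0]


table_base = 0x40405D
store_qword(table_base + 0 * 8, 0x40302D)  # slot 0
store_qword(table_base + 1 * 8, 0x4030FD)  # slot 1

slot = table_base + 1 * 8       # what RAX holds after shl/add: 0x404065
print(hex(slot))                # `jmp rax` lands here, on table data
print(hex(load_qword(slot)))    # `jmp [rax]` lands here: 0x4030fd
print([hex(memory[table_base + i]) for i in range(4)])  # ['0x2d', '0x30', '0x40', '0x0']
```

The last line shows the raw bytes the CPU would try to execute if it jumped straight to the table: the low byte of the stored address comes first.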
And whatever instructions it's executing there might form an infinite loop, where it just keeps looping forever. I was hoping it would actually show us that it's wrong. Which is great, because I didn't want to solve this; I wanted you to solve it. So that's good. But we definitely saw it together, right? At first we were crashing; we couldn't pass any of the first test cases because this first condition was wrong. Now that we've made this change, that condition works, but we're failing somewhere else. So we'd probably add some more int3s to figure out exactly where we are. There's only one other place to look: the default condition. Cool. Questions? Let's go learn. I am going to not do these slides; we're going to switch over to Connor's slides. Okay. So you're learning x86-64 assembly. You've learned how to move data around, in and out of memory. You've learned how to compute on that data. You've learned how to branch based on that data. But how do you get it to actually do stuff? It's kind of a philosophical question, right? If data exists in a CPU register, or some computation happens on a CPU register, does it actually make an impact, or a sound? We want to use assembly to do things, and that's why we're now marrying the two. In the first module, you learned how HTTP works from the client's perspective. Then you learned assembly. Now, in the next module, you're going to actually build a web server, something that understands HTTP, in assembly, from the other side. But to get there, we need to be able to talk to the real world.
So here's another representation of what we've been doing: x86-64 code laid out alongside the memory locations. Again, this is what we were talking about, right? At this memory address, there are these bytes. I'd assume these step by eight. So yeah, 0, 8, 0x10 is another way to look at it. These are all hex; even without the 0x, I think we can all see that. The registers we care about here are RAX, RBX, and RIP. Cool. Okay, first instruction: move hex 1337 into RAX. After that instruction, RIP is updated to the next instruction and RAX is set to this value. Again, this is just catch-up on what you already know because you've been studying this. Then move this long memory address into RBX. After this instruction executes, RIP is updated to the next instruction and RBX now has this hex value in it. Then we move RAX to... Somebody remind us what the brackets around RBX mean? As an address. So take the eight bytes in RAX and store them in memory at the address that's in RBX. We know exactly what RBX is: the address starting with 4000, which is here at the top. Sorry, I couldn't spot that. So we should go through it and ask: what bytes are going to be there? Because x86-64 is little-endian, the least significant byte is written first. So in memory here we have 37 13 00 00 00 00 00 00, whereas in the register the value is 0x0000000000001337: zeros, then hex 13, then hex 37. Then we can do things like add hex 42 to RAX, so the value inside RAX is now hex 1379. And then push RAX. That will change some memory. How do we know which memory? The stack pointer, RSP. RSP tells us. The stack pointer is currently pointing here, at the address ending in 10.
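The mov, add, and push steps from the slide can be replayed with Python's struct module, where the "<Q" format packs an integer into 8 little-endian bytes. This is a sketch: the rsp value 0x10 stands in for just the low bits of the slide's stack pointer, and push is modeled as "subtract 8 from RSP, then store RAX there". The helper name qword_bytes is hypothetical.

```python
import struct


def qword_bytes(value: int) -> str:
    """Show how an 8-byte register value is laid out in memory (little-endian)."""
    return " ".join(f"{b:02x}" for b in struct.pack("<Q", value))


rax = 0x1337
print(qword_bytes(rax))   # 37 13 00 00 00 00 00 00  (mov [rbx], rax)

rax += 0x42               # add rax, 0x42
print(hex(rax))           # 0x1379

rsp = 0x10                # low bits of the stack pointer in the slide example
rsp -= 8                  # push rax: RSP first moves down by 8...
print(hex(rsp), qword_bytes(rax))  # ...then RAX's bytes (79 13 00 ...) are written there
```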
So push RAX will decrement the stack pointer by 8 bytes, so it'll point to the address ending in 08, and then write the value there. So our value will now be at the address ending in 008, right here where it's in red. And again, because of little-endianness, the bytes in memory will be 79 13 00 00 00 00 00 00. Cool, right? This is all stuff you know. You can all do this. Awesome. But back to my philosophical question: if you can't actually see any of these registers or anything in memory, who cares, right? This assembly module is kind of special because you have that int3 that dumps out registers and memory, but normally you don't have that. You have to actually get your programs to do stuff. This is where hardware comes in. We have our CPU. We want our CPU and our programs to be able to do things like send a packet out through the router, or print something on the user's screen, which is a physical thing. And if we try to do that directly, the hardware actually tells us no. If you don't know, this is Linus, the guy who created Linux. The entire purpose of an operating system is to prevent you from talking to the hardware directly. Why? Otherwise hacking would be too easy. Yes, in some sense it's security, right? But also think about the different types of drives. Anybody have a spinning-disk hard drive in a desktop machine? Nobody? One person. Two people. Yeah. You can have literally 18 terabytes on a drive. You can have a super fast two-terabyte SSD. You can have an SD card with 256 gigs on it. How you actually talk to that physical drive to get it to store the bytes you want is different depending on the device, and in crazy cases maybe depending on the manufacturer. The idea is that the operating system should handle that for you.
So it's actually a nice abstraction layer: you can just tell the operating system, store this data to this file, and the operating system goes, great, what device is this file location tied to? Is it a hard drive? Is it an SSD? Is it something else? And then it figures that out. That's one reason you're not allowed to touch things directly. It also matters when multiple programs, say, all want to write files directly to a hard drive at once. Who wins? That's why you need the operating system to act as the mediator. So this is exactly the point: we need the operating system to do stuff, to do everything, for us. It's not just files; it's sending packets out on the network, talking to USB devices, literally printing things out to you. Whenever you see something on the console, the program had to ask the operating system to print it to the user. If the program doesn't do that, the data just stays in the program's process memory and does nothing.

This is where there's a special instruction called syscall. You can think of it as a system call: you're calling into the operating system, and it's the way for your x86-64 program to say, hey operating system, please do something. But it's just a single instruction, syscall, so you need a way of telling the operating system what you want. Do you want to write to a file? Do you want to read from a file? Do you want to quit and stop executing? You actually have to ask the operating system, yo, I'm done, please stop me. There's a specific calling convention for this: just like we saw with calling functions inside a program, calling into the operating system kernel is another type of calling convention. In this case, RAX is the register that specifies exactly which system call you want. What's 42? It's a nice number, and it's connect.
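Since the kernel decides what to do purely from the number in RAX, a lookup table captures the idea. This is a toy sketch: the function name `which_syscall` is hypothetical, and the numbers are the x86-64 Linux ones (other architectures, such as 32-bit x86 or arm64, number their syscalls differently).

```python
# A few x86-64 Linux syscall numbers: the value the kernel reads from rax.
X86_64_SYSCALLS = {
    0: "read",
    1: "write",
    2: "open",
    3: "close",
    42: "connect",
    60: "exit",
}

def which_syscall(rax):
    """Name the syscall selected by rax (a toy lookup, not the kernel's dispatcher)."""
    return X86_64_SYSCALLS.get(rax, f"unknown ({rax})")

print(which_syscall(42))  # connect
print(which_syscall(60))  # exit
```

This is why putting 42 in RAX and executing syscall asks for connect, which needs arguments in other registers to do anything useful.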
Okay, it's not something you'd just use on its own; I was hoping it was exit or some no-argument syscall. Anyway. It's good, okay, great. So we put 42 into RAX, we execute syscall, and then the operating system reads the values in the registers and figures out what we actually wanted to do. This way we can ask the operating system to do something, and it can go from our CPU to the actual physical hardware, like a router, and make it do things, like blink, if that's something your hardware can do. That's where we're going. To do that, we need to understand how processes and other types of programs work, so we're going to learn a bit more about how the operating system manages all of this by going through different system calls. This is the correct order, right? I didn't check the numbers. So, a very important system call is read. We'll talk about FDs, or file descriptors; basically, this is a special number that the operating system gives you. You then give it back and say, hey, read count bytes from this file descriptor into this memory location, buf. Depending on what the file descriptor actually is, if it's a file on a hard drive, the operating system will do all the dirty work of talking to the hard drive. Can you read on a network connection? Maybe? Is it only stream based, though?
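Python's `os` module wraps these calls thinly, so the "read writes into your memory" idea can be seen directly. A minimal sketch, assuming a Unix system: a pipe stands in for "some file descriptor", and `os.readv` is used because it fills a buffer we own, much like passing `buf` to read.

```python
import os

# A pipe gives two file descriptors: bytes written to w can be read from r.
r, w = os.pipe()

message = b"hello"
written = os.write(w, message)  # write(w, buf, 5): the kernel reads from our memory

buf = bytearray(16)             # our "buf": 16 bytes of process memory
n = os.readv(r, [buf])          # read(r, buf, 16): the kernel writes into our memory
print(n, bytes(buf[:n]))        # 5 b'hello'

os.close(r)
os.close(w)
```

Only 5 bytes were waiting, so the read returned 5 even though 16 were requested; had the pipe been empty, the read would have blocked until data arrived.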
Yes, you could be reading from a file, or you could be reading from a network connection. Depending on the semantics, the operating system may also hang: you may ask for some data and the operating system goes, great, I'll give it back to you, and if there's no data available, it waits for it. If it's your hard drive, the drive needs to spin and find the exact place where everything is. I always have to re-examine what these functions actually mean in my mind. When you read, you're reading from something like a file, or from the user, which is the other way you read and write: standard input and output, like we've been talking about. So you can read from standard input. A read reads from a data source and writes it into your program's memory. That's the important thing to keep in mind: a read actually writes to your memory and changes your process's memory. Where does it change it? At whatever address is in buf. And we have the reverse: we may want to write to a file, and writing to a file reads from our memory. So write says: write count bytes, starting at the address in buf, to this file descriptor. This is literally all you need to read and write files. The other important thing is that what we're talking about here is literally at the operating system layer. You may have used other things to output data, like printf. Maybe you've used printf before, right? Handy dandy, very nice. You don't need to specify how many bytes you want to write when you use printf, because printf is in a library, and the library handles everything and then calls write for you. It's a wrapper, an abstraction layer on top of the underlying system calls. But to actually work with files, how do we get a file descriptor? We'll see that there are actually three special file descriptors passed to your program: standard input, standard output, and standard error. These are some things
that I don't think we'll talk about right here, but definitely memorize them: standard input is file descriptor 0, standard output is file descriptor 1, and standard error is file descriptor 2 on all Unix-y systems, let's say, because I don't know exactly how general that is. We can also ask the operating system to open a file for us. This is the open system call. You specify a string as the path, and again, this is an address in memory, so it's some bytes in memory. How does the operating system, or any C program, know when it has reached the end of a string? A zero byte. Yeah: strings in C and in the operating system are terminated by a null byte, a zero. So it will try to open that path. You can specify different flags for different behaviors; for example, O_CREAT will create the file if it doesn't exist. The mode sets the permissions a file gets if open creates it; whether you're opening to read or to write goes in the flags. What open returns is an integer, the file descriptor, which you can then pass to the read and write system calls to do stuff. With these three things (open any file, read it, and write it), boom: the x86 assembly code you're writing can now talk to the file system. So, oh, what is a file descriptor? Great, glad you asked. To your program, it's literally nothing but a number. You can see that in the types here: the int returned by open is exactly a file descriptor. What it means doesn't actually matter to you. As we'll see, the operating system keeps track of which files you asked for and maps them, so file descriptor 5 in your program means this file on disk, whereas another process may have the exact same file descriptor number meaning something completely different. The other crazy thing is that when you execute a program from the shell, you can change all kinds of stuff about its file descriptors, and there's a lot of complexity there. So if we actually dig into the
operating system to see what's going on under the hood: in Linux, every process that's executing has this data structure. Don't memorize this; the point is that it's there. If you just start learning, oh okay, open, read, write, file descriptors, they become these really opaque things, but you should understand what's actually happening under the hood and why these things exist. So here we have a task structure. This is actually a C struct, task_struct, and you can go look at all of its definitions; here we've abstracted it. A process has a PID, the process identifier, which should be a unique number on the system that identifies that process. PPID is, I believe, the parent's process ID. Then there's the real user ID and the effective user ID; when we come back to access control we'll study these more, but basically they say which user on the system is running this process. Is it root, the admin? Is it a normal user? And the thing we care about right now is the file descriptor table. This shows file descriptors 0 through 1024. When your process first loads, if you're running it from the terminal, the first entries point to /dev/pts, which is a pseudo-terminal; that's what PTS stands for. It just means the user is sitting at a terminal and interacting with it. When you do things like use the pipe operator, or like we did when we ran /challenge/run and used the angle bracket to give it a file (we did that with an assembly file), what actually happens is the operating system runs the program with standard input, file descriptor 0, changed to be that file, so the program reads from that file. But for our purposes right now, we just need to know that the operating system stores data about which file descriptor maps to which actual underlying file, and that information has to be there so the operating system can act on our behalf. Cool. Okay, so, the open call. Question: how do we know how to do an
open call? This gets into the calling convention, the x86-64 syscall calling convention. We'll have links to these tables; I like Ryan Chapman's system call table, Connor prefers a different one, and that's okay. The way to use this table: like we said, RAX is the register that tells the operating system which of all of these you want. If you've never looked at this before, Linux has over 300 different system calls. You will not need to memorize or know all 300 of them; it's okay. So the way to read this is: what you put in RAX depends on which system call you want. If you want to call read, it needs to be 0; write, 1; open, 2; close, 3. And that's kind of nice: these are literally the things we talked about (well, we didn't talk about close, but it's the opposite of open). Then these columns are the registers you put the arguments in. So if we go back to the slides... there we go. Going back to open: open takes a char pointer pathname, an int flags, and a mode. And if we go back to the syscall table, it says open takes a filename, so the pointer to the bytes of the file name needs to be in RDI, the flags go in RSI, and the mode goes in RDX. That's how we look this up. So if we wanted to call open on a specific file name, we'd first get the bytes of our file name into memory, put that memory location into RDI, the flags into RSI, the mode into RDX, and 2 into RAX, and then execute syscall. That's it. Now let's look at an example. Here we have memory where we're doing a syscall, and we can work backwards a little to figure out which syscall this is. At the top here is the C code: we want to call open, we want to open the file /flag, we want to pass in the flag O_RDONLY, and the mode is zero. How do we figure out what this O_RDONLY means? We read the documentation, he said, answering his own question. One thing to know is that the syscalls are documented in section 2 of the man pages. I think if you just do man open you'll get that, but if you do man read, or, what, okay, there we go, man write: this (1) means section 1, which is for user commands you run from the shell. This write lets us write messages to another user, but that's not what I'm interested in. I'm interested in the system call, so I would do man 2 write, and I'd get the entire thing. You'll see that this function signature is exactly the same as the one we showed in the slides. We're interested in open; we were using open with pathname, flags, and mode. So is it not here? Oh, just like a normal Linux distribution; I've literally never had to do that before. Okay, so we're interested in flags: "The open() system call opens the file specified by pathname. If the specified file does not exist, it may optionally (if O_CREAT is specified in flags) be created by open()." There we go: "The argument flags must include one of the following access modes: O_RDONLY, O_WRONLY, or O_RDWR. These request opening the file read-only, write-only, or read/write, respectively." So the operating system is doing the permissions here: if we open a file read-only and try to write to it, the operating system will tell us no, you can't write to this file. Likewise, we can open a file write-only, where we can't actually read from it. So that's where we find that. Now let's walk through this. Our assembly code first moves this 4000 into RDI, so that's in that register. It moves this value into RAX, and then it moves RAX into the memory at RDI. So which byte of this value is going to be written at exactly this memory location? Given little-endianness, it's the least significant one, here, 2f, which is
what in ASCII? I know it because I do this stuff a lot, and it's one of the things that actually comes up a lot: 2f is a slash. That kind of makes sense; we're opening a file, slash... If you had to put money on it, what would you say? 66 is f, 6c is l, 61 is a, 67 is g, and 0 is the null byte. Let's see if we're correct. If we are, then after this instruction executes, at this memory location, which should be up here, we should see /flag, and the panel on the right is showing us the ASCII values of these bytes. Then we move 0 into RSI and 0 into RDX, so it turns out O_RDONLY is 0; we'll show how to actually look that up later. Then we move 2 into RAX. Why are we moving 2 into RAX? Because that's what the syscall table told us: open corresponds to 2. Then we execute syscall, and that goes into the operating system. The operating system reads the bytes at the memory address we passed as the first argument, reading them right out of our process, which is kind of cool, and it says, oh, /flag is what they're trying to open. It returns 3 as the file descriptor in this example, and that's actually shown here: after the syscall, the operating system does some complex stuff, then our program resumes control, and now RAX has the value 3, which is the file descriptor the operating system wants us to use. That's part of the syscall calling convention: the return value goes in RAX, where we can then use it. Okay, cool. Oh yeah, look at all them system calls. And that's how we get stuff done. All right, thank you, sorry I went late.
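To make the /flag walkthrough concrete, here is a minimal Python sketch under stated assumptions: it rebuilds the 8-byte immediate that gets stored with `mov [rdi], rax`, reads the path back the way the kernel reads a NUL-terminated string, and lays out the registers for open. The helper `read_c_string` and the `regs` dictionary are hypothetical illustrations, and because `/flag` likely doesn't exist where you run this, the real `os.open` call uses a temporary file instead.

```python
import os
import tempfile

# The path as a C string: the bytes of "/flag" followed by a NUL terminator.
path_bytes = b"/flag\x00"
imm = int.from_bytes(path_bytes.ljust(8, b"\x00"), "little")
print(hex(imm))                     # 0x67616c662f  (value for mov rax, imm)

# Storing imm little-endian puts 0x2f ('/') at the lowest address.
memory = imm.to_bytes(8, "little")
print(memory[0] == 0x2f)            # True

def read_c_string(mem, addr=0):
    """Read bytes up to the NUL terminator, as the kernel does with a path."""
    end = mem.index(0, addr)
    return mem[addr:end].decode()

print(read_c_string(memory))        # /flag

# Register setup for open(path, O_RDONLY, 0) on x86-64 Linux.
regs = {"rax": 2, "rdi": 0x4000, "rsi": os.O_RDONLY, "rdx": 0}
print(regs["rsi"])                  # 0 on Linux

# Open a real (temporary) file the same way; the result, returned in rax
# by the real syscall, is the lowest file descriptor number not in use.
with tempfile.NamedTemporaryFile() as f:
    fd = os.open(f.name, os.O_RDONLY, 0)
    print(fd)
    os.close(fd)
```

In a fresh process with only 0, 1, and 2 open, that lowest free number is 3, which matches the value in RAX after the syscall on the slide.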