All right. Hello, everybody. So today we will be dispelling some magic and talking about kernel mode, which is like the core part of the operating system.

So first off, a bit of housekeeping: make sure you try and log into the GitLab server. You'll be greeted with the nice "your account is pending approval" message and you won't be able to do anything. I have to run a script so that you actually get access, because anyone on campus can log into it and I don't want them to actually use it. So just bear with me on that. You guys are mostly nocturnal and I am not, so if you don't get access right away, don't worry about it. I'm probably asleep, so just let me wake up and I'll run it a bit later.

Also, by show of hands, how many have some experience with, like, Git and GitHub and using SSH keys and all that stuff? It's like half, or, judging by last class, that's like everyone, but no one wants to put up their hands. So for the first lab I'll give some instructions with that. Oh, is that a question? No. Oh, whoops, lower the screen down. Sorry, didn't see that. All right, that's better.

So since people are comfortable doing that, that actually should be good, so we'll go with that. So just make sure you log in. Everything's going to be released on Monday, so that will be your first lab. It's probably a bit on the shorter side, which students generally don't complain about, because I'm assuming setup may or may not be awful, but I guess we'll find out.

So, back to course content. Everyone's kind of had some experience with assembly in general.
There are three major instruction set architectures, or ISAs, used today. An ISA is just the machine code, the magic numbers that the CPU actually understands, because at the end of the day everything is just numbers in computing and it's just how you interpret them. That's the magic; computers are actually fairly dumb things. So there's x86-64, aka AMD64; you might see it called either, they're the same thing. That's like desktops, servers, non-Apple laptops, things like that. Then there's aarch64, or ARM64, which is pretty much everyone's phones and tablets, and if you're lucky enough to have one of the newer Apple laptops, that also runs it. And then there's a newer instruction set called RISC-V, aka RV64GC. That is like an open-source version of ARM that you could actually implement your own CPU on without having to pay ARM, like, millions of dollars. So that's one way the market's kind of going. People mostly use it for embedded, and it'll probably switch over in schools at some point. Some of the examples I might show are RISC-V, because there's an MIT project that actually implements its own kernel using that. So we'll touch on all of them a bit in this course, hopefully less x86, because it's really, really, really ugly.

So as part of the ugly, this is its permission checks. They have a concept called rings to control instruction access, and each instruction, so each assembly instruction, belongs to a certain ring of permission where you are allowed to use it. All the rings are complete subsets or supersets of each other. So if you're in the kernel ring, you can access everything in the rings above you, so you can access all user mode instructions. But if you're in this kernel ring, ring zero here, you can't access anything in hypervisor mode. So these are the three modes. There's generally a hypervisor mode on x86.
It's called ring negative one, which is one of the reasons why x86 is pretty awful. We won't touch hypervisor mode until the end of the course, so don't worry about it. Next is kernel mode. Kernel mode is ring zero, and those are the CPU instructions that can actually interact directly with hardware. There's a separation: the kernel of the operating system is the only thing allowed to touch hardware, and your normal applications aren't allowed to do anything with it. It does this by running the processor in a different mode that just doesn't have access to those instructions. So far you've been using user mode, which is ring three. You might question why rings one and two aren't used, and the answer to that is: it's x86, don't worry about it. Which is a stupid answer, but it is how it is.

So user mode is like all the applications you've been running so far in your life. You can't touch hardware directly; you have to go through the kernel, even though you may not have known the name of it before. Something does it for you on your behalf, which you might have thought of as the operating system before, but more technically it's the kernel, and we'll see that boundary really, really up close and personal today. So you can think of it as this hard boundary. There is something called user space; that's all of your normal applications that just run in user mode, hence the name user space. It's kind of a good name. And then there's kernel space, which is the code that runs in kernel mode that can interact directly with hardware. You so far have not been allowed to use it, or probably even know about it, but now you get a chance to.

So there are these things called system calls to transition between user mode and kernel mode, which is what you have been using even though you might not have known it. Funnily enough, you can think of them as kind of like C functions if you want, and you can
think of the kernel as a big library if you want to make things easier. Amazingly, the whole kernel only has 451 functions at the time of this lecture, and that's it. That's all your programs can ever do to interact with hardware. It's pretty amazing that the Linux kernel has survived, like, how old is it now, 30, 40 years? And it only has 451 calls that do everything. So we'll get a taste of those calls today.

So, as a quick aside: there's a difference between an API and an ABI. An API is just a high-level description of something, and an ABI is the nuts-and-bolts details. Like when you were writing assembly, exactly how to pass your arguments and everything like that is part of the ABI: the first argument should go on the stack, or the first argument should go in a register, or something like that. Describing that is part of the ABI. An API is much more general. It's like, hey, I have a function that takes two arguments. I don't tell you where they are; it's just, yeah, it takes an x and a y and does something and gives you a number back. So that abstract description of inputs and outputs, being very general about it, is an API, and then an ABI specifies exactly how to lay out that data and how to concretely communicate with it.

So take that same function, a function that takes two arguments, using the C calling convention. That's one of the nice things your compiler does: if you give it a function with arguments, it will make sure that, generally, they're passed on the stack and in a specified order. Because, you know, stacks could grow up, they could grow down, the stack pointer could point at an empty element or not. All of that is defined in the C calling convention, and it's all done for you by the compiler, so you don't have to worry about it. So till now you haven't really had to deal with an ABI, but that's an important distinction.
We'll see it later in the course when you get to libraries, but it's good to know: API, high-level description; ABI, the nuts-and-bolts details on that CPU, like what exactly goes where.

So this is the system call ABI for Linux on an ARM64 machine. Unlike assembly where we were just doing a call instruction or something like that, to do a system call there's a special instruction called SVC. SVC just lets the kernel take over and do something, and the ABI for that is that it passes arguments in registers. The machine has a bunch of registers; it only uses seven of them for system calls. In the x8 register, the number that's there specifies which function to call, so they're just arbitrarily assigned numbers. And then you are allowed to have up to six arguments, x0 to x5.

So can anyone tell me any limitations of this as an ABI? Yeah. Yeah, so your function could use all the registers. The function could overwrite the argument registers, do whatever with them, and then restore them at the end if it wanted to. That would be part of the ABI, but for now we don't care about that; that's caller-saved and callee-saved. Yeah? It's just a special instruction that gets the kernel to take over and do something. In this case it would be doing whatever that system call is, based on whatever magic number is in x8. So you can think of it as doing a system call, and it's equivalent to a call instruction in assembly, but you don't need an address or anything; it just goes straight to the kernel. Yeah? Yeah, well, you can't really use other registers. So I guess another way of thinking about this is: how many arguments could my function have using this ABI?
Six, right? Well, what if I wanted to make a function with seven arguments? So yeah, one of the responses is just put the stuff on the stack, but the stack is not part of the ABI, so you wouldn't know how to get the seventh argument, because you have no idea where it is; it's not defined. Right, so this is the complete ABI. There's nothing else. That means I can't have a function, or a system call, with seven arguments, because I didn't define a seventh argument; you don't know where it is. Also, all my arguments are exactly the size of a register. So what if I want to pass, I don't know, a 128-bit number? Well, I can't. I'm assuming a 64-bit CPU, so all my registers are 64-bit. And if I wanted to pass a smaller value as an argument, that's, I don't know, four bytes or something, then I'm wasting half the register, because I have to use a whole register and the rest of it would just be wasted space.

So, yeah? So the stack would exist, but it's not defined as part of the ABI, so it doesn't use it at all. Yeah? Yeah, so it would be possible to define more arguments, but then you have to change the ABI, and then everyone has to agree, because this would be compiled into your code. And the golden rule of kernel development is you never change the ABI. As soon as you set this, you have to follow it until the end of time, which is why Windows and stuff like that has a lot of weird old edge cases. As soon as they implement something, they can never undo it, because someone that pays them will have their software break and they will get very, very angry.

So this is the full ABI. You can assume it saves the rest of the registers and doesn't touch them, but this would be the full ABI for Linux. We could pass more data if we wanted to: one of the arguments could be a pointer, and then we could say, hey, that points to something that has a lot more space. But in general this is how you do system call interfaces.
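As a rough illustration of the limits just discussed, here is a toy Python model of that register-only ABI. It is a sketch under stated assumptions, not how the kernel or any library implements it: `marshal_syscall` is a made-up helper, a dict stands in for the registers, arguments are treated as unsigned values, and the call number 64 for write and the buffer address 0x1009c are borrowed from the disassembly and file layout that come up later in the lecture.

```python
# Toy model of the arm64 Linux system call ABI described in the lecture.
# Illustration only: a dict stands in for the CPU registers.

REGISTER_BITS = 64  # assumes a 64-bit CPU, as in the lecture
MAX_ARGS = 6        # arguments travel in x0..x5 and nowhere else


def marshal_syscall(number, *args):
    """Return the register assignment a system call would need."""
    if len(args) > MAX_ARGS:
        raise ValueError("the ABI defines no place for a seventh argument")
    registers = {"x8": number}  # x8 selects which system call to run
    for index, value in enumerate(args):
        if not 0 <= value < 2 ** REGISTER_BITS:
            raise ValueError("each argument must fit in one 64-bit register")
        registers[f"x{index}"] = value
    return registers


# write(1, buf, 12): file descriptor, buffer address, byte count.
print(marshal_syscall(64, 1, 0x1009C, 12))
```

Passing seven arguments, or a value wider than 64 bits, raises an error in this model, which mirrors the point that anything bigger has to go behind a pointer.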
There are other ways around it. So we can represent the system calls just like regular C functions, except we know that the arguments are actually going to be passed in registers, and it's not going to use a C-style assembly call; it's going to translate to something like we saw before. So for example, we have a write system call, which writes bytes to a file descriptor. Its API, its high-level description, is that it has three arguments. The first one is fd, which is just a file descriptor to write bytes to, which is just a number. Then an address of a contiguous sequence of bytes, so a buffer. And then a count of how many bytes to write from the sequence. And they all have types. The type of the file descriptor is int, so it's just a normal int. Then const void * is just a fancy way of C saying it's a pointer to something, I don't care what it is, don't worry about it, it's just a pointer. And then size_t is just an unsigned integer that is as big as the architecture, so it's an unsigned 64-bit number, and ssize_t is a signed version of that, so it's a signed 64-bit number that could be negative.

And then there's this exit_group system call that takes one argument, int status, and it's just the exit code of the running program. As soon as you call this, whatever program is running has now exited. It's dead; everything associated with it is now gone. For the exit code, you'll see that the type is an int, but it says the exit code is only valid from 0 to 255. That's because it's defined so that it just chops off the rest of the number, truncates it, and whatever the lowest byte is is the return value. So you may have seen the return value when you run some applications; that's where it comes from.

So I left you off with this teaser.
That's a 168-byte program, which actually prints hello world, and those two system calls I introduced are exactly what it uses. Every file is just some binary format, and everyone just happens to agree on it; otherwise it's fairly magical. So for example, the magic of an ELF file. ELF stands for Executable and Linkable Format. That's just a file format, and it describes an executable program. Any ELF file will start with four bytes, which we kind of saw before: 7f followed by the ASCII letters ELF, all in capitals. And then if you dive into the file format, there are some bytes after that that signify, you know, if it's a 32-bit or 64-bit machine, if it's little-endian or big-endian. Most file formats follow the same thing: everything just has some magic numbers in it, and that's how you know what the file is.

And to show that I am not lying: we kind of saw it before, when we wrote our read-first-four-bytes program. When we wrote that program, we played around with stuff and figured out that our hello world program starts with ELF, so it's an executable file. But funnily enough, so does pretty much everything else. Even libc, the C library, is just an ELF file as well. And pretty much every other file follows the same magic thing, just the magic numbers happen to be different. So for example, where is it? JPEG files, they always start with those four bytes. And, spoiler alert, the only way your computer knows what a JPEG file is is that it reads those four bytes, and if they are exactly this, it knows it's a JPEG file. And that's the magic.
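The magic-number check can be sketched in a few lines of Python. This is a minimal illustration, not what any real tool does internally: `sniff` and `sniff_file` are made-up names, the table covers only three formats, and JPEG is matched on its first three bytes here because the fourth byte can vary between files.

```python
# Identify a file purely from its leading "magic" bytes.
MAGIC_NUMBERS = {
    b"\x7fELF": "ELF",        # 7f followed by the ASCII letters ELF
    b"\xff\xd8\xff": "JPEG",  # assumption: match only the first three bytes
    b"\x89PNG": "PNG",        # the PNG signature contains the letters PNG
}


def sniff(data):
    """Return a format name based only on how the bytes begin."""
    for magic, name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return "unknown"


def sniff_file(path):
    """Read the first four bytes of a file and guess its format."""
    with open(path, "rb") as f:
        return sniff(f.read(4))


print(sniff(b"\x7fELF\x02\x01\x01"))  # ELF
print(sniff(b"hello world\n"))        # unknown
```

Nothing here checks that the rest of the file is valid, which is exactly why a file that merely claims a format can still be malformed.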
Yeah? Yeah, the question is, can I just say it's a specific format and then format it badly and do some fun hacky stuff? And yeah, of course, that's a whole category of exploits. If you're writing a program that uses JPEG files, you have to write it so that it can handle invalid files and hopefully just throw them out, but reading invalid files wrongly opens up all sorts of hacks. Was it, like, if you sent a specifically crafted JPEG file to an iPhone before, you could get access to the entire phone? So of course you can do things like that.

Yeah? So endian is how multi-byte numbers are formatted. Everything is byte-addressable, right? But if I have a four-byte number, is the zeroth byte the largest part of that number or the smallest part of that number? Endianness just says, oh, it's the most significant byte or the least significant byte. So it's just the order. But everything we have is little-endian.

And yes, so, was it... yeah, PNG, same thing: it just has magic bytes at the top, and their magic number has the letters PNG in it. This is magic you'll see everywhere. So sorry I dispelled that, but all file formats follow this. It's actually called a magic number, and that's the illusion for everything. Like macOS: their magic number on a 32-bit machine, if you read the binary out in hex, says FEEDFACE, and their 64-bit one is FEEDFACF. So everyone does that.
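To make the byte-order point concrete, here is a small Python sketch that lays the same four-byte number out both ways. It only uses the standard `int.to_bytes` and `int.from_bytes` methods; the choice of the FEEDFACE value as the example is just for illustration.

```python
import sys

value = 0xFEEDFACE  # the 32-bit magic number mentioned in the lecture

little = value.to_bytes(4, "little")  # least significant byte first
big = value.to_bytes(4, "big")        # most significant byte first

print(little.hex())  # cefaedfe
print(big.hex())     # feedface

# Reading the bytes back with the wrong order gives a different number.
print(hex(int.from_bytes(little, "big")))  # 0xcefaedfe

# Byte order of the machine running this script.
print(sys.byteorder)
```

The same four bytes mean two different numbers depending on the agreed order, which is why the ELF header has to record which one the file uses.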
It's a very common thing to do. If you want to read more about the file format, you can use this readelf tool, which will read out everything about the file. We'll skip a bit of it for now. None of this is really important; it's more to show you that there's actually no magic here. So there's a file header that contains information about the machine: what specific architecture should this program run on, and where's the entry point? The entry point is where the kernel will start executing code, and that's it. And it doesn't correspond to your main function if you compile something with C. Then there are things called program headers, which are required for executables, and section headers, which are required for libraries. We don't need to know about either of them. Just know that this minimal example will have one program header that says, hey, there's some executable code here.

So let's go ahead and look at that. If we just do readelf on our small file, we can see some information about it. So it says, hey, it's little-endian.
It has a version number. It has an OS ABI, so that's where that ABI comes from; it's defined there. It's saying, hey, I run on Unix and I use that ABI, so it knows to expect system calls to happen in a certain way. Then it has an ABI version; generally you don't touch this, because you never touch the ABI or you're going to break literally everything. Then the type, and the machine it runs on, so this is an ARM64 CPU, and then a version. Then here we have an entry point that you actually specify. And later it says the size of this header is 64 bytes and the size of each program header is 56 bytes. Those are the two things we need in our ELF file, so right there that's 120 bytes of our 168. Like 71% of this file is spent just describing the file.

Other than that, this is the program header, and basically all it does is say, hey, load this entire file. So a8 in hex is 168; it just says load this entire file into memory. And then you get to say what address you want that whole file loaded at. Here I just say load the entire file at address 0x10000. Why 0x10000? Because I felt like it. You can say pretty much any number you want except for zero; you can't say zero. So all I say is load everything starting at address 0x10000, and that's the 168 bytes. And then I say start execution at 78; 78 in hex is 120, so that's the first byte after the headers. And again, this is way more detail than you need to know.

So I have my headers, my information about the file, which waste some space, then the next 36 bytes are instructions, and then we have 12 bytes for the actual string, hello world. So here the instructions start at 78, which is 120 in decimal. So that was our starting address.
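The sizes quoted for this file add up as follows. This is just the arithmetic from the lecture written out in Python; the constant names are made up for the sketch, and the load address is the 0x10000 chosen in the example.

```python
# Layout of the 168-byte hello world file, using the sizes from the lecture.
ELF_HEADER_SIZE = 64      # file header
PROGRAM_HEADER_SIZE = 56  # the single program header
CODE_SIZE = 36            # machine instructions
STRING_SIZE = 12          # the hello world string, including the newline
LOAD_ADDRESS = 0x10000    # where the whole file is loaded in memory

entry_offset = ELF_HEADER_SIZE + PROGRAM_HEADER_SIZE  # first byte of code
string_offset = entry_offset + CODE_SIZE              # first byte of string
file_size = string_offset + STRING_SIZE

print(entry_offset, hex(entry_offset))    # 120 0x78
print(string_offset, hex(string_offset))  # 156 0x9c
print(file_size, hex(file_size))          # 168 0xa8

# Address of the string once the file is loaded at LOAD_ADDRESS.
print(hex(LOAD_ADDRESS + string_offset))  # 0x1009c

# Share of the file spent on headers rather than code or data.
print(f"{entry_offset / file_size:.0%}")  # 71%
```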
So it just says start executing at this instruction, and then this is the address of the string. It's 0x1009c, because 9c is 156. So that's all, and that's all hello world is.

So this is the actual important part of the lecture: this strace program. It's really, really helpful when you do all the labs, and not even just the labs, in general. There's this system call interface, and literally every program has to pass through this interface, so strace will tell you all of the system calls that actually occur. So we can go ahead and play with that for a little bit.

So if I use strace, whoops, if I use strace on the program I had before, I see all of the system calls it makes, and even some it doesn't make. First it starts with this execve. We'll get into that later in the course; basically someone else called this, and that's how you execute a program. So something else executed this hello world program for me, and I can thank them. Then here it has a write system call, so it writes to this magic number one, and writes the string hello world with the newline, and writes 12 bytes of it. So 12 bytes: hello, that's five; space, six; world, that gets us to 11 bytes; and then a newline, which is a single byte. So that's 12 bytes. And then we see the actual hello world and a newline, which kind of looks ugly because these two outputs are getting mixed together. You'll learn this later in the course, but I can unmix them and do something like that, so it just looks clearer. So here is my write call: I write hello world and I write 12 bytes, and it returns how many bytes actually got written. Then I call exit_group with zero, and then my program's done. So all my hello world does is make two system calls.
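The same two-call shape can be imitated from Python, as a hedged sketch rather than a reproduction of the lecture's binary. A child interpreter calls `os.write` on file descriptor 1 and then `os._exit`, and the parent captures what came out. The interpreter itself makes many more system calls while starting up, so this only mirrors the last two steps; the capitalization of the string is an assumption. The second run illustrates the exit status being cut down to its lowest byte.

```python
import subprocess
import sys

# Child program: one write to fd 1, then exit immediately with status 0.
CHILD = 'import os; os.write(1, b"Hello world\\n"); os._exit(0)'

result = subprocess.run([sys.executable, "-c", CHILD], capture_output=True)
print(result.stdout)       # b'Hello world\n'
print(len(result.stdout))  # 12
print(result.returncode)   # 0

# Only the lowest byte of the exit status survives: 263 = 256 + 7.
truncated = subprocess.run([sys.executable, "-c", "import os; os._exit(263)"])
print(truncated.returncode)  # 7 on Linux
```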
Yep? Yeah, so the question is, does that mean this file can't be run on Windows because it doesn't have the right ISA? And yeah, it won't run. If you have a virtual machine, you could run it, because you'd be running Linux, but anything else won't run this. I even have another file here that's the exact same thing; it's for Linux, but it's for x86. So it's for a different ISA, which means it has different machine code, and it would have a different ABI. If I try to execute that, it says exec format error, because that's not what this machine is. So if you download the examples and you're actually running on, like, AMD or a more normal CPU, you can run that one, but not the one I just ran. That's why there are two versions of it, because now they're specific and I can't transfer them. I can't run that on a Mac. I can't do it on anything else. I have to use Linux and I have to use whatever the actual ISA is.

Okay, so there's just a little bit of magic here, which is that I write to this magic number one. Everyone's used printf. Has everyone heard of standard error and standard out before? Yeah, thumbs up. Okay, so those are just magic predefined numbers, and on Linux they're just certain file descriptors. You just memorize them, and we'll see them later, but file descriptor zero is standard input, file descriptor one is standard output, file descriptor two is standard error, and that's it. So if I write to file descriptor one, that's writing to standard out, and I will see it when I execute something.

And to show you that I'm not lying and there's no magic going on, let's see. Remember we had this hello world. Now we have this strace utility that tells us all the system calls something makes, and because we know that there's user mode and kernel mode, everything that does anything has to pass through this.
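A quick way to convince yourself that 1 and 2 really are separate destinations is to write to each by number and capture them separately. This is a minimal sketch: the child script and its messages are made up for the illustration, and `subprocess.run` with `capture_output=True` is simply a convenient way to collect the two streams.

```python
import subprocess
import sys

# Child program: write one message to fd 1 and another to fd 2 by number.
CHILD = 'import os; os.write(1, b"to stdout\\n"); os.write(2, b"to stderr\\n")'

result = subprocess.run([sys.executable, "-c", CHILD], capture_output=True)

print(result.stdout)  # bytes that went to file descriptor 1
print(result.stderr)  # bytes that went to file descriptor 2
```

In a terminal both descriptors normally point at the same screen, which is why the two kinds of output can appear mixed together.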
So we can go ahead and strace hello world, which is our C version of it. Let's go ahead and clean it up a little bit. If we do that, it's got a lot more crap, but at the end of the day the last two lines, whoops, the last two lines are exactly what we had. There is a write to file descriptor one of 12 bytes, and then there's an exit_group, and then there's a whole lot of crap above. So this is all the stuff C did, which isn't that bad. It used some brk, it has some mmap, we have some weird things: preload, what? I don't know what that is. Well, I do, but you don't. Then it goes through, and here we can kind of recognize this: that's the C standard library. So if you compile your program in C, one of the things it does is load the C standard library, which has to go through the operating system, well, more specifically it has to go through the kernel, which is part of the operating system. So we can see here that it's doing something with libc. Open returns a file descriptor, which is just a number, so file descriptor three is the C standard library, and we can even see that it reads from three, and hey, it looks familiar: it says ELF. So we saw before that the C standard library is just an ELF file too. Stupidly enough, it's the same 7f; for some reason, I don't know why, the default output format for binary is octal, so that's why it looks like this. If you translate octal to hex, they're the same number. Don't ask me why the default option is octal; I will never know. Then it does a bunch of other crap, does some more stuff, some stack limit, does something probably to protect you from yourself, and then at the end of the day all it does is write our hello world, after a bunch of work we didn't really need to do.

So just to really show you that, hey, C is not that bad, and this is what anything does no matter what language you run: well, I have hello world in Python. So hello world in Python.
Let's clean it up Well, we'll kind of do the same thing it has something before but at the end of the day It's exit group at the end and then there is the right hello world But instead it does. Oh gee Yeah, just wait for it. Uh-huh. Yep. So it does a lot of stuff All right, so just for fun someone's I think I heard it last time JS so JavaScript well exit group at the very end and then before that oh God, I don't even see right Did I miss it? Is it there? that that Doesn't look like hello world Wait, does the JavaScript one not even work? No, it works See more evidence why you should never write anything in JavaScript Wow, that is the ugliest thing I've ever seen Okay, so if we can do S trace we can say just give me certain function calls So I can say S trace just tell me about all the rights. So Java writes Fought to file descriptor five. It writes a star. What and then it writes to file descriptor 17 So it opens 17 at least like 17 other files for for hello world it writes Just zero or one and a bunch of zeros eight, huh eight Then it does it well. What the hell is JavaScript doing? So JavaScript does a whole bunch of crap, but somewhere in all of itself. 
So there's that write of hello world, and it actually writes hello world, but for the love of God, don't use JavaScript.

Okay, so if you wanted to really take those instructions, those 36 bytes, and turn them into something you can actually read, like assembly, you could copy and paste those 36 bytes into a disassembler and you'd get something like this, and see that, hey, it actually matches the ABI. Remember, x8 was the register that has the system call number. So all this means is, if I put 64 in it and then later do an SVC instruction, that means I want to write something. The first argument would be one, so I want to write to file descriptor one. Then the second argument; on ARM I have to do it in two steps, but basically it makes the address point to the string. Then I say that the count in the third register is equal to 12; that's how many bytes there are. And then this would actually do the system call. So by the time you start executing the next instruction, the kernel would have run and actually printed hello world out for you. Then the next one is I'm setting up a call to exit_group. exit_group just wants 5e, or the number 94, in that system call number register x8, and then I just give it the value zero and make another system call. So that is the actual assembly of my hello world program. It just makes two system calls, write and exit_group, and that's the minimum you have to do for hello world.

So the remaining 12 bytes, well, of course, are the string itself. ASCII-encoded, they just look like that, including a newline. And if you ever have to do low-level ASCII stuff where you're reading characters, a fun little thing is that bit 5 tells you whether a letter is a capital or lowercase. They differ by 32, because that's what bit 5 is. So if bit 5 is a zero, it's a capital, and if it's one, it's a lowercase.
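The bit 5 trick is easy to try in Python. The helper names below are made up for this sketch, and the trick only makes sense for bytes that are already known to be ASCII letters; the capitalization of the example string is an assumption.

```python
CASE_BIT = 0x20  # bit 5, worth 32


def is_lowercase_letter(byte):
    """For an ASCII letter, bit 5 set means lowercase."""
    return bool(byte & CASE_BIT)


def to_upper(byte):
    """Clear bit 5 of an ASCII letter."""
    return byte & ~CASE_BIT


def to_lower(byte):
    """Set bit 5 of an ASCII letter."""
    return byte | CASE_BIT


print(ord("a") - ord("A"))             # 32
print(is_lowercase_letter(ord("H")))   # False
print(is_lowercase_letter(ord("w")))   # True
print(chr(to_upper(ord("w"))))         # W
print(chr(to_lower(ord("H"))))         # h

# The 12 bytes of the string, ASCII-encoded, shown in hex.
print(b"Hello world\n".hex(" "))
```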
That was an intentional design, so that really slow machines could tell you whether something's a capital letter or not really, really fast. So that accounts for every single byte. We saw what C does. So can anyone tell me the difference between our string and normal C strings? Yeah? Yeah, it's not null-terminated. Null-terminated, that's just a C thing. Strings don't have to be null-terminated. In fact, the kernel doesn't want null-terminated strings, because that write system call just writes bytes and you don't want to restrict what bytes the user can write. If I said, hey, write has to take a null-terminated string, suddenly I can't write a zero byte to any file.

And that'd be why it's a virtual machine, so anything we do to it, if it wrecks it, it's fine. They especially would not let you test your software and write kernel code on their machines, because if you're writing kernel code, well, you have direct access to... Sorry, the live stream stopped on YouTube? Or the Discord one? Oh, seems to work. What died?
Okay, I assume it's still alive. I think the stream might have dropped; I have a recording of it anyway, just in case.

So when you actually write kernel code, it's a bit different in that the kernel is already running, so you can actually inject code into it on demand, and that's something called a module. You can think of it kind of like a library: whatever loads it would execute some function that you tell it to whenever it gets injected into the kernel. We'll see an example of it later, but basically there's no main, there's no nothing. Kernel development's a bit weird, but you have full control over the hardware and everything else in that machine, because you are all-powerful.

So there are two major types of kernel designs, or architectures. We saw before there's this clear split between user space and kernel space, but there are different architectural designs. The first one's called a monolithic kernel, which basically runs most of the operating system services in kernel mode directly. So that's anything to do with virtual memory, process scheduling, inter-process communication, file systems, everything in this course, and device drivers; everything will be running in kernel mode. None of that is done in user space; they just throw everything and the kitchen sink into kernel mode. The other design is the opposite of that, where you try and put as little code as possible in kernel mode, because it is all-powerful. The more code you have, the more bugs you could have, so the larger the attack surface is, and if someone gets into kernel mode they get into the whole machine. So one idea is, well, I'll just put as little code there as possible. It is possible in some designs to move file systems and device drivers and more advanced types of communication between processes to user space and just have a minimal set of things in kernel space. But the things that have to be in kernel space are virtual memory, like managing the
physical memory; process scheduling, saying what runs when; and basic inter-process communication, because it has to go through something.

And then a question: am I running Parallels, VMware Fusion, or UTM? I'm using QEMU, which is basically what UTM uses, so the answer to that is pretty much UTM. But I will give setup instructions for Windows and Mac for the lab, and hopefully that will work. I'm setting aside some time in lab one for setup issues just in case, because it'll be new and hopefully more modern.

So monolithic kernels and microkernels are kind of the two main architectures, but there are always fuzzy lines depending on who you ask. Like in Windows, some emulation services aren't in kernel mode; they're actually in user mode. At one point Windows actually had an HTTP server in kernel mode, I don't know why, but that would obviously be a bad idea, because if your web server gets compromised, hey, they have access to everything, so you can tell it probably wasn't the greatest idea. But the benefit of having everything in kernel mode is that it takes time to do a system call, and it takes more time than a normal function call. So the more system calls you make, the slower your thing is going to be. If your web server is really, really slow because it has to go from user to kernel to user to kernel, well, one performance thing you can do is put the whole thing in kernel space and not have to worry about it. But that's obviously not a good idea for security reasons. And then macOS has device drivers in user mode, but file system stuff is still in kernel mode. So it just depends on the underlying architecture. There are also research areas that try and push the boundaries of microkernels more, and, surprisingly enough, they called them nanokernels and picokernels. You know, never doubt the imagination of a computer scientist. So they just try and push even more stuff into user mode instead of kernel mode. But that's like
research stuff that you might get interested in later. So there are many different types. Real ones are typically messy and don't fit those exact definitions.

So to wrap up: the kernel, that's the interface between CPU mode boundaries, and those are hard boundaries. What we can definitely say is that the kernel is part of the operating system, and past that it gets fuzzy. But for the kernel there's a really nice boundary: it's that transition from the CPU being in user mode to kernel mode, and what instructions it can access. So there's a clear separation between what's kernel and what's not kernel; it gets a little bit fuzzier when we just talk about operating systems in general and what makes an operating system. Code running in kernel mode is part of your kernel; that's an easy definition. And then system calls are the interface between them, and every program, no matter what, has to use that. If it's running on Linux, it will use that, so you can strace literally any program you ever write in any language and figure out exactly what it does, because, guess what, they're all running on the same kernel and you get to see what they do. So no matter what language you write in, this course is definitely going to help you, because, hey, that's what's actually going on. And then we saw some file formats, less important, but we saw a simple hello world, the minimal program we could ever write, which takes away all the magic; the difference between an API and an ABI; and, most importantly, that strace command. You really want to know that strace command. And then we saw different kernel architectures, which basically just shift how much code is in kernel mode. So just remember, I'm pulling for you.