Welcome back to system software. Thank you for joining me again today. So, where did we leave off? In lecture one, we argued about what software was. At the very bottom there's hardware, then our software on top of it: an operating system, and then our applications. In the previous lecture, we finally discovered what one thing in the operating system is. It is a kernel. What's a kernel? It's the software that runs in kernel mode, and kernel mode is a privilege mode on your CPU that lets you execute instructions that interact directly with the hardware. So, in lecture three, we can finally go on a little bit more about what an operating system actually is. Like I just said, the kernel's part of it, but what else? For instance, if you're an Apple user, are macOS, iOS, iPadOS, watchOS, and tvOS all really different operating systems, even though they're probably 99% the same? Well, that's what we get to argue about today, because the definition of an operating system is kind of wishy-washy.
So, to argue about what is in an operating system: the only thing we actually care about is applications, and your operating system consists of a kernel plus libraries. You've used at least the library at the bottom there, the C standard library, which runs in user space and is a pretty important piece of software. Pretty much everything relies on it. Up at the top, in the gray boxes, are possible applications, and depending on the application, they might use different libraries. So, NetworkManager, well, that's the software that actually manages your Wi-Fi card and everything. It doesn't even have a graphical user interface, so it might just use something called a system daemon. That's just another user-space library that helps you configure hardware and ask the kernel things. Yep? All right, I'm loud enough, ish. Okay, I just turned it down.
So, NetworkManager probably goes through the system daemon, which helps you configure things, while other things like, I don't know, LibreOffice, well, that probably doesn't interact with hardware that much, maybe a printer or something like that, so it might use the system daemon. It also displays things to you, so it might use a display server. That's a piece of software that's just responsible for taking some bytes and translating them into something that can actually be drawn on your screen. Maybe it uses that directly, or maybe it uses a GUI toolkit so it doesn't have to worry about configuring buttons and all those things. But that toolkit would have to use the display server, and that would probably use the C standard library. If you're Firefox, you probably use all of these. You probably use the C standard library directly, and probably the system daemon. You'd use the GUI toolkit for some of your own tools, but when you actually render web pages, well, those don't use a toolkit, so you'd use the display server directly. So depending on what your application is, you'd be using different libraries.
So the definition of what an operating system is actually just depends on the application, because all the operating system is doing is making sure that you can run the application you want. For instance, the Linux kernel is literally just that: it is a kernel. If you have an Android phone, that runs the Linux kernel, and Debian, which you used when you set up your virtual machine, is a Linux distribution that uses the Linux kernel. But you probably wouldn't consider them the same operating system, even though they have a kernel in common, kind of like how all the Apple systems, iOS and so on, have a kernel in common too.
They might be the same operating system; Android and Debian might be the same operating system if you only care about terminal text applications. So if you write Hello World in C, you could just execute it on your phone if you had a terminal there, and if that was a good enough user experience for you, you might consider Debian and Android the same, as long as you're just running little text applications. But generally, whenever we refer to a Linux distribution, the whole operating system is probably something like GNU/Linux, if you're that type of person, because there are all kinds of libraries you need. If the application you care about is Firefox, well, it needs a whole bunch of libraries that would not be present on your Android phone, so Firefox running on your Android phone would be much different from the one running on a Linux distribution. GNU provides some standard things like the C compiler, the standard C library, and some common utilities, but other people create other pieces of software built on top of that, so you probably don't even want to call it GNU/Linux; it's probably just a Linux distribution. So the operating system consists of a kernel and the libraries required for the application, and that's the definition of an operating system. So now we get to talk about libraries. Oh, yep?
What does a distribution mean? A Linux distribution means that they give you the Linux kernel and a whole bunch of libraries with it to support some applications. But the OS is still the same, so Debian and Ubuntu, would they be considered the same OS? Yeah, Debian and Ubuntu are directly related; they'd be considered the same OS, and they have all the same libraries and the same applications. In fact, Ubuntu is just Debian plus someone else, so Linux distributions get weird. It's basically just whoever builds the software.
All right, so now we get to talk about a library, because we've probably never created a C library before, I assume. So we get to talk about how compilation works normally. Normally, if you have a bunch of .c files, each one compiles to a standalone .o file, which has the machine code and some information about what was compiled. Then, after you have all your .o files, you link them together. That's the linking step, and they all combine to give you the one application that you can run. And note as well, those .o files, guess what, they're just ELF files on Linux.
So here's one way to organize your code if you want to reuse it. Say, of those four C files, there were three I wanted to reuse, like util, foo, and bar. Well, I could still compile them and get the .o files, but instead of combining everything into one executable, I can split them off and bundle them together in a library, like lib.a. It's called an archive, and it's just a bunch of .o files all sandwiched together. So if I now have this lib.a, which consists of util, foo, and bar, then instead of linking all the .o files directly, I could just take my main, which is unique to that application, and link it together with that static library, and at the end of the day I'll still get the same executable with everything sandwiched together. So, everyone with me so far? It's basically the same thing; we just split them off into a library file that consists of a bunch of .o files.
All right, so that's kind of bad, because essentially what it's doing is still copying and pasting all of those .o files into the executable. If I create another executable that uses that library, it will also have a copy of all that code contained within it, so it's a little bit wasteful.
So there is something called a dynamic library. Usually in computing, if there's a static something, there's a dynamic something. Dynamic libraries are for reusable code, and for instance, the C standard library is a dynamic library. We've already seen it a little bit: that libc.so.6. The .so is short for shared object, not really that important, but it is a dynamic library. It's basically the same idea, a collection of .o files, except that it exists on your system and any application can use it without copying and pasting that code into the executable. That way, I can have two applications that use the standard C library and there's only one instance of, say, the function printf. It only exists in libc.so, and that's it. Whereas if I did it with static libraries, both application one and application two would have printf embedded in the actual executable. So because of this, sorry, one sec, because of this, the operating system can just load libc once and then everything else can share it; there's only one copy of printf. Sorry, yep?
Yeah, so the question is: to run the code, does it just jump to the instructions at that address? And the answer is basically yes. When we straced that hello world program, a lot of what it was doing was opening that library, finding the address of, I think I used printf, finding the address of printf and then calling it. That's done by a tool called the dynamic linker. So that was part of it. Yep?
Yeah, so the question is, can that mess with the cache at all, because it could be a weird address? Well, really it's the operating system that has control over what that address is, and it's all virtual memory anyways, so the operating system can pick what it is. We might see at the end of the course that your operating system is actually going to pick a random address for security, and we'll figure out that that doesn't cause any cache problems at all.
But we'll get to that when we get to virtual memory. Okay, so dynamic libraries are set up pretty much the exact same way. To create a dynamic library, I just link the .o files together with something called shared linkage. That .so file is still essentially going to contain all of those .o files, which have all of the function definitions in them. The only difference is what happens next. When I create the executable, all I need is my main file, and it's just linked to form an executable. Sometimes you'll have to specify what the library is, but that's mostly for sanity checking. The only time it actually uses the code in the library is at runtime, and that's what the dynamic linker handles. Yep?
So it is 100% better than normal libraries, right? Right now we have some dynamic libraries. Should we always just use that? Yeah, so the question is, should we always just use dynamic libraries, or shared libraries, or what should we do? Give me one second and I can throw it back to you. First, a useful command-line utility. It won't really be helpful for the course, but it might be helpful for your software career: you can run ldd followed by an executable name, and it'll tell you what shared libraries that executable uses. So you can figure out, without having the source code or anything, exactly which libraries an application uses.
So here's that slide. Static and dynamic libraries: when do I use which? The other option is statically linking everything, which, like I said before, basically copies and pastes all those .o files into the executable. Well, it doesn't copy and paste all of them. It only copies in the parts that you actually use. So it won't be horribly inefficient, but there are going to be some drawbacks. If I statically link things, I can't reuse libraries.
So for that standard C library example, if the standard C library were a static library, every single application that uses printf would get its own copy of printf embedded in the executable when you compile it. That wastes some space, and it also prevents the operating system from doing something more efficient, like loading it into memory only once and sharing it between applications. We'll see that once we get into virtual memory. The other big drawback is that software changes all of the time. If I update a static library, any application that was already linked against it is not going to get the update, because the code was copied and pasted in at the time you compiled it, and any updates you make afterwards are not going to be there. So if you want the update, you have to recompile the application from scratch.
All right, well, if that is the bad thing about using a static library, what would be a bad thing about using a dynamic library? It's pretty much the complete opposite. So the answer was: you might update a dynamic library and then suddenly every application is using it. Well, if you have a bug in that dynamic library, you've just bricked every single application on your machine, and that's not good. In fact, that actually happened. I forget what year it was, but they essentially broke the standard C++ library, and as you might imagine, a lot of things depend on that. To fix it, they had to literally recompile everything, because, well, it broke everything, and you were lucky if all you had to do was recompile everything.
So it's easy to get updates, but you have to really trust the developers who provide them, and they have to test their software. If you get a bad update, you will probably start hating dynamic libraries and switch to static, then eventually you'll run into the problem where your out-of-date software has issues, and then you'll probably prefer dynamic again. It usually switches back and forth as we go along. I forget the period on that, but generally it's whatever you got burned by last. Yep? Yeah, there are certain things you can do, certain practices you can follow, but sometimes people write bad software. So the too long; didn't read is: if you write a dynamic library, which I'll show during this lecture, you'd better be very good at writing it. Yep?
Yeah, so the question is, if it's being linked at runtime, does that slow it down a lot? In some cases this can actually be faster. If it's a library that other applications are also using, it might already be in memory and then I don't even have to load it. But if it's not already in memory, then I have to load it. So there's a trade-off; usually this is actually faster, and it wastes less space too.
All right, so, like I said, dynamic libraries can break applications, break executables, and that's not good. Even very, very subtle changes can change the ABI. Remember what that is: the application binary interface. And you may not even know that some things will change the ABI. For example, let's consider a dynamic library that has a struct with multiple fields. If you actually read the C spec, a struct is laid out in memory in a well-defined way, and that is part of the ABI: all the fields are laid out in memory in the same order you declare them in. So if an executable accesses fields of a struct used by a dynamic library, well, it's accessing certain bytes of memory.
So if you then reorder the fields, you didn't change the API at all, because, well, the struct still has the same fields. But you changed the ABI, because now they're laid out in memory in a different order, and now your executable and your library may disagree about where the information is. That will be a bug you will probably take about a month to debug. So do not do that. It's okay if a dynamic library never exposes the fields of a struct, but you don't really have to remember that. All you have to know is that if your library ships a header file that has a struct with certain fields, guess what? You are never allowed to change them ever again; otherwise you will break people's applications and it will be very bad.
So, an example of that. Let's say we have version one of a library, and our struct is called point, and it has a y and then an x. It will be laid out in memory in a specific way. If you haven't heard the word offset before, it is just the number of bytes from the beginning of the struct to where a field is. So here in version one, y would be at the very beginning, at offset zero. The first four bytes, in this case, would represent y, and then x would begin at offset four, because an int is four bytes. Then later, say I decide that defining a struct with y and then x bothers me, I don't like the way it looks. If I made version two of my library and all I did was switch the order of them, well, it's actually laid out in memory a bit differently. In version two, x is now at byte zero and y starts at byte four. So, okay with that, ish, so far? This is more of a cautionary tale than anything.
So to summarize it a little bit: if you want to think of it at a higher level, both of the structs essentially get represented as an array of ints with just two elements. In version one, sorry, yeah, version one, if you don't want to think of it as byte offsets, y would be at index zero and x would be at index one. In version two, x would be at index zero and y would be at index one. So why would that be bad? Yep?
You mentioned that this is going to have the same API but different ABIs. Doesn't the API also change, because the way we define it is different? No, the API does not change, because the API, remember, is high level. All it cares about is that there's a struct with two fields and they're both ints. Nope, no. When you program, if I have a struct point and I say point.x, it doesn't care, right? It still works, still compiles.
Okay, so let's go ahead and create a library out of this. Let's say I call it libpoint.so, and it has four functions in it. There's a point_create function that takes an x and a y; internally it's going to malloc some space for that struct and return a pointer to it. Then we have some other functions that operate on a pointer to that struct: there's point_get_x, which gets the value of x, point_get_y, which gets the value of y, and then a destroy function, which frees it, because we're good programmers.
So this is my code that uses the library. Say I have a point.h, and all my main does is create a point. This is supposed to be x, so x is one and y is two. If I only use the library functions, it should always be consistent, because the library defines all the functions. Within this point_create function, whatever version of the library I have, well, if it's version one of the library, it's going to think y comes first and then x. And then if I use point_get_x, well, the library created it, so it knows exactly where
it is, and same for y. But on the other hand, if I have p->x directly in my executable, that computes the offset into the struct when I compile it, not when I run it. So point_create is found at runtime, and so are point_get_x and point_get_y, but these offsets would be compiled into my executable, and that is where I might have some difficulty. So let's try it.
Here I have my point example, and I already went ahead and compiled things ahead of time. If I compile it with version one and link it with version one, both of them are going to agree, because they both think that y comes first and then x. But if I go ahead and do this, oh no. Say these were actually coordinates: you update your library and the image just flips. That would probably be very, very confusing, and I didn't change any code whatsoever. I just updated the library, and that was it.
So why did that happen? Well, since I linked it with version two, point_create put x first and then y in that struct, because that's what we had in version two of the library. point_create is found at runtime, so it's whatever the library actually has. And then here, point_get_x would find x at the beginning and y at the end. But since I compiled my executable with version one, right here at this line it would think x comes second, because that's where it was in version one, and here it would think y comes first, because that's what I compiled it with. So, any questions about that? Yeah?
In the case with both of them being v1, when we call point_create, are the x and y still in the same positions? So the positions of x and y I'm talking about are within the struct itself. Yep. So this only happens because I defined the struct in the header: part of it gets compiled into my executable and part of it gets compiled into the library, and then
they don't agree with each other. So if you have a struct, it has to be the same in both of them. If they don't agree, then it's chaos. Yep?
So the question is basically, how do I get around this if I want to change the struct? So yes, you can forward declare a struct, and then the application can never use its fields; it always has to use it through a pointer. You can do that, and that is the solution if you want to be able to change your struct. That way the only code that actually knows the definition of the struct is the library itself, and it can change it, because it will be consistent with itself. It's not a problem.
This also happens if I do it the other way: if I compile with version two and then link with version one, it still flips. If they're consistent, it doesn't flip. And then there are some fun environment variables you can use to change what library gets loaded at runtime. So let's see this one, where I had version one and version one, so it was consistent. There is this fun environment variable called LD_LIBRARY_PATH, and that will essentially change where the dynamic loader looks for the library first. So I can tell it to start looking in version two's directory, and that way, whenever it tries to load libpoint, it will find version two first. Even though it was compiled and linked against version one at compile time, now I can just change it at runtime. Oh no, I flipped it again. Not good. Any other quick questions about that?
Don't worry too much. The only important thing from this demo is that the struct layout is part of the ABI, and if you change the ABI of a dynamic library, you're screwed. It will be insanely hard to debug. You won't have to deal with this in this course, but at some point you will have to deal with it, so be careful. If you write a dynamic library, take this really confusing lecture as a warning to do it properly, and if you start changing the fields of a struct, be
very, very, very, very careful, or just don't do it. All right, so this is just more detail if you didn't understand it, to see where the actual x and y are in each case. I'll skip it in the interest of time. If you want to go ahead and run it yourself, you can; you should have access to the materials repository now. If you don't have access to all of the repositories, oh yeah, by the way, Lab 0 is posted, so please go ahead and do that and get all of that stuff out of the way. Lab 0 is just setup and a fun little questionnaire, fun-ish. So sign in if you haven't; I think 80% of you or more have already done that, so that's good. These are just instructions if you want to go ahead and run it and play with it by yourself. Like I said, the too long; didn't read is the fields of a struct, well, that's what's at the bottom: if you wanted a stable ABI, you would hide the struct, not define it in point.h, and your library functions would only take a pointer to it.
All right, so in terms of developing a dynamic library, there's something called semantic versioning that should convey to developers what your changes are. It takes a version number and assigns meaning to each of the numbers. Usually our version numbers are like number dot number dot number, and semantic versioning organizes them as major dot minor dot patch. The rules for incrementing these numbers are as follows. You increment the major number whenever you break the API or ABI. So if I go from 1.0 to 2.0, that means you have to recompile; don't expect it to just work. That way you can go ahead and break the API or ABI without people getting very, very angry at you. This is why we had Python 2 and Python 3 for the longest time: they broke things, so some people stuck with Python 2 because it worked, and some went to Python 3. Thankfully, pretty much everything's on 3 now. For the minor version, you increment that whenever you add some functionality, you add some
functions or something like that. So you can say, oh, I added this function in 1.2, so if I only have 1.1, my program won't work; I actually require 1.2. And then the last one is the patch, and you increment that whenever you make a backwards-compatible bug fix. So I'm changing some behavior, fixing some minor issue; I'm not adding new functions, I'm not breaking anything, I'm just incrementing the patch number. So if you have version 1.1.1, you should always be able to safely upgrade to 1.1.2. Yep?
So the question is, when do those indexes get updated? It's whenever you recompile the library. Inside the library it would have the same thing, where it uses p->x, and if I update and recompile it, it would get whatever the struct layout is at that time. Yep.
All right, so those are the rules you should follow if you have a dynamic library, so that you don't go crazy. Generally this doesn't really work for user applications; if anyone's seen the version of Chrome, we're on version 156 or something insane. So it doesn't quite work all the time, and you get big numbers if you like breaking things all the time, but it is what it is.
Sometimes dynamic libraries let you debug things more easily too. I can change which library gets loaded with LD_LIBRARY_PATH, and there's also LD_PRELOAD to force a library to load before anything else. So for instance, this should be a pretty readable program, right? It mallocs sizeof an int, so I have a pointer to an int, then I print the actual pointer value, whatever the address is, and then I free it. So, anyone disagree with that program or have major issues with it? Let me switch. If I go ahead and run it, I should just get an address printed out, right? Nothing insane should happen. Cool, I just get an address, not really that exciting. But if I really wanted to, and this is kind of like how Valgrind works, if anyone's used Valgrind, I could write a
library and then force my library to load before the standard C library, and I can write my own malloc and free that log what's happening. So I went ahead and did that. I wrote an alloc wrapper library, and I have my own versions of malloc and free there, where I log what's happening and then call the real malloc and free to do the actual work, because I don't want to write my own memory allocator.
So if I run this, how many times do I malloc in this program? One, right? Everyone agree? Jeez, how much did I scare you guys? All right, I malloc once. How many bytes do I malloc? Four, right? sizeof an int is four. How many times do I free? Once. And I should free whatever address I print, right? That would make sense. All right, so if I log it: one malloc, one free... you guys lied to me. So yeah, there's one malloc of four bytes, and that address ends in, what, a zero something, looks like the same address as x, and then I free it. But what the hell is that? I thought you guys told me I only malloc once. Any guesses as to what that is? A leak, ish? Yep. Yeah, printf called malloc. Makes sense, right? They had to write it in C too, so it needs some memory to build this whole string here. It turns out it just mallocs about a kilobyte, and that's the memory it uses, but it doesn't free it. Jerks, right?
So it looks like there's a memory leak here, right? Have we all used Valgrind before, ish? Yeah. If you haven't used it, Valgrind is just a tool that should tell you if there are memory leaks or what have you. So if I run it... it didn't complain. It said there are two allocations and two frees. So Valgrind lies to you, and the lie is well documented. Here is what actually happened: if you read the Valgrind documentation, it actually says this. The GNU C library, which is used by all programs, may allocate memory for its own uses. Usually it doesn't bother to free that memory when the program ends; there'd be no
point, since the kernel reclaims all of the process's resources when it exits anyways; it would just slow things down to call free. Therefore Valgrind just assumes it's freed, even though it isn't really freed. So that is their excuse. This is not an excuse for you, unless you can argue it as eloquently as they can. Basically, as long as the memory may still be used, you shouldn't free it, because if you freed it you might get a use after free. It's used by printf, and the library doesn't know when the last time you're going to call printf is, so it doesn't know when to free it, and the memory should last as long as your process lasts. That's valid. If you can argue that the memory should last as long as the process lasts, then you don't have to free it. That's our new rule. Yeah, sure?
Yeah, the memory may be used right up to termination, so it wouldn't make sense to free it, because it might still be used. Yep? Yep, that bit of memory can be reused, so if you call printf again, it'll just use that space to throw in the next string to print. Oh, so printf reuses it every time you call printf again?
Yeah, that one kilobyte is just kind of temporary storage for printf, and it'll reuse it over and over again. Yep? We do not know what threads are yet; we will get to threads later. But yeah, you can imagine some bad things can happen if we have multiple things going on. Not for a bit yet, though. Right now things are nice and independent, so if one process uses printf, it doesn't affect another one. Don't have to worry about it.
All right, so some other things. There are other tools you can use to detect memory leaks. There's something called AddressSanitizer, and this is how you would use it with Meson, which is what we're going to be using for the labs. These are just instructions for you if you want to go ahead and detect your own memory leaks, and if our continuous integration works, this will be run for you. Hopefully that works; that will be a new thing, but it still has to be done before Lab 1.
All right, so last little bit. System calls you don't really make directly in C; that's really rare. Mostly you're using functions from the C standard library, which are actual C functions. Most system calls have a corresponding C function, but in some cases the C wrappers do a bit more. They set a global variable called errno, so you can see what went wrong if the system call failed for whatever reason. If you do raw system calls, it's kind of annoying to get the error out of them, because the format changes depending on the call, so the wrappers clean it up a little bit. The library might also buffer reads and writes to reduce the number of system calls. For instance, we know printf will eventually call the write system call with file descriptor one. But have you ever been in that situation where you call printf, you're sure your program executed that line, but you don't see a damn thing? Has that ever happened to anyone? If that has happened to you, it is because it just buffered the output and didn't actually do the system call yet.
printf is waiting; most of the time it's waiting for a newline or something like that before it actually does the write system call. Why it does that is that system calls are slow, so if you can reduce the number of system calls, that's much better. Sometimes the wrapper simplifies an interface, so it'll combine two system calls into one call to do something more complicated. Sometimes it might add brand new features by itself. For instance, in C there's exit. We saw the exit_group system call, or exit, which should just immediately terminate the process. The C version doesn't quite do that. The system call exit, or for now exit_group, the actual system call, will terminate the program, no ifs, ands, or buts about it. But the C exit has some other features, where we can register functions to run whenever the program starts exiting.
So let's just look at an example. To do that, I can call this atexit function and pass it a function, and then whenever we return from main, it will execute that function before the process dies. So it does a little bit more work. If I go ahead and run that, I'll see do main, and then when I return, it prints do fini. Another thing to note here is that return 0 from main is exactly the same as calling exit. So if I wanted to, the same thing would happen if I did this. Compile. If I call exit myself, it does the exact same thing; it still runs that function when it exits. So, any questions about that new feature? If you were insane and you registered stuff to run at exit and then decided to put an actual exit system call there, none of those functions would run anymore, because again, with the system call exit, the process is done. So, yep? Sorry?
Yeah, so these functions, they don't get to read the status at all. No, they have to take no arguments; no inputs, no outputs, nothing. Yeah, if they want
to read something like that, it would have to be a global variable or something like that, so it has access to it. Yep? Yeah, so whatever value you return from main, so say I return x, returning from main is the same as calling exit with x or whatever. Another way to think of it is that whenever your program actually runs, something is calling your main function, taking whatever gets returned from it, and then doing something like that. So it actually looks like that: returning from main just immediately calls C's exit, which will run all the atexit stuff. It's not like a real system call exit. So you can kind of guess C's implementation of exit: it will first go through all the functions that you registered, run them, and then eventually, when they're done, it would do system call exit. So, a more fun question. So this is mostly to illustrate that, hey, just because a system call is something you can call in C, it's probably a C wrapper, and you should probably look at the documentation for it to figure out what it actually does.
All right, yep? Can I call main at exit? Sure. I mean, what's that going to do? Oops, x is undeclared, change that back. Whoops. So what's that going to do? Any guesses? Hopefully this isn't terribly surprising. Yeah, it essentially recursively calls itself over and over again, because it'll just call exit, exit will run the atexit stuff, then call main, it calls exit, which does the atexit thing, because it goes through it over and over again. So, fun ways to infinite loop ourselves, but this isn't any different than any other infinite loop you could get in, right?
So to wrap up: operating systems, well, they provide a foundation for libraries. We explored dynamic libraries, compared them to static libraries, played around with the dynamic loader. You don't really need to know how to do this for this course; it's like fun party tricks, and it might help you when you actually have a job and you need to figure out something and actually debug it or simulate an update or
something like that. We also saw examples of issues from ABI changes without API changes. So even fields of a struct are part of the ABI, so you have to be very, very careful about updating them. So the too long; didn't read of the lecture is: beware the dynamic libraries, and be very, very, very careful whenever you update them. So with that, just remember, pulling for you, we're all in this together.