Hello. Timbits if you want any, early comers special. They're off to the side, so help yourself, and if you're quick you can go pet the dog in the lobby. Is it week four now? Something like that? No, that's still the exam. This is lecture 13, at three a week, so you've done four.
All right, so today we'll take a small detour and talk about libraries, since they're related to operating systems. I guess no one's created a library before, right? Has anyone? No? Okay, really weird disgusted faces. All right, sweet, so we'll learn about them today.
So why are libraries important? Remember, they're also part of the operating system, depending on what your definition of an operating system is. We know for sure the kernel is part of the operating system, but like we said back at the beginning, Android and Ubuntu both use the Linux kernel. So are they technically the same operating system or not? The only difference is what runs on top of the kernel.
This is the typical stack of libraries you'll see on a Linux machine. You'll have the C standard library, and then other libraries built on top of it, like the system daemon library, which is there to make it easier to talk to devices through the kernel. Then there's a display server that actually renders stuff for you, and on top of the display server you could have a GUI toolkit. Then you have a bunch of applications, and they might pass through multiple layers of libraries. For instance, your network manager might call the C standard library directly, it might call the system daemon library, and it might not display anything at all.
Whereas if you have LibreOffice or Firefox, LibreOffice might just use the GUI toolkit, which uses the display server, which uses the C library. And Firefox might use everything, because it displays things to you, captures your webcam and your microphone, talks to your devices, and does all that fun stuff.
So we should talk a bit about good software engineering practice and what makes up libraries. As you're figuring out in lab two, just as an aside, C has an ABI, which is an application binary interface, and there's a certain convention for each individual architecture. For 64-bit x86 (x86-64), you're discovering it's also kind of insane. Classically, C's calling convention is stack based: arguments are pushed onto the stack in right-to-left order, and registers are split into caller-saved and callee-saved ones. But in 64-bit, some arguments are passed in registers instead of on the stack if they are simple enough. For lab two, the thread stub code you're creating just has two arguments that are simple, so they'd actually be passed in registers, which is in the handout. If you look on Wikipedia there are lots of other conventions, fast calling conventions and tons of others, and the caller and the callee just need to agree on one.
As for what that gives us: remember that with the system call ABI we just use six registers, and they're all a set size. So when we're making system calls there's a maximum of six arguments, and they all have to be the same size, which is the size of a pointer. In C, we can have any number of arguments we want, they can be variable size, and all of that fun stuff. It just might be slightly slower if we're using the stack instead of registers. So the question is, is C the basis of everything? And yeah, pretty much.
We can figure it out. In one lecture we looked at every program that had the C standard library open, and it was all of them. So C just makes things a bunch easier, and pretty much everything's built on the C standard library. Each operating system will have its own C standard library that abstracts all the system calls, so you can use pretty much the same stuff everywhere. Except for dealing with graphics hardware, which is different per operating system: Windows will do its own thing, macOS will do its own thing, Linux will do its own thing, Android does its own thing. But that's the idea. Most of the applications you've written so far are just terminal applications, so you could run them on whatever you want, because they pretty much just use the C standard library, right?
So normally when you compile, it looks something like this. Everyone understands this: if I have four C source files, each one gets compiled to a .o, which is basically an ELF file that has the machine code implementing each function and some information about what each function's name is. Then at the end, after you compile every single C file, you link them all together, smoosh them into an executable, and one of them has to have a main in it, right? We have all done this before and understand it.
Static libraries are just a way to decompose that so you can reuse parts. Say that of those four files, I know I want to reuse my util, my foo, and my bar files in different executables. What you can do is put them together in an archive, which basically means you smash a bunch of .o files together and they all live in a .a file. So lib.a will just be those three object files smooshed together.
Then, instead of linking our executable from each individual object file, we just link main with our library and create an executable. Now if we wanted to slightly tweak something and otherwise use the same library, we could modify our main file and link to the same library, or add another file, create a new executable for it, and use the same library, right? So we're just reusing things. This makes sense to everyone? Because we're going to take a leap soon.
Okay, so static is just copying and pasting. It's one way to decompose what we already know. Dynamic libraries, on the other hand, are usually used for more reusable code. The C standard library that we saw before is not a .a file, it is a .so file. It's basically the same idea: a .so file is just a collection of .o files that you want to reuse, except there's a little special stuff, which we'll see, so that different applications can reuse it without having it linked into them and without having to recompile anything.
The fun thing is that if you have two applications that both use the C standard library, your operating system gets to do some fun tricks. Say you have 100 programs running and they all use the C library. Instead of having 100 copies of the C library in memory that are all the same thing, we have virtual memory. So if you are the kernel, each application that uses the C standard library has it at some virtual address, and you can make each application's virtual addresses for the C library map to the same physical memory. They all reuse the same physical memory even though they each have their own virtual address space.
So you pretty much just read this C standard library and execute it. You don't modify it, so you can actually share it between applications, which gives you some nice speedups and benefits. Essentially, if the C library is already there when you launch your application, you don't have to reload it or anything. It's already there and you get to use it for free, which is kind of cool. Yeah?
So the question is, doesn't that cause a lot of data races? What's the condition for a data race? You have two concurrent accesses to the same location and at least one is a write. In this case we're just reading and executing code, so there are no data races. It's all reading; we can have a billion readers if we want. Yep, sorry?
So the question is, does the dynamic library have to be compiled to an executable? The answer is no. We'll see how to create a dynamic library; it's basically just a bunch of .o files, and we'll have an example we can go through.
So this is how it works, just to answer that question as well. It pretty much looks the exact same as a static library. All of the .o files that we would have had in our .a file are still there; we just specify a different kind of linkage to the compiler and it creates a .so, which is basically the same combination of .o files with some extra stuff on there so symbols can be resolved later. The main difference is that when you create your executable, you just link the file that has main defined in it and tell the linker what library you use. Then whenever you execute that program, your operating system finds the library and resolves all the symbols for you, all the hard work, so you can just use the library. Every time you run your application, if you're using a dynamic library, as soon as you hit run it goes and finds where the library is and then starts calling into it. That's the main difference. So you can think of the library as having the exact same functions
whether it's compiled into our executable or found at runtime, where it can therefore change, which is the difficult part.
Some useful things: if you want to see what's actually using dynamic libraries, you can run ldd on your executables and double check. So let's see that. I have some examples here, and one of them is just called alloc example, so I can ldd it and see what libraries it uses. We can see here it uses libc, and on the right side of the arrow it tells you where that file actually is, so it's at /usr/lib/libc.so.6. Then there is this linux-vdso entry that doesn't resolve to a file, because the kernel actually provides it for us. It's not even a real file, it's just something that looks like a library that the kernel provides, which is kind of sweet. And then there's this ld-linux one with the architecture in its name, and that's the library that does the dynamic loading for you. It does all the hard parts: finding the library whenever you run the program, loading it, and all that fun stuff.
So if your question is, hey, what uses the C standard library, I can use this tool on everything and see what everything uses. I can do it on ls: ls uses the C standard library, and it also uses libcap, which, no cap, that's fun. We can try some other stuff, like cat. ls is just written in C. You could write ls yourself now and it wouldn't be that bad. You used stat in lab one, and ls just calls stat over and over again. And we know how to check what applications do: if I run /usr/bin/ls and I want to see what it actually does, that's strace, right? We know all the secrets of the operating system now. So it opens a bunch of files, then it opens a file called dot and queries what's in it, gets all the directory entries, and then just writes them out. ls doesn't do anything that complicated. It has to use system calls, and we know how to monitor system calls, so if you want you can figure out which system calls it uses and write your own
version of ls if you really want. So that's fun.
You can also use objdump -T to see what symbols a library has in it, or which ones are defined in an executable. Another example: today we'll see a little bit of our own memory manager, so I made a library, and what I called it is allocwrapper.so. Inside allocwrapper.so I defined realloc, I defined free, I defined malloc, and I defined calloc, which we should know are all of our memory management functions. So I have a file here called allocwrapper.so, a library that just contains that one file, and I define four functions in it. These are all functions. And if you do -T on an executable, like alloc example, it'll tell you all the symbols that executable needs in order to run. For example, this one uses printf, so it needs printf to run. So you can play around with libraries like that. And of course there's our fun objdump -d if we want to see what the code is within a function, which might be useful for lab 2 because it's kind of messy.
All right, on to the fun thing. The main difference between static and dynamic libraries is basically whether or not that code is included in your executable. The big drawback of using a static library is that you can't share the library anymore. It essentially copies and pastes that code into your executable. So if you have 100 programs using libc and they're all statically linked, and they all use printf for example, then you would have 100 copies of printf sitting in your system, which you don't need, so it actually wastes some space. And the real big drawback is when you want to update that static library. Because it's included in the executable, if you just update the static library the executable doesn't change, so if you want to update it you have to
recompile it, right? So given that, what might some issues be if we use a dynamic library instead? Yeah, one thing: if the API of the dynamic library changes, it might screw up our thing that already works, so it might break an application that was running fine. Were you going to say the same thing? Yeah, so you could also change the behavior of the library and then suddenly your application doesn't work. And there is a third, even more subtle and worse thing: you could change the ABI, and you will probably not realize you've done that. We'll have an example of that, because that is the fun one. So let's just go into the example.
Okay, so we kind of know how to write correct applications, right? We write good code. So I'm going to have a very, very simple application and make a library. Say I want to make a library that represents a point. Points are simple, right? They just have an x and a y. A library is just a collection of functions, so I'll go ahead and say my library consists of four functions. The first one is point create, which returns a pointer to a newly created point with two fields, an x and a y. You'd expect that if you create a point with an x and a y, its x is set to the first argument and its y is set to the other argument. It's going to become clear very shortly why I'm using a pointer and hiding the struct. In this library I don't say what's in the struct; I hide the details from the user and let them be none the wiser. Then I'll have a get x function that just takes a struct pointer, then a get y, and then a destroy, which should just free it. So my library consists of these four functions, and here I'll show you version one of the library. This is going to be the entire contents of the library: four functions, and these are their implementations. In here I include lib point v1, and that is just saying what
the struct actually is. This struct is going to start with a y and then an x; those are the two fields it has, right? Makes sense, and that's the only thing in that header. Then inside here I have point create, which mallocs the size of the struct, accesses the struct's fields to set x and y to the arguments x and y, and returns a pointer to it. In get x I just access x and return it, in get y I access y and return it, and in point destroy I call free. Any questions about this? Pretty straightforward library, right? Cool. All right, let's see how straightforward it actually is.
So in my point example code I just include my library, and then I compile it. Now there's a directory here for v1 with a file in it called libpoint.so, and that's where my four functions are: create, get x, get y, and destroy. Then LD_LIBRARY_PATH is just saying, when you run, use the library that's in that directory. So this is my v1 code. In the point example I go ahead and use point create, and if I print out the point using get x and get y, that's all good, right? It's what I expect. Any questions on that? The point example just contains main. The only thing compiled into it is the call to point create, and point create itself is defined in the library. Everyone on the same page? Because things might get weird soon.
Okay, so someone might come along and say, hey dummy, why is it y, x in this struct? I should be able to change the fields around and do x, y instead; that seems like it makes more sense. And it's not going to change any APIs, right? It still has an x and a y, only the order is different, and all the names are the same. Everyone agree that this change shouldn't really matter? So just to show it really doesn't matter, my lib point v2 code is actually exactly the same, right, line for line or
character for character it's exactly the same. The only thing I did was change the order of the fields in the struct. Now when I execute this, because I'm going through the library, I get the output I expect, which is good.
But now I'll show you what happens if you expose the struct to the user. It would look something like this: if the header file gave the definition of the struct, then whoever is using the library would be able to use that struct in their code. And then I could say, hey, these function calls are stupid. I know it has an x and a y, so why don't I just compile that access in there? It should be faster, since I don't have a function call or anything. That makes sense to me. Does that make sense to everyone? That should be the exact same thing. So instead of calling point get x and point get y, I'm assuming the library exposes a struct to me and I'm just using the struct directly. Cool.
Well, should that be the same thing, yes or different? Since I'm showing it to you, you'd guess different. But I don't break the world yet: if I do it this way, it's the same. I'm using version one of the library and version one of the struct is compiled into my executable, so they both agree and it's all good. Then the question is, what if I do that and update my library, when I was actually depending on the old layout because we embedded it in our executable? Now what's going to happen? Yeah, now the order is swapped, and all I did was say use the other library. Anyone want to explain why that happened? Because I didn't change the API or anything. I just changed the order of the fields, updated my library, and said yeah, that's fine. Yeah, so the answer was that the order in which I declare the fields changes where they are in memory. Structs in C have a very predefined memory layout, and the fields are guaranteed to be laid out in memory in the same order they are
in the struct. So in version one, an int here is four bytes, right? If you read the real boring documentation, it will say that this struct is going to be 8 bytes and all the fields are going to be in the same order they're defined. So at offset 0 is y and at offset 4 is x, while in version 2 I swapped them, so now x is at offset 0 and y is at offset 4. So if I go back to my libpoint example and embed version 1 of the struct, my code is going to think x is at offset 4 and y is at offset 0. But if I run with version 2 of the library, point create comes from version 2, so it creates a struct assuming x is at offset 0. So they just get swapped around.
In here I always use the library call to create the point, so it will be whatever library I run with. If I set LD_LIBRARY_PATH to v2 it uses version 2 of the library, and if I set it to v1 it uses version 1. So point create always uses the current library, and the same goes for these getter functions. And I can have both struct definitions in this file, so it's simulating what would happen if you had compiled that struct into your application. If the library defined the struct, you might use it in your application, and then when you compile, the offsets are going to be whatever they were at the time you compiled. The difference between the two libraries is the offsets within the struct, since I swapped the fields. But if I use the struct directly in this executable, it has whatever layout I embedded. Here I'm using v1 of the struct and accessing its fields, so no matter what, because version 1 of the struct doesn't change, x is always at offset 4 and y is always at offset 0. And if I interpret it as version 2 of the struct, x is at offset 0 and y is at offset 4. So for those output lines, it's like: this is what happens if I use the
library, this is what would happen if I compiled this executable with version 1, and this is what would happen if I compiled this executable with version 2. If the library I run with matches what I compiled with, I'll always get the right numbers, so if I use v1, now this one matches. But the point is that if I use a properly encapsulated library that doesn't reveal what's in the struct, the line that goes through the library is right in both cases. It's never wrong, because it only uses what's in the library code. So that makes sense? If you make libraries, you unfortunately have to get these details right, otherwise bad things can break. Imagine if someone somewhere in the C standard library did this to you. Do you think you could debug it?
So the rule is: if a user knows the layout of a struct, you are not allowed to change it anymore. In C you have the stat struct and a bunch of other structs that are exposed to the user. If the C standard library people ever changed even the order of the fields, they would break pretty much every single program in existence. And this actually happened. I forget what year it was, not that long ago, but they broke an ABI, and the only fix is to recompile everything, or else things just break in very mysterious ways that are impossible to debug. Everyone has to agree. You can have different implementations of the C standard library, but if you change the fields, no programs are going to work anymore. This should make everyone amazed that computers actually work, because even if you change just that one subtle little thing, things break. Trust me, you think data races are hard? This is actually worse. I guarantee you will not be able to debug it.
So here's our example. If we don't understand how programs are compiled, we might not understand the ABI and what actually changes the ABI. One very common thing that will break
it is just changing the order of things in a struct. It doesn't change our APIs or anything, but it does change the ABI, in which case anything that uses your library suddenly breaks in very mysterious ways. If I updated a library that was using these points and then everything got flipped around, that would be really hard to debug. Imagine you updated some game or something and then your screen was flipped upside down. Good luck.
Just to point out, C does have a consistent ABI for structs, and that's why it works. That's why everything agrees: if you can see the struct, you know exactly where the memory for everything is. The only way anything works in computers is literally if everyone agrees to do the same thing. If someone doesn't do what they should, systems just don't work. Again, it's kind of amazing that things work.
So any questions about this example? So the question is why they expose structs at all. It's slower to use a function call, and they are a committee, and they're saying that since we gave you the struct we will never change it. That's why one of the golden rules of the Linux kernel is that you never break user space. There are a lot of structs floating around, so if you want to add a new feature or change something, you just have to create a new struct. You can't modify the existing ones, ever. That's why you'll see something like a getpid and then a getpid2: say they doubled the size of a process ID number. If it's in a struct, you have to have a new struct, because suddenly things don't line up anymore, right? That's why, especially if you look at x86 kernels, you'll see multiple copies of the same system call, because they all use slightly different structs. If I change the size of a process ID from 32 bits to 64 bits, I have to change everything that involves it, right?
If I change the size, yep, you have to recompile everything. Yeah, so it was a Linux libc one, I think it was C++ actually, where they changed the ABI of the list by accident. So if your application ever used std::list, it broke, and the only solution was to recompile everything. And that's a big pain in the ass if your compiler also uses that library and you can't recompile the compiler because it's broken too. And it's really bad if you're on a Linux system and you get partial updates. When I was updating my computer, half the packages were updated and half of them weren't, so half of my stuff just didn't work anymore. That was a fun month. But yeah, it can happen in rare cases. They've only done that like once or twice in 30 years, which isn't that bad, but it's terrible when it happens.
But the function call is slower, and you can't optimize it away, because optimizing it away is the same thing as knowing about the struct, right? So yeah, it's unfortunate. If you want things to be fast you have to pay the tax, and this is one of the taxes. The alternative is living in a world where your operating system and everything else is written in Java, and that would be terribly slow and no one would use it. I guess if you use Android, but Android sucks. Sorry, they don't use Java that much; the core operating system of Android is not Java, so it's not that bad. All right, ooh, I didn't think I'd say that. I didn't say that, don't worry about it.
All right, so to get rid of this problem there's something you can do which is just a contract between developers, called semantic versioning. It says, hey, if you have a version number, give it three parts, major dot minor dot patch, and increment them in the following way. If you make an incompatible change, so you change the API, you change the ABI, anything like that, increment the
major version number. So if a library goes from version one to two, that means you can't just drop it in; you have to recompile. If I get a random update and it breaks my application, I'm going to be mad, but if it goes from one to two I can say, oh okay, I'll just recompile and then I can use the new library. That's fine. Then you increment the minor version when you change things in a backwards compatible manner, like adding a new feature. So I added a new function and didn't touch anything that already exists; then you would increment that number. And you increment the patch number if all your functions and all your APIs are exactly the same and you just did some normal bug fix.
Fun story, to try and make everyone follow good practices: this is how my slides are versioned. I increment the patch version if I just fix text on a slide or reorganize a slide, but all the slide numbers stay the same. I increment the minor version if I add a new slide, so the slide numbers might not correspond exactly. And the major version I just increment every year when I redesign stuff. So even your slides follow a version number. If you look at what slides you've downloaded and check what I have, then given the difference in version number you should know what to expect. If you have 1.0.0 downloaded and I'm showing you 1.0.1, that means I screwed up the text on one of the slides and fixed it. So you should follow that, and it makes your life a lot simpler than just, hey, I changed something, good luck, maybe it'll work, maybe it won't.
The funny thing about this is, if you go by the Linux kernel's primary rule, where we don't break anything ever, then the major version doesn't matter, because you never break anything anyway, and you would essentially just increment the minor version. If anyone's followed the Linux kernel version numbers, where we just
went from 5 to 6, the rule for that is: if the lead developer runs out of fingers and toes for the minor number, he increments the major number. That's it. That's literally the rule, I'm not joking. So if you don't break stuff, then you don't ever have to increment the major version. Well, libc is on version 2.2 or something, so they broke it at one point, which I think corresponds to what I was talking about.
So the cool thing about dynamic libraries is that, because they're included at runtime and resolved then, you can do a bunch of fun stuff. I already used LD_LIBRARY_PATH to change which library was used when I ran the executable. I never recompiled it, aside from when I added code to it. So take the following example. It's a really simple program: it calls malloc, prints the value of the pointer, and frees it. If I run this I should just see x is equal to some random memory address, right? Let's go ahead and do that, just to make sure we're all good. So I get some random memory address, and it changes every time. Cool.
So say I want to implement my own valgrind. Cool, I'll essentially just override malloc and free and start monitoring them. I went ahead and wrote that because it's actually really short. Then I can use LD_PRELOAD to say, hey, load this library before anything else. In the library I wrote, I define malloc and I define free, and my implementations essentially log the call and then call the real one to do the actual work. If I do that, I get to monitor my program, and this is essentially what valgrind does. So I can see I malloc four bytes, cool, and I got this pointer that ends in 2a0, which exactly matches my x value, and then I called free on it and it matches that pointer too. Anyone confused by something here? Yeah, what the hell is this? I'm literally just monitoring all the calls, and if you look at this, it looks like I malloc a kilobyte
and I never freed it. I have a memory leak, right? But in what I wrote, I just called malloc and free once each. Anyone guess what the hell is going on? Yeah? So the guess is that it's for the library itself, and the answer is no, because the library is already loaded by the time I call my first malloc, and the big malloc hadn't happened yet at that point. Yeah, so it's the printf function. This actually gives you some insight into what printf does. That malloc is internal to printf: printf creates a little buffer, because it essentially needs to copy the string in, find the percent specifier, and replace it with the bytes for the value of x. So that malloc is actually printf's buffer, and we can see it now.
Yep, so printf is not doing what we said we should all do, which is free. printf never frees it. Anyone guess why it might not free it? Yeah, the program exits, so we don't care. Another reason? Yeah, it would reuse it, right? Otherwise it's going to malloc and free every time you call printf, which seems like a waste of time. So as an optimization it just mallocs a buffer once and reuses it over and over again. So that's two mallocs to one free. This is why you use the real tools instead of doing it yourself, because there are special cases. If you look at the Valgrind documentation, it says the C standard library, which is used by all programs, may allocate memory for its own uses. Usually it doesn't bother to free that memory when the program ends. There would be no point, since the kernel reclaims all of a process's resources when it exits anyway, so it would just slow things down. So why bother? They know that, and they specially handle that case for you. Because if I ran a naive tool on this and got two mallocs and one free, with no way for me to free the other one, it's going to
give me some... it's going to tell me I have a memory leak that I have no hope of solving, ever. So this is why you wouldn't write your own one, because there are a lot of stupid special cases like this. Yeah? Okay, first question: does that mean we never have to free anything? Let's see. Yeah, I see the note. So you don't have to call free if you know it's going to live for the whole duration of your program. That is fine. But if it just uses the memory for a bit, and then you can free it and use it for other stuff, that's better. So if you want to keep it alive the whole time, you have to argue that it should be alive the whole time, and that you know that, right? So you can't get away with it for your thread example. And then let's see if we can make this bigger. I don't know the easiest way to make this bigger. How would this explode really quick? If I just type like a thousand characters. Let's see if that's long enough. There's no equal sign. Oh god. Yeah, there's still an equal sign. Do I have to make it one line? Missing terminating character. Yeah, I do. Oh jeez, this better be worth it. See, there's a better way to split it up: you can end the quotes on each line, but I'm just doing this freehand. So there we go. What do you want? You want me to run it again? Jeez. Oh, it's not that big. I didn't reach a thousand yet, but you can do this later. It'll probably malloc another thousand. Yeah, it'll probably just double the size of the buffer. Yeah? So the question is what does LD_PRELOAD do. LD_PRELOAD forces this library to get loaded before anything else. So this library has malloc in it, and I force it to load first, and then since it has malloc, calls to malloc will point to this malloc. So for the implementation, I'll show my code. My code is actually this. This is my malloc function. This is super hacky. You never need to understand this, but this will essentially be like: okay, I'm malloc, you called me, find the next malloc that's not me,
and then I'll call the next malloc, and then I just print off some logging information. So you can do some funky stuff with this if you understand it. This is one hacky way you could implement Valgrind if you want, right? If I just use my malloc wrapper and keep track of my mallocs and frees, I could actually line them up to see if you have leaked memory yourself, right? But then I'd have to have special cases: if there's a malloc in the C standard library, just ignore it, because it never frees it anyway. Yep? It's pretty much always assumed the C standard library is perfect. Otherwise your life's terrible. You also assume compilers are perfect. No, my research was in compilers. They don't... well, there are very subtle bugs. Compilers are very good. So fortunately compilers are very good, the C standard library is very good, the Linux kernel is very good. Is... what? Rust is good. Oh, Rust is written in Rust. So for most languages, right, when you get to compilers, the fun thing to do is always write your compiler in the language you implement. And that's the compiler course. So if they let me teach you that compiler course, I'll teach you about that too. It's lots of fun. Anyways, for today, to wrap up: we learned about dynamic libraries, the comparison to static libraries, how to play around with the dynamic loader, and an example of breaking the ABI, which completely screwed over our application. So just remember, I'm pulling for you. This is our world now, right? Thankfully people are good at writing code, and hopefully you will be too. So I'm pulling for you. We're on the