All right, welcome back. Today we're learning about libraries. Libraries will be very important because in lab two you'll be making a library. In lab two you're going to have a lot more freedom than in lab one, so hopefully it will be more challenging and more fun than lab one. We should probably learn about libraries if you're going to implement one. Show of hands: who's implemented a library before? One, two. So like two people in the whole class have implemented a library before. All right, four, maybe five. Okay. So let's talk about libraries, and about some bad things that can happen that you might not have realized even if you have implemented a library.

To tie this into operating systems too: sometimes libraries can be considered part of your operating system. We know for sure the kernel is part of the operating system, because the kernel is the code that runs in kernel mode on your CPU. There's a clear division of responsibilities there, and we now know how to cross that boundary using system calls. Libraries are all in user space. They aren't part of the kernel at all; they're implemented on top of it, and they may or may not be considered part of the operating system depending on who you ask.

So for instance, you'd have your kernel space and user space. The standard C library would be completely in user space, implemented as a library: whenever you call printf, some code written by someone else executes. Pretty much everything is built on top of the standard C library. There's the system daemon, something like udev, which is basically a way for you to interface with some devices and does a bunch of system calls. There's a display server, if you care about a graphical user interface and actually seeing some images instead of a dumb terminal. That'd be implemented as a library as well.
If you're using a desktop Linux, it'd be something called Wayland. That's very low level, like, hey, give me an address where I can write some pixel values. Generally you don't want to do something like that. On top of a display server would be a GUI toolkit, something like GTK or Qt, that gives you an API where you can create a button or something else you might be more familiar with. Then all of your applications run on top and use some variety of these libraries. Say there's an application called NetworkManager: it might communicate through the system daemon to get all your network interface cards and all that stuff, and maybe it doesn't even have a graphical user interface. LibreOffice would probably use the display server and a GUI toolkit, because they wouldn't just invent something themselves. And Firefox would probably use everything. It would use a toolkit, and it would probably do some low-level display stuff to render your web pages, because page rendering doesn't go through a GUI toolkit, though the buttons might. And it might use the system daemon because it's dealing with a bunch of network stuff, so it can do fairly low-level things.

So, the C ABI, or calling convention. We talked about this back in lecture two, and it will be especially important for libraries. This part is more of a fun fact. The C ABI, or calling convention, for most of your desktops, servers, or non-Mac laptops would be something like this. System calls use registers, and remember that with the Linux ABI for system calls you only have six registers, so you're constrained to six arguments there. C tries to be a bit more general. The hypothetical way is to pass arguments on the stack in left-to-right order.
And there are going to be some registers that are caller-saved versus callee-saved, which you might have run into in your computer organization course. For this course we don't really care. The more general approach is that the C ABI uses the stack to pass arguments. That gives you a bit more flexibility: you can have any number of arguments you want, and they can be different sizes. If you want to pass a byte to a function, you just get a byte of space on your stack and put the value there. That's not the whole truth; Wikipedia has more details. There are lots of conventions even in C. This is the standard C convention, but there's also a C convention that's a bit faster and tends to use more registers. In general, though, you can think of C as just passing arguments on the stack. The advantages, like I just said, are any number of arguments, and they can be variable sizes. The disadvantage is that it's a bit slower: we're not using registers, we're using the stack, which lives in memory, and in the hierarchy of devices memory is a lot slower than using registers directly.

You've hopefully seen this before, but if you haven't, this is what happens in normal C compilation. Say I'm making an executable that's comprised of a bunch of files: main.c on the left there, util.c, foo.c, bar.c. As part of my compilation I would compile each of those and get an object file, which is that .o thing. That's machine code. We now know that would be an ELF file, and it contains the actual machine instructions for whatever functions you wrote. Then there's something called a linker. So there are two steps: things get compiled to machine code, and then a linker takes all those object files, combines them together, makes sure that you call the right functions, and creates a single standalone executable in this scenario.
That you can just execute directly, and that would be like that hello world file. It's just an ELF file, nothing that special about it.

So there are two types of libraries. Oh, so some of us don't know what a stack is. Who doesn't know what a stack is? Okay, you can message me after and I'll explain what a stack is, because that's fairly important, but bear with me for now.

There are two types of libraries: static and dynamic. Static libraries are included at link time. If you go back here, maybe having that util, that foo, and that bar file is fine if you just use them in one application. But typically we like to reuse code. If you've written code before, you want to reuse it in other applications, and you don't want to include it wholesale in each application; you want a way to decompose it so you can reuse it over and over again. So instead, you can make a library. It's the same thing: all those C files get compiled to .o files. But instead of combining util.o, foo.o, and bar.o into an executable, you put them into a library. It's called a .a file, and all it is is a combination of all those object files so you can reuse them in different applications. Now when I come along and want to create an executable, instead of using every single object file, I'll just use my main file and say I also want to use things in that lib.a file, and then I get an executable. If I want to reuse the library, I might have a different main.o from somewhere else; I would link it with that library and get an executable with all that code, and I don't have to relink everything.
So that executable will still contain essentially that copy-and-pasted machine code, but I didn't have to recompile everything from scratch. Or maybe those object files live somewhere else and that .a file is just given to you.

The more general way, where your code isn't copied and pasted, is something called a dynamic library. The C standard library is a dynamic library, and that's where that .so comes from. You can think of every library as just a collection of those .o files, so it's the machine code of all the functions that you've written. The idea is that I don't include all that code in my executable; I just say, hey, my executable needs this .so file, this library file, to execute. That way you don't have to duplicate anything. If I have two applications that both use the C standard library, they don't each have their own version of printf embedded in them. They both use the C standard library, and on your system it would be libc.so. For trivia's sake, .so just stands for shared object. It's basically a slightly different format: a bunch of object files all thrown together so you can reuse them over and over again.

Why this is important for operating systems is mainly efficiency. The operating system only has to load libc into memory once, and since nothing modifies it (you can't, it's read-only), every process can share it. It only has to be read from disk once, and then every single program can share that memory, since they'd all just be reading from it. We'll get into what pages are and all that when we get to virtual memory, and we'll revisit this then, but all you have to know for now is that it can be reused over and over again.

So yep, the question is: what's the difference between a dynamic and a static library?
They look the same. A dynamic library has the same idea: it's a collection of those .o files, which is all the machine code, but in this case they'd all get combined into a .so file instead of an archive. What we had before was a main.o and our static library to create the executable, and that executable contains a copy of the code from that library. So every single executable would have its own copy of printf or something like that if it's a static library. You'll note that in the dynamic case I don't really have to link the code in while I'm compiling. I take my main.o file and link it, so I have to tell it where libc is, but it wouldn't actually use that other than to do a few sanity checks. Then I get an executable that doesn't contain printf or anything like that, and at runtime, when I run it, your system tries to find the C standard library. That's what we saw when we straced the hello world written in C: it was looking for the C standard library. That's all done at runtime. Because of that, it's a nice thing and also a thing that will prove to be kind of a thorn in our side. So that's the major difference between them.

We'll do a bit of practice, and there are some useful command-line utilities for dynamic libraries. There's the ldd command: you give it an executable and it shows you which dynamic libraries that executable needs. You can also run objdump -T on the library file, and that will show you all the symbols in the file, in other words all the functions contained in that library. If you do that on the C standard library you'll see printf and everything else that's in it.
You can also use objdump -d if you're really getting into low-level stuff, and that will show you a disassembly, so you can see the assembly instructions that make up libc. I don't recommend reading them.

Getting into static and dynamic libraries again: the static version basically creates a big archive of all your compiled code and essentially copies it into the executable as part of the compilation process. At runtime you don't need anything special, which is the good part. But you've essentially copied and pasted the code, so whatever you copied in at the time you compiled is what you're stuck with forever. In the static case I can't reuse libraries across programs unless my operating system is really smart, because my executable just contains all that copy-and-pasted code. And if I want to update that library, say there was a bug in printf for whatever reason and I'm the person who created the lib.a file that contains the printf code: if you want an up-to-date version, you have to download my new version of the library and then recompile all your code so that you get that new version embedded in your executable.

There are issues with dynamic libraries too. If the static case means I have to recompile everything whenever there's an update, what might an issue be with a dynamic library? Yep. Yeah, so where the static library is a pain in the butt to update, with the dynamic library they just have to update the library, and your program uses the new version whenever there is one. But software developers aren't that great sometimes, and that new version might do something completely different. Without you changing your code, suddenly your code just doesn't work anymore, and that becomes a real pain in the butt. We'll see some really subtle ways you can break everyone's code without doing anything that difficult.
Yep, so with a dynamic library you only need one copy, and sometimes getting updates automatically is actually really good. But it's like the Spider-Man thing, right? With great power comes great responsibility. If you're using a dynamic library you really want to trust whoever wrote it, otherwise your stuff might just break on an update if they're really bad.

Yes, with a static library all that code goes into your executable, so if they update the library it doesn't matter; it doesn't affect your code. If you have a working one and you don't want to touch it, you can just leave it alone unless you recompile. If they do update it and you recompile a month later, your code might break anyway, but you could always go back.

The major reason I've been harping on ABI versus API is dynamic libraries. Because they are just compiled code, you can break a lot of things through plain ABI changes, even if you don't know exactly how things get translated. Like we said, an update to a dynamic library can easily cause the executable to crash if it was a bad update. You're using new code, and who knows what that code does.

So let's show a little example, and make sure everyone follows along; if not, ask me questions. We'll create a dynamic library. All it will contain is a struct with multiple fields that represents a point. If you look at the documentation, a struct in C has a very well-defined ABI. We know that in memory everything is byte addressable, and if we have a struct with fields, they sit in memory in a certain order, and that's defined as a stable thing. We'll go into that. So if an executable accesses any fields of a struct from the dynamic library, you essentially can't change your struct anymore, because you'd be changing the ABI. Even if you just reorder the fields, which wouldn't break the API, because if I had an x and a y and I just changed the order of them, that
struct would still have an x and a y, then I actually might break something. Your code would just behave really weirdly, and that would be really, really bad. So if you give a user a struct, you're not allowed to change it ever, otherwise really weird things will happen, and we'll see that.

Let's just dive into the example. Here's some code that uses a library. Let me open the library header file real quick. Where's my mouse cursor? Please come back. So in here we have a library, and it's a C library, so it just defines some functions for you to use. In this case our library is going to have four functions. It's going to have a point create function, which takes an x and a y and returns a pointer to some struct called point. This is the good API design. In C you can say, hey, I have a struct called point, I won't tell you any details about it, just know that there's a struct called point and I'll give you a pointer to it. So point create gives you back a pointer, and you don't know anything about that data other than that you got an address and it's supposed to represent something called a struct point. Then I'll have a function called point get x that takes a struct pointer and returns that x value, and the same thing for y.

Wait, how many of you have done C++? This is like a scuffed version of a C++ class. All C++ does for you is make the pointer argument a lot nicer: this would just be a method, and I'd automatically get "this", so that struct pointer p is basically "this". Okay. And finally I'll have a point destroy function that takes a struct point pointer and is supposed to destroy it; in this case it just frees it or something like that.

So let's dive in. I have two versions of the library that look somewhat similar. In this one, alongside point create, I define the actual struct. Here, I'll dive into it. So in v1
this is what my struct looks like: it has a y and then an x, so just two fields, x and y. In my code, all point create does is malloc the size of that struct, which is two integers. Each integer is four bytes, so the total size is eight bytes, so it mallocs me eight bytes. Then I copy in the arguments I got in the function, so its x field is x and its y field is y, and I return that pointer back. In get x, I access my struct and return the x value; in get y, I return the y value; and in destroy, all I do is free it. So let's close this.

If I go ahead and compile, everything works. In my point example here, all I'm going to do is create a point where x is one and y is two. Here I'll comment out some code. After I create a point, I access it, whoops, I access it using those function calls, get x and get y. And then I do something ill-advised: I say, hey, I know what that point is, it's a struct v1, so I know it has an x and a y. I don't need any stupid library calls; I'll just use the fields directly, because I know where they are. If I do that and run something called point example v1, I see that both my results are the same. Using the library calls I get x is one and y is two, and the same if I use the struct directly.

It's going to get a bit weird in a second, if you haven't seen this before. I'm going to create version two of my library, and it's going to be great. In version two all the code is the same for create, get x, and get y; it's literally the same. The only difference is that the struct is this struct point v2, and in point v2 I was just like, yeah, I didn't like the order of the fields, I'm just going to swap them around. That literally doesn't change anything in the code. All the APIs are the same, like the high-level names; all I did was change the order. But because I changed the
order of this, I actually changed how they are arranged in memory, even if you didn't know it. So I'll go ahead and compile again. Okay, I already have it built. If I recompile, I have a point example v2. In v2 this code uses the new v2 library, but in my point example I directly said, oh yeah, I know it's a point v1, don't worry about it, I got it. So this uses the new library, and if I do that I see that the library calls are still correct even though I updated the library, but if I use the struct directly, the values come out in the opposite order, which would cause really unexpected stuff. Think about using this library as part of a game or something like that: suddenly you update and everything gets flipped. You'd probably be pretty confused. So any questions as to why something like that would happen?

Let's see why this happened, just to be clear. I'll make up a pointer address. This represents our struct point v1, and because it's C, and C cares about details, the base pointer would be something like 1000. Because there's a defined ABI, meaning the low-level details of where bytes are actually located, the fields of your struct are laid out in the order you define them. So at the first address is the value y, and it's an integer, so it's four bytes: it starts at address 1000 and takes up four bytes, and then at address 1004 there's x. If I reorder my fields, then in struct point v2 the memory at that pointer address now starts with x, because I reordered the fields, and the next field is y. I can show that here as well by uncommenting this. If I run this, I can see that, so, the v1 is what it actually links against. Yep?
Whoops, sorry. So all I did was uncomment it. Here, go back. I just uncommented this one. So in my executable, in the first printf I use the library functions, in the second printf I use that struct v1 as if, you know, don't worry about it, I know how it's laid out, and then I do a printf with the struct v2. If I compile that, the name of the executable here, let's close that, the name of the executable is what it's actually using at runtime. So if I run build point v1, the first line should always be right, because I'm using the library, which knows how things are laid out. Then I can see that if the struct matches the version I'm running, I get the correct output, one and then two, but if there's a mismatch, everything's reversed. And the opposite thing happens if I run point example v2. The library calls are correct, and if I access the struct directly as v2, it's correct, because I'd have all the version two offsets in my executable, so they all match up. But now if I use version one, everything's reversed again. So any questions about that, or why those look that way?

Yep. So the question is why did I make point v1 and point v2. The answer is that I didn't like the order; I just didn't like the way it looked. You might look at that code and it looks like a fairly standard change you might make, right? I wrote my code, it was x and y, stylistically I don't like that, so maybe you just go and update your code. There's no other reason; I just felt like it.

Yeah, so the create is always done in the library, so the order the fields are defined in comes from the library. Oh yeah, no problem. So it's defined by the library: whatever version of the library I used determines the order of the struct, because the library creates it. If I use the library calls, everything matches up, but if I compile it into my code, then
it never changes, so the offsets wouldn't update. Yeah, so you should be careful. That was a pretty innocent change, right? All I did was change the order of the fields. But if you had to debug something like that, you would probably go nuts; if suddenly your thing just flipped, you would go insane. This is why you have to be very cognizant of what an ABI is and be sure you don't change it. See, when I hid the details and used the library for both of them, they're correct. Because I hid the details of everything, it was correct; you never compiled anything in. But if I give you a struct, I gave you an ABI, and you have to adhere to it and never change it. That's why the C standard library isn't allowed to change anything, and why you might see weird things in it: if they make even the slightest change, all the code that has been written for the last 30 years will just break. In fact, this actually did happen. When was it? The C standard library broke its ABI and the entire world had to recompile literally everything, and it was awful. C++ did this as well, because they changed the ABI of a list and it broke everything, so everyone on the entire planet needed to recompile, and it was awful. Yeah, so I'm creating a dynamic library here.

Okay, so just to be super clear: this is what the struct looks like in v1 of my library at the top, and at the bottom is what the v2 version of the struct looks like. The order changed. So what happens, and what code gets included where? In the executable, if I just make calls to my library, my get x and get y, then all that's in my executable is calls into that library. If at runtime I say run against v1, these calls will call into this, and in here is all the code for get y and get x. And if I updated my library, then instead of calling the v1 version of my library, they would just
call the v2 version of my library. And the first thing it did was create, so the library contains the actual implementation and code for create. In this one it creates a point that's laid out x, y, and in this version of the library it creates a point that's, sorry, y, x. And then if I use, oh sorry, yep?

Yep, so the C standard library is just a dynamic library, and it has gone through several versions, but it's been through six in like 40 years, which isn't that bad. That's why, way back a few lectures ago, there was libc.so.6: that means it's the sixth version. One of the things you can do is put different numbers on it, so if you do break anything, you can say, okay, I compiled against six, so just keep six the same and I'll be fine, and if you update to seven you have to opt into it.

Okay, so here's what happened in that executable. In my first printf line I used the library calls, which change depending on whatever library I use. For the printf that uses v1, when you compile it in, the compiler knows I'm using a struct and uses whatever layout it sees. I always printed x and then y. So if I compiled in the first version of the struct, well, x is located here at offset four, four bytes from the beginning of the struct. So it would print an int starting at byte four and then an int starting at byte zero. For the other version of the struct it would print an int starting at byte zero and then one at byte four. It just uses those offsets, and they're compiled directly into the application. So if my application used version one of the library, the memory I get back from create would be y and then x. And if I call get x, it calls into the v1 library, and the v1 library's get x knows x is at offset four, so it would always agree. But
here, because the struct I got back was y, x, that first direct print, the v1 one, agrees with the library. They both have x at offset four, they both agree, so you get the correct result. But if I use the v2 one, what's compiled into my application is, hey, x is located at the beginning, which is not true, so that's why you get the flip. And then if I had point v2, my create would have created the struct the other way, so it would be x, y, and get x would know its correct offset, which is zero. So for the v1 print, if I print whatever is the second int, it would actually be y instead of x, so it gets flipped, while the other print statement would be correct because it and the library both agree. Okay, so any questions about that? Okay, so we'll see something even a bit weirder.

Yeah, so just in summary: if you have mismatched versions of the library, it causes really unexpected results. All I did was change the definition of the struct. I reordered the fields, but because C lays them out in a specific way in memory, the offsets change, and an offset is just how many bytes from the beginning of the pointer x or y is. If you change the order, you change the offsets and the way the fields are laid out in memory, again x, y versus y, x. If you use all the library calls, everything lines up, because the only thing accessing the offsets is the library. But if I give away the definition of the struct, you essentially have the offsets compiled directly into your executable, which means they don't change without recompiling. That's why they didn't change, and why things were flipped when there was a mismatch. So if you give a user a struct as part of your library, you are no longer allowed to change it, which again is why you might see some weird things in places like the C ABIs.

So here you go, you can try it; you can mix and match. There's this environment variable called LD_LIBRARY_PATH, so
you can change it to build v1 or build v2 to simulate an update without actually using the linked version. In this example I'd be running the point example v1 and I'd be able to specify, hey, use this library at runtime, and you'd see one set of results. Then you can simulate an update and say, okay, when I start executing, look in this directory to find the library, which would have our v2, which would disagree with what was compiled in, and you would see a flip.

There's this thing called semantic versioning, and it really helps you reason about updates and know when bad things like this might happen. Semantic versioning gives some meaning to version numbers. Usually your version numbers are major dot minor dot patch. The rules are that you should only change the major number when you make incompatible API or ABI changes, when you just break code. Then as a developer, if I see that number go up, I know I'm going to have to, at the very minimum, recompile my application, and if there are API changes I might have to rewrite something because they changed library calls. For the minor version, the second number, you should only increment that if you add some functionality in a backwards-compatible manner. You don't break any ABIs or anything like that; you just add a new function, for example. Then you increment that version, so I know that my application requires a function you added in, I don't know, 1.3 or something like that. That makes your life a lot easier. And you should increment the last number if you just make a little bug fix and don't change any behavior; you just fixed a bug. You typically see that number go up a lot, because we screw up code a lot.

Dynamic libraries also allow for some, believe it or not, easier debugging, because you
can do some tricks with them. Since they're looked up whenever you start running your application, you get to change what things are. I'll show you some tricks with that, mostly this LD_PRELOAD.

Let's consider this really simple example. We have a main that mallocs sizeof an int, which hopefully everyone knows is four bytes. You get a pointer back, and then we print the address of that pointer, that's it. Then we free it and return. If I run this, how many bytes does my program allocate? Four? Everyone agree with four? Anyone disagree with four? All right, so we said that I allocated four bytes.

So go back. One of the fun things you can do, and I won't show you how I do it yet, is this: malloc is just a function, it's not terribly special, so I can write my own malloc and force the program to use my malloc instead of libc's malloc. This is one of the debugging techniques. Has anyone used Valgrind before, or something like that? Okay, we have a few hands. There are tools that will show you if you leak memory, things like that; basically they check that whenever you malloc something you free it. This is a kind of scuffed version of that. I'm going to create my own library with my own malloc call, and all it will do is log every single malloc and free you make.

Okay, so here it is. I see that I have a malloc with four, so that's the one I called with the int, which is what everyone said is the only one I made. But I see something really weird: there's a malloc of 1024 that gives me a different address, one that ends in 42c0. I can see for sure that I printed the address of x, and if I look at that address it matches the malloc of four. So that's definitely my memory, that's my malloc call, and the free matches. So what the hell is this other one? Anyone guess what this is? Yeah, inside the memory allocator? So if this was inside the memory allocator, that would mean
malloc needs to call malloc, which is like a circular dependency, which wouldn't quite work. It'd be like infinite recursion. Yeah? printf didn't get changed, but printf needs memory. So yeah. Yep. Yeah. So the first one was actually correct. So I called printf, and printf mallocs some memory because it needs to essentially create a buffer. And what it needs to do is (oops, not this one) take this string with that %p and essentially put the address in, so that string becomes, you know, 0x and then all the digits. So that call to malloc is actually from printf. So that's like an internal malloc in printf, which is kind of cool, right? We wrote our own little wrapper and we see what the C standard library does. But it also looks like we're kind of in trouble. If you want to check that every single malloc in your program gets freed, well, that's kind of a bummer, because I didn't make this call to malloc, so how could I possibly free it? Like, did I leak memory? Am I bad? What's going on?
So there's this tool called valgrind that will detect memory leaks for you, and you just use it: you just say valgrind and then your executable. Let's go ahead and see that real quick. So if I use valgrind on this alloc example... oh god. Okay, this worked before, but I updated something, so it broke. Cool. All right, don't update stuff right before lecture. But anyways, if you run that tool, it says you make two allocations, and it says both of them get freed, which is kind of weird. But if you read the man pages for valgrind, it says, oh, well, the C standard library, which is used by everything, may allocate some memory for its own usage. So that's that malloc 1024 we saw. And then it also says usually it doesn't bother to free that memory, because when the program ends there'd be no point, since the kernel reclaims all of the process's resources when it exits anyways, so freeing it just slows things down. So why would I bother? So that's okay, because
that memory actually lasts as long as your process, because you don't know how many times printf gets called. But this disclaimer here doesn't, you know, excuse you from calling free. So you should still get in the habit of freeing resources when you're done with them. But this is like a fun little caveat. Yeah? If you don't call free, technically that's fine if that memory should last until the process ends. But if it doesn't last that long, you should free it as soon as you're done with it. So printf would just allocate that 1024 bytes or whatever, and that would be accessible as long as that process is going, because you'd be calling printf over and over again, so it just reuses that memory, which is fine.
Okay, so we saw that operating systems provide the foundation for libraries. We learned about dynamic libraries and how they compare to static libraries: static libraries are essentially just a straight copy and paste of code, while dynamic libraries can actually change whenever you run the application. And then we saw an example of ABI changes without API changes, where we just changed the order of fields in a struct, and we saw some bad things that can happen. So I will release lab 2 while I'm on the train. So just remember, pulling for you, we're all in this together.
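As a rough sketch of the semantic versioning rules from this lecture, here is how a compatibility check might look in C. The names struct semver, semver_parse, and semver_compatible are made up for illustration and do not come from any real library, and the policy itself is an assumption: a library can stand in for the version an application was built against if the major numbers match and the minor number is not older.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical type for illustration: a major.minor.patch version. */
struct semver {
    int major; /* bumped on incompatible API or ABI changes */
    int minor; /* bumped when functionality is added compatibly */
    int patch; /* bumped for bug fixes only */
};

/* Parse "major.minor.patch". Returns true only if all three numbers are
 * present, non-negative, and nothing trails the patch number. */
bool semver_parse(const char *text, struct semver *out) {
    assert(text != NULL && out != NULL);
    int consumed = 0;
    int fields = sscanf(text, "%d.%d.%d%n",
                        &out->major, &out->minor, &out->patch, &consumed);
    if (fields != 3 || text[consumed] != '\0') {
        return false;
    }
    return out->major >= 0 && out->minor >= 0 && out->patch >= 0;
}

/* Assumed policy: the installed library `have` is acceptable for an
 * application built against `need` when the major numbers match (nothing
 * was broken) and `have` is at least as new in its minor number (every
 * function the application relies on exists). The patch number is
 * ignored because bug fixes do not change the interface. */
bool semver_compatible(struct semver have, struct semver need) {
    return have.major == need.major && have.minor >= need.minor;
}
```

With this sketch, an application built against 1.3.0 would accept a 1.5.2 library, but not 1.2.9 (a function it needs might be missing) and not 2.0.0 (the interface may have broken), which is the point where you would at least have to recompile.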