All right, good morning everyone. If you're watching the recording or not in class: in class there are Timbits as a thank-you for showing up at 9 a.m., and hopefully to get some sugar to help you stay awake. Last lecture we were all kind of asleep, so hopefully that helps. If you want, during class you can help yourself to more. If you leave them with me I'll eat them all and probably get diabetes or something, so please don't kill me.

All right, so today we'll be talking about libraries, which is kind of a nice little detour. Oh, Timbits, if you want one, they're there.

So let's quickly talk about libraries, because they're part of an operating system that is not included in the kernel. We already know what's in kernel space: that's your kernel, everything that interacts directly with the hardware. But on top of that, in user space, there is a bunch of system libraries, including the C standard library, called libc, which generally everyone builds upon, and generally you consider libc part of the operating system.

Way back in lecture three we talked about whether Ubuntu is a different operating system than Android, even though they use the same kernel. So you might draw your line of what an operating system is at the display server level. This is showing what it's like on Linux. The display server would be something called Wayland. On top of that you might have some GUI toolkit built on top of the display server, so no one has to interact with it directly. Then you might have a system daemon, where you can interact with devices that the kernel exposes to you. And on top of that, your applications might use any of these libraries they want. So for example, LibreOffice might directly use the C standard library, but it's going to show you some things, so it might actually talk to the GTK library, which will then go ahead and talk to Wayland, which will then talk
to libc, which will then talk to the kernel. Right? And Firefox could pass through any number of these libraries and use them all, because it kind of does everything nowadays: you can access your camera, get your microphone, all that fun stuff, and then it displays things to you through a GUI. So an application can pass through as many libraries as it wants, and as you build up libraries you can make more and more interesting applications.

We kind of saw this before: the C ABI. You saw it in lab two; if you're not done lab two yet, this might be helpful. Remember, when we did system calls we used registers, while in C it's stack-based, and arguments are passed in right-to-left order. That's just the C calling convention. Then there are some registers that are caller-saved, and the remaining ones are callee-saved. But sometimes arguments can be passed in registers instead of on the stack, if you're on 64-bit x86; see Wikipedia for that, and it's also in the lab handout. So in lab two, you're calling that thread stub code, and its arguments wouldn't be passed on the stack, because they're just memory addresses, so they'd be passed in registers. That's in the lab handout.

So here are the advantages and disadvantages this gives us versus the system call ABI. Remember, with the system call ABI, since we're using registers, we can only have a maximum of six arguments, and they can only be the size of a register. With the C ABI, you can use as many arguments as you want, and they can be variable sizes. The disadvantage is that it may be slower than just using registers. And that's pretty much it, but you do have to have a calling convention that both sides agree on.

So, in case you're not familiar with how more complex pieces of code get built:
This is what we've been doing in normal compilation of C. Every one of your C files gets compiled to a .o file, and then something called a linker goes ahead and links all of those object files together, and you get an executable. Everyone's familiar with this? Yeah, okay. And the .o files are just ELF files, which we saw way back in lecture two, with just the code for the functions that you define.

So if we talk about libraries, there are two flavors: static libraries and dynamic libraries. Static libraries are included at link time. Say you wanted to reuse util, foo, and bar. Instead of just compiling them into object files and linking them all together into your executable, you could put them all together in an archive, a .a file, which is just a static library. So a .a is just a collection of .o files. Then, when you link your executable, instead of linking all those .o files individually, you would have a main.c, which produces main.o, and you would just link main.o and lib.a together to get your executable. That way, if you wanted to create a different executable, you could reuse that same lib.a and have everything included in your executable without having to recompile everything.

But yeah, lib.a is just a collection of .o files. So here, lib.a is just util.o, foo.o, and bar.o, all included together.

Yeah, so that's something you would create. So instead of doing this, where I have four files and I compile them all together,
I might decide that those three files I reuse all the time, and I want to reuse them again and again. I can just put them all together in lib.a, and then link main.o and lib.a, and I get my executable. That's not done automatically; you have to say so. In your Makefile you can say, hey, I want to produce a library out of these files instead. So that's all up to the Makefile, and the Makefile is something that so far has been provided to you; you haven't had to write one yourself.

But we'll see today a more interesting kind of library, called a dynamic library, which is generally used for more reusable code. If you remember when we saw libc before, it was a .so file, not a .a file. So the C standard library is a dynamic library. That's one way to tell the difference: if it's .so, it's a dynamic library; if it's .a, it's a static library.

Dynamic libraries are basically the same thing, in that they're just a collection of .o files all together, and they define a bunch of functions for you. The difference is that with a static library, the code is all included in the executable at link time: the linker just rams all that code into the same executable. With dynamic libraries, on the contrary, there's just one dynamic library, and multiple applications can use it. So instead of each application having its own version of printf embedded in it, it would just use the printf
that is defined in libc.so. And because of this, the operating system can do a bunch of fun tricks. It can load libc into memory only once and then share it between every single application, and every application on your machine probably uses libc, so that's definitely a good optimization. You don't have to have hundreds of copies of libc when they're all basically doing the same thing. The operating system can do this trick because we have virtual addresses and physical addresses, and each process has its own virtual address space. The operating system can do something smart like map wherever libc is in every single process to the same physical area of memory, and they all just read the same thing. So you don't have to have multiple copies of libc.

In terms of using it, the top part is pretty much the same as for the static library: the .so file is just a collection of .o files, but with special shared linkage. The only difference is what happens when you actually link your program. When we link our program and create an executable from it, we just use that main.o file, and then at runtime it will go ahead and include whatever lib.so is. So the library will only be used at runtime, and linking won't actually include any additional code in your executable, except for a note saying, hey, I need printf or something like that.

Yep? Yeah, so in this example you make a main.c file that has the main function defined in it. I'm just assuming that main.o has main defined in it.

No, so you would just have your main.o file, and then when you link it together,
it just uses that main.o to make the executable, and it'll also embed some runtime information that says, hey, I need this library to actually execute. So main.o is just your standard main.c compiled, or something like that. It's just your normal object file that has main defined in it.

Yeah, so generally, if you want to use the functions, you need to know what functions you need to call, and they'll be declared in a header file. But it's actually not super formal: you can just use functions and hope they get defined later. I'll show some examples too, but it's kind of a crafty system.

Yeah, so this is any C functions. We'll create a dynamic library today, play around with it, and see what bad things you can do. As part of being software engineers, you should know how to make libraries, which no one does yet, right? All right, so we'll make some libraries today.

There are some useful command-line tools for using dynamic libraries and sanity-checking them. ldd is a good one: it shows you exactly which dynamic libraries an executable uses. So we can go ahead and see that. I have a bunch of programs today. We can say ldd alloc-example, and it will show you exactly what libraries it uses. This first library actually doesn't correspond to a file. So it's a kind of pseudo-library
that's defined in the kernel; it only lives in the kernel. Then it says, hey, I use libc, and it corresponds to this file, so that's just a normal library file. Then it also uses this other library, which we don't know about yet. It's called ld-linux something. That's essentially the dynamic linker, which figures out how to resolve printf at runtime. Those two things will always be there for your executable, and then we have libc. So this is pretty much the smallest set of libraries we'd have.

But we could also run ldd on ls, or something like that (I have to give it the file), and we can see what libraries it uses. It uses pretty standard libraries, but it also uses libcap, which is kind of funny. So that's one of the tools we can use to see which dynamic libraries any executable needs in order to actually execute.

Another thing we can do is use objdump -T on a library to see what functions it defines, or what symbols it uses. And you can also use objdump -d, which we all probably know by now.

So if we use objdump -T: for this I wrote an alloc library, alloc-wrapper. We can see that in this library file I define functions called realloc, free, malloc, and calloc. Those are all the functions that are defined. And if I go look at some executable I have, it'll essentially say what functions it wants to use. This one wants to use something called printf. And of course we can do the fun thing and disassemble our program and all that, which we already know about.

So let's get into the real fun stuff. Just a word about static versus dynamic libraries. Again, you can just statically link stuff, which basically copies the .o files into your executable. The drawbacks of this, of course, are that statically linking prevents reusing libraries.
You're essentially just copying and pasting object code directly into your executable, and that's it. So each executable would have its own version of printf, and there would be 200 versions of printf on your system. And if there are any updates to libc, well, since you're not using a dynamic library and you essentially copied and pasted the code into your executable, you'll always be using that version. If they update libc, you're out of luck: you would have to completely recompile your program if you want to use the new code, right?

So given those bad things, what would be the disadvantages of using a dynamic library?

One comment is, would it be slower? It might be slower initially, but if you have a hundred programs using the same thing, it would actually be faster. So it depends: if you're the only user, it might be slightly slower. So that's one disadvantage. Anyone else? Yeah?

No, so if you make changes in a dynamic library, nothing needs to be recompiled, because it's resolved at runtime, not at link time. If I run my executable, it uses the library; then I update the library, and the next time I run it, it's going to use the new library. So if that's true, can anyone envision any problems with that? Yeah?

No, it's generally pretty good at resolving, so it can find the libraries. Has anyone ever updated their computer and then it broke, or something stopped working that previously worked? Yeah, everyone. So that's one big disadvantage, right? If everything uses the new version of the library, and you update your system and there's a bug in that library, suddenly all your applications are broken, right?
So if you're using a shared library, you have to be very, very sure, and trust its authors that they don't mess it up, because otherwise everything is going to go down. The most common way that happens is that they change the behaviour, so the library does something it didn't used to do before. Or they can also subtly break it through ABI changes. You can update your dynamic library easily, and your executable can just start using it and start crashing.

So we'll go ahead and make a dynamic library today that basically just has a struct. If you have a dynamic library that uses a struct with multiple fields, that corresponds to a certain layout in memory, because it's C, and the layouts of structs are defined in the C ABI. So if your program accesses the fields of a library's struct directly, and the library later changes the order of them, then suddenly the fields don't resolve to the same memory addresses, and havoc will ensue. But this is okay if the library never, ever exposes the fields of a struct.

So let's go ahead and see a very simple library. This is going to be my library. I'm going to have four functions, and all I'm going to do is define a library that has a point, which has an x and a y. This is good library design; we'll see bad library designs later. Here I just say there's a struct called point. I won't tell you any of the fields whatsoever. Then in point create, you give me an x and a y, and I'll give you back a pointer to that struct, and I won't tell you anything about the struct. Then we have functions called get x and get y that take a pointer to the struct and do the obvious thing, and then a point destroy. So if we go ahead and use that in our... yeah?

Yeah, so has anyone seen lines one and two before, when making header files? The way C works is you don't want to redefine things.
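For reference, the header and library being described might look roughly like the sketch below. The exact identifiers (LIBPOINT_H, point_create, point_get_x, and so on), the int field types, and the const qualifiers are assumptions; the lecture only gives the spoken names. The two files are shown in one block, separated by comments.

```c
#ifndef LIBPOINT_H
#define LIBPOINT_H
/* libpoint.h (sketch): lines one and two above are the include guard. */

struct point;  /* declared but not defined here: the fields stay hidden */

struct point *point_create(int x, int y);
int point_get_x(const struct point *p);
int point_get_y(const struct point *p);
void point_destroy(struct point *p);

#endif /* LIBPOINT_H */

/* libpoint.c (sketch): only the library ever sees the field layout. */
#include <stdlib.h>

struct point {
    int y;  /* assumed v1 order: y first, then x */
    int x;
};

struct point *point_create(int x, int y) {
    struct point *p = malloc(sizeof *p);
    if (p != NULL) {
        p->x = x;
        p->y = y;
    }
    return p;
}

int point_get_x(const struct point *p) { return p->x; }
int point_get_y(const struct point *p) { return p->y; }
void point_destroy(struct point *p) { free(p); }
```

Because callers only ever hold a struct point pointer and go through the accessor functions, the library is free to reorder its fields in a later version without breaking executables that were already compiled.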
So this is just some preprocessor fun stuff that says, hey, if this name isn't defined, define it, and then it actually processes all this code. If the header gets included again, the name for libpoint.h is already defined, so it essentially does nothing. That's just a standard thing all C headers do, which is kind of a pain and kind of silly, but that's the way it works.

All right, so this is our library. It's pretty cool. Why do we have an error? All right, so that's our library. It's pretty neat.

So in our main we'll go ahead and just use point create with one and two, and we'll comment all this out. If I create a point and give it an x of one and a y of two, and I go ahead and print whatever that point's x is, so get x and then get y, what should I see if I run it? Anyone? Yeah. I should get one and two. That's my x and y, so that makes sense.

Now say we didn't hide the struct at all, and as part of our library we gave the users the struct, and we said the struct looks like this: the first field is y and the second field is x. Well, then you might be tempted, in your program, instead of using get x and get y through a pointer and all that stuff, to say: well, I know what the struct layout is, I can just use x and I can just use y directly, right? So if I run this, they should be the same, correct? I should see one and two. So if I go ahead and run that, I see that if I use the library functions it's one and two, and if I use the struct directly it's one and two. Cool.

All right, so I'm going to simulate an update to my library. I'm going to go from v1 to v2. And I expect to always see one and two, right? I created a point; its x was one, its y was two.

Yeah? Yeah, so the only difference with v2 of the library is that in v2
I changed the order of the fields in the struct, so now it goes x, y instead. But if the executable was built against v1, it expects v1's order. If you expose the struct, like in v1, that layout is going to be included in your executable, and that will never change. So if in v1 the struct is exposed where y goes first and then x goes second, well, the offset to x here will be four bytes in, and the offset to y will be zero bytes in, and that will never change, because it's essentially compiled into this main function, right? So if I update my library so that it switches the struct around, and x and y change places, these offsets right here from v1 will not change, and I'll get the wrong thing. But if I use the v2 library and use the v2 struct, then when I do that (whoops), I get the right answer, right? Because now I'm using the updated struct. But I had to recompile my executable.

All right, any questions about that fun stuff? Because this is a very common way to completely screw up your users.

So the question is how I swapped between v1 and v2. Here I have two libraries, one v1 and one v2. In v1, this is my whole definition. I include the headers for the functions I need to define, and then I include this, which is the struct. In here it's just the struct, with y and x. And then I define my functions here. In create, I malloc, and this is good library design, where you hide the actual struct. I go ahead and create it. This is only within the library.
I malloc, I set the fields within the library so that they correspond to whatever order my struct is in, and then I just return the pointer. Then in get x, I just use the struct directly, because I'm the library and I define the struct. If I don't expose it, then as long as I keep it consistent within myself, it's all good. Get y is like that too, and destroy just frees. And actually in v2 the implementation is exactly the same: create is the same, get x is the same, get y is the same, destroy is the same. The only difference is that I'm using a different struct, and the struct just has its fields in a different order.

So this .c file will correspond to one shared library, v1, and all the code in here will be embedded in it, which includes the offsets of the struct. And then for v2 it's the same thing: it embeds all the offsets of the struct. But between the two, the offsets of the struct change, right? So if in my user application I use the offsets of the struct directly, and the library changes them, well...

So any struct in C has a predefined layout, so the order matters. If you read the C ABI, it will say that this struct is going to be 8 bytes; at offset 0 is y, and at offset 4 is x. And then for v2 it's also going to be 8 bytes, but the offset of x is now 0 instead of 4, and the offset of y is now 4. They're just swapped around. So when I compile this point example, if I use the layout of the struct, I'll essentially embed the offsets directly into my program, instead of just using the library functions, right?
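The offsets quoted above can be checked with offsetof. This is a minimal sketch: the names point_v1 and point_v2 are hypothetical (in the lecture both versions are simply struct point), and the fields are assumed to be int.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical names for the two layouts of the lecture's struct point. */
struct point_v1 { int y; int x; };  /* v1: y first */
struct point_v2 { int x; int y; };  /* v2: x first */

/* The offsets that get baked into code compiled against each version. */
size_t v1_offset_x(void) { return offsetof(struct point_v1, x); }
size_t v1_offset_y(void) { return offsetof(struct point_v1, y); }
size_t v2_offset_x(void) { return offsetof(struct point_v2, x); }
size_t v2_offset_y(void) { return offsetof(struct point_v2, y); }
```

On a typical platform with 4-byte int, v1 puts y at offset 0 and x at offset 4, and v2 puts x at offset 0 and y at offset 4, with both structs 8 bytes in total.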
So that way, when I run my program, if I'm using the library functions I always get the right answer. But if I use v2, it will go ahead and swap the values if I originally compiled my program using v1. And similarly, if I use v1, it would be right if I used the v1 offsets, but if I for some reason compiled in the v2 offsets, then it would be the opposite again.

Yeah? No, so I'm just simulating. I'm just using straight-up v1 here, as part of this "using v1" code, so this will be the same every time, right? It'll essentially access x at offset four and y at offset zero. And then if I use v2, it has fixed offsets too: x is always at zero and y is always at four, right?

Yeah, so they're there all the time. That's essentially what happens if I embed them. And remember, the struct is actually created by the library, so the library defines which order the fields are actually in. If I use v1, the order goes y, x. So if I'm embedding v1 in there, it's going to agree; it's going to be what I think it is. But if I use the v2 library, then if I use the library functions it's still correct, but now only the v2 struct agrees, and my old version does not agree. The library version uses what's actually in the library, right? So it's essentially going to access this, but using the struct that it defines internally, which is good.
It'll always agree with the one that's created. So if I use v2 to create the point, x is going to be at zero and y is going to be at four. If I directly use the v2 struct, it's going to agree with this, but v1 will not agree. And similarly, if I use v1 as the library, point create is going to go ahead and create the point where y is at zero and x is at four, and if I embed the v1 struct, that would agree, but now version 2 would not agree with it.

Right? And all it takes to completely brick an application, to make an update that completely screws users over, what did I do? I just changed the order of two things in my struct, right? That's all it took. So imagine if libc did something like that, which it actually did. I forget what year it was, but it was a complete disaster. If libc does that, suddenly you have to recompile literally everything, or else you're going to get weird, subtle bugs that you essentially can't debug. Right, so does that illustrate why that's kind of bad?
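The mismatch can be simulated in a single file. This is only a sketch of the effect, not the lecture's actual example code: point_v1, point_v2, v2_point_init, and the stale_read functions are hypothetical names, and the fields are assumed to be int with no padding between them.

```c
#include <stddef.h>  /* offsetof */
#include <string.h>  /* memcpy */

struct point_v1 { int y; int x; };  /* layout the old executable was built with */
struct point_v2 { int x; int y; };  /* layout the updated library uses */

/* Stand-in for the v2 library: fills in a point using its own layout. */
void v2_point_init(struct point_v2 *p, int x, int y) {
    p->x = x;
    p->y = y;
}

/* Stand-in for the old executable: reads "x" at the offset v1 baked in. */
int stale_read_x(const void *p) {
    int value;
    memcpy(&value, (const char *)p + offsetof(struct point_v1, x), sizeof value);
    return value;
}

/* Same idea for "y", again using the v1 offset. */
int stale_read_y(const void *p) {
    int value;
    memcpy(&value, (const char *)p + offsetof(struct point_v1, y), sizeof value);
    return value;
}
```

If the v2 stand-in initialises a point with x = 1 and y = 2, the stale readers see the two values swapped, which is the symptom described above: nothing crashes, the data is just silently wrong.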
So you have to be really careful. Essentially, if you expose a struct to the user, you are no longer allowed to change the order of any fields whatsoever, because users might access the fields, and they're at defined memory locations. If you change those memory locations, their code will no longer work and weird things will happen.

All right, so here it is: C does have a consistent ABI for structs, but you have to keep the structs themselves consistent. This was our example, where we had v1 and v2 and all we did was flip the x and the y. That showed that mismatched versions of this library cause unexpected results. The definition of struct point is different in the two libraries, where all we did was change the order of x and y. If our executable and our library agree, there's no problem. But if that struct is exposed, and now suddenly the library and the executable do not agree, that is when you have problems, because now they'll be accessing different data.

All right, and this is just how to run that previous example for yourself, if you want to use the examples. It's in the lecture 13 directory.

So this is also a fun fact about how I version the slides. There's something called semantic versioning, which hopefully matches what you'd expect from a version number. Generally every version number has three parts, and they're all just numbers.
So it's major.minor.patch. If the major version changes, that means there's an incompatible API or ABI change, which means you have to recompile everything. So if you see libc go from version one to version two, you should expect to have to recompile everything. If the minor version changes, you just added functionality in a backwards-compatible manner, a.k.a. I just added a new function or something like that. I didn't break anything that existed; I just added some new stuff. And the patch version you should only increment when you make backwards-compatible bug fixes. So I didn't change any functions, I didn't muck around with anything; there's just some bug fix, and my library works better than it did before.

If you follow that, it generally makes your life easier when you have to go through dependencies and other software you use. If you see the major version change, you know you're in for a long day at the office. If you see the minor version change, hey, they might have made your life easier by giving you a new API or something like that. And if the patch version changes, you should be able to just update, and hopefully it fixes something for you.

So that's also what I use for the slides. A patch version change means I just modified the slides.
I didn't add any new slides or anything like that, so all the slide numbers still line up. If I increment the minor version, it means I added a new slide somewhere. And the major version I increase every year, when I redesign the slides. So, fun fact, that's why your slides are versioned like this. If you see my slides go from 0.0.0 to 0.1.0, it means there's a new slide. But if it went to 0.0.1, it means I fixed something, and you should probably use that one. So my slides follow good software engineering practice, and hopefully that helps.

All right, so let's go on. Dynamic libraries also allow you to do some other fun stuff. Because they're resolved at runtime, you get to control them, if you're on Linux, through some environment variables, which are just variables you define that are essentially like globals. There's LD_LIBRARY_PATH, which says where the loader should search for libraries whenever you run a program, and then LD_PRELOAD, which I can use to force a library to be loaded.

So let's consider a simple example. I'm just going to have a main that mallocs the size of an int, so it should malloc four bytes, right, and give me a pointer. Then I print the value of that pointer, and then free it. If I go ahead and execute that, I get a memory address. That's cool, and it changes every time, for security reasons.

But if I want, and this is kind of how you can think of Valgrind being implemented, I can define my own malloc and my own free and my own whatever, and I can essentially intercept the call, print some debugging information, and forward it to the C standard library. So I can essentially make my own Valgrind. So if I want to do that, I can use LD_PRELOAD.
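A wrapper of this kind is commonly written with dlsym and RTLD_NEXT, which looks up the next definition of a symbol after the current library (here, libc's). The sketch below is not the lecture's alloc wrapper: it covers only malloc and free, the log format and the file names in the comments are assumptions, and it targets Linux with glibc.

```c
#define _GNU_SOURCE   /* asks glibc's <dlfcn.h> to declare RTLD_NEXT */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef RTLD_NEXT                 /* fallback in case _GNU_SOURCE came too late */
#define RTLD_NEXT ((void *) -1l)  /* the value glibc uses */
#endif

/* Build and run (assumed file names):
 *   gcc -shared -fPIC -o alloc-wrapper.so alloc-wrapper.c -ldl
 *   LD_PRELOAD=./alloc-wrapper.so ./alloc-example
 */

static void *(*real_malloc)(size_t);
static void (*real_free)(void *);

/* Format into a stack buffer and write(2) it to stderr, so that logging
 * does not go through stdout's buffer while we are inside malloc. */
static void log_call(const char *name, size_t size, void *ptr) {
    char buf[96];
    int n = snprintf(buf, sizeof buf, "%s: size=%zu ptr=%p\n", name, size, ptr);
    if (n > 0) {
        size_t len = (size_t)n < sizeof buf ? (size_t)n : sizeof buf - 1;
        ssize_t ignored = write(STDERR_FILENO, buf, len);
        (void)ignored;
    }
}

/* Our malloc: find libc's malloc once, forward the call, log the result. */
void *malloc(size_t size) {
    if (real_malloc == NULL) {
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
        if (real_malloc == NULL) abort();
    }
    void *ptr = real_malloc(size);
    log_call("malloc", size, ptr);
    return ptr;
}

/* Our free: log the pointer, then forward to libc's free. */
void free(void *ptr) {
    if (real_free == NULL) {
        real_free = (void (*)(void *))dlsym(RTLD_NEXT, "free");
        if (real_free == NULL) abort();
    }
    log_call("free", 0, ptr);
    real_free(ptr);
}
```

Since the dynamic loader searches preloaded libraries first, every malloc and free call in the program, including ones made inside libc functions such as printf, resolves to these definitions before libc's.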
I made a library called alloc-wrapper.so, and then I can go ahead and run alloc-example with it. If I run that, I can debug every single malloc call I make, except I see something very strange. Here's a call to malloc for four bytes. That was me. It got this memory address that ends in 2a0, and when I printed it, that was the memory address I got, and it was also the argument passed to free. But when I monitor all the malloc calls my program makes, there's this one. Does that look weird to anyone? Anyone want to guess what the heck that is?

Stack for printf? Very close. Oops, I'll show alloc-example. Any guesses as to who made this malloc call of one kilobyte? Because there's not much going on in my program. Well, it was printf. printf calls malloc itself, and it doesn't free it, which sounds kind of bad, right? It looks like a memory leak that you can't fix, because printf did it. But if you run this in Valgrind, it's fine. It doesn't say there are any problems whatsoever.

And it kind of makes sense too, because printf takes this format string and needs to replace this %p with whatever the value of x is. So it's got to rewrite some bytes, and it does that using an internal buffer, and now we actually get to see the allocation of that buffer. So that's kind of cool: just by trying to make our own hacky Valgrind, we saw that printf doesn't free its own memory, which is kind of weird. Everyone's told us to free our memory before.

So again, here's monitoring all allocations with our own library. We just force it to load, and it essentially wraps malloc calls and prints whenever they happen. Everyone's familiar with Valgrind; you can run Valgrind on the executable. But if you actually read the manual pages, this is what it says.
So this is why we don't write our own libraries for this, because there are weird cases where mallocs happen and they aren't actually memory leaks. It says: "The GNU C library (libc.so), which is used by all programs, may allocate memory for its own uses. Usually it doesn't bother to free that memory when the program ends; there would be no point, since the kernel reclaims all process resources when a process exits anyway, so it would just slow things down."

So if you really want to make the argument for not freeing stuff, you can use the same argument they used, right? It's going to live for the whole program anyway, so why would I bother to free it? Although this shouldn't really excuse you from calling free, it at least gives you some type of excuse, right? So that's kind of fun.

All right, any questions, about today or yesterday or literally anything?

All right, well, I'll wrap up and hang around, and you can eat more Timbits so I don't get diabetes. So: operating systems provide the foundation for libraries. We learned the difference between dynamic libraries and static libraries, and also how to play with the dynamic loader a little bit, which might be useful for debugging in the future. And then we saw an example of an ABI change that broke things without changing the API at all. So just remember, I'm pulling for you, we're all in this together, and eat some Timbits.