Cool. So, moving on to the next topic, which is package management. This is going to be a little more high level than profilers and debuggers, which were already pretty high level, because the details are really language specific, but a lot of the ideas behind package management are common across languages and tools. So we'll talk more about the ideas here. Software usually builds on other software, right? Any program you write uses language libraries and probably a bunch of other packages, which means you need to manage these dependencies somehow. There are a couple of things involved. First, you have package repositories. When people write programs or libraries, they need to put them somewhere so that other people can access them, and different languages and build tools use different websites for this. For example, most people put Python packages on the Python Package Index, which is pypi.org, and Rust crates go on crates.io, and so on. So it's all very language specific, but in general each community has agreed on some place to put all the libraries written for a given language, so they can be found in a single place. For every version of a package, these websites store basically the source code, and often pre-compiled binaries for a bunch of different platforms. Now, software evolves over time, so we need some way to refer to different versions of a piece of software. How can we do that? Well, maybe we could just give a sequential number to each release as the program evolves: this is version one, version two, version three, version four, and so on.
But that's kind of clunky, and we can do a lot better in terms of communicating meaningful information. Another thing we could do is refer to different versions of a library by their Git commit hash. We had the version control lecture last time, right? Every time someone commits a change, it gets associated with some commit hash. But that's also not very informative: it tells you exactly which code a version corresponds to, but it's not really interpretable by a human. So people have come up with better ways of versioning software, and there are a couple of different approaches, but I think what most people are standardizing on these days is something called semantic versioning. With semantic versioning you see version numbers like 1.2.4, and the way you're supposed to interpret a number like this is that the first part is the major version, the second is the minor version, and the third is the patch version. The idea is that when you release your software and change these numbers, the change means something. If you fix a bug and it doesn't change the behavior of anything else, you increment the patch version, so you get 1.2.5. It has all the same functionality as your earlier version, but you fixed a bug, and any software that works with 1.2.4 should also work with 1.2.5, because all you've done is fix a bug. You increment the minor version whenever you introduce a new feature in a backwards-compatible way: say you add a new function to the library without affecting any of the other functions in the library. Then you might release 1.3.0. And you increment the major version only when you introduce backwards-incompatible changes.
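As a rough illustration, the three bump rules can be written as a small Python function. The change labels "bugfix", "feature" and "breaking" are made-up names for this sketch, not part of any tool. Versions are stored as tuples of integers so they also compare the way semantic versions should.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A MAJOR.MINOR.PATCH version; tuple ordering matches version ordering."""
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def bump(version: Version, change: str) -> Version:
    """Return the next version for a given (hypothetical) kind of change."""
    if change == "bugfix":    # same behavior, bug fixed: bump PATCH
        return Version(version.major, version.minor, version.patch + 1)
    if change == "feature":   # backwards-compatible addition: bump MINOR, reset PATCH
        return Version(version.major, version.minor + 1, 0)
    if change == "breaking":  # backwards-incompatible change: bump MAJOR, reset the rest
        return Version(version.major + 1, 0, 0)
    raise ValueError(f"unknown change kind: {change!r}")


v = Version.parse("1.2.4")
print(bump(v, "bugfix"))    # 1.2.5
print(bump(v, "feature"))   # 1.3.0
print(bump(v, "breaking"))  # 2.0.0
```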
So any software that worked with version 1.x isn't guaranteed to work with version 2.0 anymore. And now these version numbers are actually super useful, right? If I write a program and I want to specify which versions of a library it will work with, a semantic version number tells me a lot: my program should work with anything that has the same major version number, at least the specified minor version number, and any patch version number. Does that make sense? You can see how this is more useful than just having sequential version numbers. Let's make sure we understand exactly what this means. If I say that a software package depends on at least version 1.2 of some library, that means I can run it on top of anything where the major version is exactly 1 and the minor version is 2 or greater. Minor version numbers only increase when new functionality is added in a backwards-compatible way, so a newer minor version can't break any software that relies on something in the older version. And I can use any patch version, because the patch version is only incremented when bugs are fixed, and fixing bugs shouldn't change how my software, which builds on top of this library, behaves. So that's version numbers. Now let's look at how you use them to describe dependencies. Say I'm writing some software that has a couple of dependencies. Let me use this notation: I'm depending on package A, and I need at least version 1.2, because maybe I'm using some new feature that was in 1.2 but not in 1.1. But I also don't want version 2.0, because that could be backwards incompatible. I might also say I need dependency B, with some other version constraint, and here maybe I'd even specify a patch version.
Like, okay, I want at least 3.5.4, because some critical bug was fixed there and I don't want to be running on an older version. But I also don't want to be running 4.0. So I have some requirements for my different dependencies and which versions I want. As we talked about, version numbers are super nice, but something I can do in addition is specify exactly which versions I want to be running. This is nice for making the whole process really reproducible and more reliable. Say I release my software, called X, and it relies on these versions of A and B. If I just gave somebody X and told them these are my requirements, they might go ahead and install things that satisfy these conditions but are a little different from what I had. Say I installed version 1.3 of A and version 3.6 of B, but then someone else installing my software got version 1.4 of A and some later 3.x release of B. In theory, if everybody follows semantic versioning correctly (incrementing the patch version when you fix bugs, the minor version when you add new features, and the major version every time you introduce any backwards-incompatible change), then everything should work out. But sometimes people make mistakes and, say, accidentally introduce new bugs in later versions of their software. So in addition to specifying my constraints, I could specify exactly which versions of my dependencies should be installed: I'd keep my constraints, but also say, make sure you install exactly version 1.3 of A and exactly version 3.6 of B. That way I avoid the situation where I install some software and it works, but then Jose installs the same thing and sees different behavior.
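Here's a minimal sketch of the two layers just described, looser constraints plus an exact lock, for the A and B example. The dictionary layout and helper names are hypothetical; real tools each have their own file formats. Versions are compared as integer tuples, since comparing them as strings would put "1.10" before "1.9".

```python
def parse(text: str) -> tuple:
    """Turn '1.2' or '3.5.4' into a comparable tuple, padded to three parts."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version: str, at_least: str, below: str) -> bool:
    """True if at_least <= version < below, compared numerically."""
    return parse(at_least) <= parse(version) < parse(below)


# Constraints from the example: A >= 1.2 but < 2.0, B >= 3.5.4 but < 4.0.
CONSTRAINTS = {
    "A": ("1.2", "2.0"),
    "B": ("3.5.4", "4.0"),
}

# Hypothetical lock: the exact versions I tested with.
LOCK = {"A": "1.3", "B": "3.6"}


def check_lock(constraints, lock):
    """Each pinned version should also satisfy the looser constraint."""
    return {name: satisfies(lock[name], *bounds) for name, bounds in constraints.items()}


def check_install(installed, lock):
    """Return {package: installed version} for anything that differs from the lock."""
    return {
        name: installed.get(name)
        for name, pinned in lock.items()
        if name not in installed or parse(installed[name]) != parse(pinned)
    }


print(check_lock(CONSTRAINTS, LOCK))                      # {'A': True, 'B': True}
print(check_install({"A": "1.4", "B": "3.6"}, LOCK))      # {'A': '1.4'}
```

Note that version 1.4 of A satisfies the constraint but still gets flagged, because the lock asks for exactly 1.3.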
In addition to pinning exact versions, you can give a cryptographic hash of the contents of each dependency. This is kind of like a commit hash: it identifies the exact contents. That gives you an additional benefit, which is that you don't need to trust the package repository anymore. Your tool records that B has some content hash and A has some content hash, and when you actually fetch the dependencies online, from PyPI or wherever else, it checks that the contents match what you're expecting. Then you can be really sure that nothing was tampered with, right? Otherwise, a malicious package repository could let me install my packages and have everything work fine, but then when Jose installs my software, give him different versions of these dependencies that actually have some kind of malware in them. By giving cryptographic hashes, you enforce that whoever installs your software along with its dependencies gets exactly what you installed. This concept is usually called locking dependencies, and the file that records it is called a lock file. You also see this idea outside of lock files: when you're downloading some Linux ISO image, they often provide an MD5 checksum. Once you have the file, you can compute its hash and check that it matches. As long as no one has tampered with the checksum published on the main site, you can download the image from anywhere and just check that the hash is the same. Any questions about cryptographic hashes, or have you all heard of them before? Should we do a quick summary? Raise your hand if you've heard of cryptographic hashes. Any questions about lock files? Someone asked: can you explain what they are and how you set them up? So again, this is all very language and tool specific, but basically a lot of tools will just do this by default, so you don't need to worry about it.
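A minimal sketch of the hash check, using Python's standard hashlib module with SHA-256. The byte strings are made-up stand-ins for downloaded package archives; a real tool would read the archive file and compare against the hash stored in its lock file. (pip, for instance, has a hash-checking mode driven by `--hash=sha256:...` entries in a requirements file.)

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of the SHA-256 hash of some bytes."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_hex: str) -> None:
    """Raise if downloaded bytes don't match the hash recorded in the lock file."""
    actual = sha256_hex(data)
    if actual != expected_hex:
        raise ValueError(f"hash mismatch: expected {expected_hex}, got {actual}")


# Stand-ins for a package archive: what I installed, and a tampered copy.
original = b"contents of package B, version 3.6"
tampered = b"contents of package B, version 3.6 plus malware"

recorded = sha256_hex(original)  # this is what would be written into the lock file

verify(original, recorded)       # matches: passes silently
try:
    verify(tampered, recorded)   # any change to the bytes changes the hash
except ValueError as err:
    print("rejected:", err)
```

The same check works no matter where the bytes came from, which is why the repository or mirror doesn't need to be trusted.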
Tools that support lock files do this by default, and for the ones that don't support them, well, there's not much you can do about it. So yeah, it's language specific. But usually it takes the form of one file that describes these kinds of constraints, and then another file, maybe called something.lock, that has the actual hashes or specific version numbers or things like that. You keep both of these, the high-level description of your dependencies and the lock file, under version control for your own project. Any more questions about lock files? This all seems a little abstract, but again, it's all very language specific, so if you understand the concepts, you can apply them to any particular language and tool. The next thing I want to talk about is how you specify version numbers. Your software X might depend on a couple of different packages; how should you specify which version of each package you need? You can imagine doing a couple of different things. Maybe you give constraints like I did here: I need at least 1.2 because I'm relying on a feature that was introduced in the 1.2 release of major version 1, and of course I don't want to go to the next major version, because it might break backwards compatibility. It's a similar thing with B, except that I'm additionally specifying the patch version: some really important bug was fixed, and my software doesn't work with older versions because that bug messes it up. Both of these are given in terms of constraints. For A, I'm specifying a minimum minor version number; for B, I'm specifying a minimum patch version.
I can also imagine giving a constraint like, I want exactly version 1.4.3 of this software. There are different trade-offs between these ways of specifying constraints. Saying "I want at least this version" is good because it lets the developers of these libraries fix bugs, and those fixes will automatically be used by anybody who installs my software later, right? Say I release my package X, which uses version 1.2 of library A, and then someone discovers a bug in version 1.2 that's fixed in version 1.2.1. Well, 1.2.1 still satisfies this constraint, and the tool that installs dependencies should hopefully install the latest version of the library that satisfies the constraint, so updates and bug fixes propagate automatically with this style. But there are also benefits to specifying exact version numbers. Maybe my software relies on specific quirks, or even on a specific bug in an earlier version of the library, or maybe I just don't trust the developer to follow semantic versioning, and I think they might mess up and break my software accidentally. For those reasons it might be good to specify exact version numbers. So there are trade-offs, and when you're releasing a library or some software, you need to think about which is right for you. I think there's no single right answer, but it's something you need to think about. People try really hard to follow semantic versioning, and it often doesn't work out in subtle corner cases, so that's something to be aware of. So, any questions about how you specify version numbers?
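To see the trade-off concretely, here's a small sketch of the rule "install the latest version that satisfies the constraint", applied to a made-up list of releases of library A. The release list and the two spec functions are hypothetical illustrations, not any tool's syntax.

```python
def parse(text: str) -> tuple:
    """'1.2.1' -> (1, 2, 1) so versions compare numerically."""
    return tuple(int(p) for p in text.split("."))


def latest_matching(available, allows):
    """Newest release for which allows(version_tuple) is True, or None."""
    candidates = [ver for ver in available if allows(parse(ver))]
    return max(candidates, key=parse) if candidates else None


# Hypothetical releases of library A on the package repository.
RELEASES = ["1.1.0", "1.2.0", "1.2.1", "2.0.0"]


def range_spec(ver):
    """'at least 1.2, below 2.0'"""
    return (1, 2, 0) <= ver < (2, 0, 0)


def exact_spec(ver):
    """'exactly 1.2.0'"""
    return ver == (1, 2, 0)


print(latest_matching(RELEASES, range_spec))  # 1.2.1: the bug fix is picked up
print(latest_matching(RELEASES, exact_spec))  # 1.2.0: stays put, fix or no fix
```

The range picks up 1.2.1 automatically, while the exact pin ignores it; that difference is the whole trade-off.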
These general ideas are present in most tools you use: you can specify things like what minimum version you want, or whether you want an exact version of something. The syntax will be different for different languages, but the ideas will be the same. Any questions? The next thing I want to talk about is how you actually go from constraints like these to installed software: I say, okay, I want to install software X, it has these dependencies, go fetch them. How is that actually done? Package managers use different dependency resolution algorithms to look at these requirements and figure out how to satisfy them, and it can get really complicated. In fact, if you allow sufficiently expressive ways of specifying dependencies, the problem becomes computationally hard (NP-complete), which is kind of neat. This is again something to be aware of, because you can end up in really complicated situations. Say X depends on packages A, B, and C, and suppose that when I look at the dependencies of package B, it turns out B also depends on C. So here are the dependencies for X, and here are the dependencies for B: B depends on package C with the constraint that it needs to be at least version 1.5 and less than 2. Now I have a problem, right? If I want to install X, X needs specific versions of A and B, and it needs one exact version of C, but the versions of C that B allows aren't compatible with that. So things can get pretty complicated, especially if you're doing things like installing software system-wide, which you might do if you have programs that use dynamically loaded libraries: you can end up with different programs that want incompatible versions of the same library. There isn't really a solution to this in all cases; it's more that you need to be aware that this
kind of issue can exist. It's also another reason to be careful in specifying your own dependencies, so you don't end up in situations where you get stuck. You want to be as expressive and clear as possible about your true requirements, and not say anything stricter than you need to, because that might cause problems somewhere else when a tool tries to satisfy the whole dependency graph. If you're curious, you can look at the source code for how some of these tools handle dependency resolution. Some tools take a really simple approach, like: you want to install X? I'll look at the requirements one at a time, and for each one I'll just install the latest version that satisfies the constraints; when I recursively satisfy dependencies, I'll again install the latest version, and so on; and if there are any incompatibilities, I'll just ignore them and hope it works out. Other tools do something more sophisticated: they look at all the dependencies, try to find versions that satisfy all the different constraints at once, and report back to you if things don't work out. Again, this is all very tool and programming language specific. The Python package manager, pip, for example, historically did not do anything particularly sophisticated here (since version 20.3, pip ships a backtracking resolver), whereas apt, a package manager for installing system-wide programs on Linux, does something much more sophisticated: it will report errors to you and even give you an interactive prompt showing the incompatibility and the choices you can make to try to resolve it. Does anybody have anything to add? Any questions about the idea of dependency resolution in these tools?
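To make the difference between the naive approach and a more careful one concrete, here is a toy Python sketch of the X, A, B, C example. Everything in it is hypothetical: the registry contents, an older release of B with a looser constraint on C, and the exact version of C that X pins (the example above doesn't give one, so 1.4.0 is assumed). It is not how pip or apt are implemented; it's a plain backtracking search, exponential in the worst case, which fits with the problem being hard in general.

```python
# Hypothetical registry: package -> version -> {dependency: (at_least, below)}.
REGISTRY = {
    "A": {"1.2.0": {}, "1.3.0": {}},
    "B": {
        "3.5.4": {"C": ("1.0.0", "2.0.0")},  # assumed: older B accepts older C
        "3.6.0": {"C": ("1.5.0", "2.0.0")},  # B needs C >= 1.5, < 2
    },
    "C": {"1.4.0": {}, "1.5.0": {}, "1.6.0": {}},
}

# X's requirements; the exact pin of C to 1.4.0 is an assumed value,
# written as the half-open range [1.4.0, 1.4.1).
X_REQUIRES = {
    "A": ("1.2.0", "2.0.0"),
    "B": ("3.5.4", "4.0.0"),
    "C": ("1.4.0", "1.4.1"),
}


def v(text):
    """'1.4.0' -> (1, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def allowed(version, bounds):
    low, high = bounds
    return v(low) <= v(version) < v(high)


def newest_first(registry, name):
    return sorted(registry[name], key=v, reverse=True)


def greedy(registry, requires):
    """Naive approach: newest match per requirement, recurse, ignore conflicts."""
    chosen = {}

    def visit(reqs):
        for name, bounds in reqs.items():
            if name in chosen:
                continue  # already picked; never re-checked against this constraint
            matches = [ver for ver in newest_first(registry, name) if allowed(ver, bounds)]
            if matches:
                chosen[name] = matches[0]
                visit(registry[name][matches[0]])

    visit(requires)
    return chosen


def resolve(registry, constraints, chosen=None):
    """Backtracking search over (name, bounds) constraints; None if unsatisfiable."""
    chosen = {} if chosen is None else chosen
    # Dead branch: an already-chosen version violates a constraint added later.
    if any(name in chosen and not allowed(chosen[name], b) for name, b in constraints):
        return None
    undecided = [name for name, _ in constraints if name not in chosen]
    if not undecided:
        return chosen
    name = undecided[0]
    for version in newest_first(registry, name):
        if all(allowed(version, b) for n, b in constraints if n == name):
            deps = list(registry[name][version].items())
            result = resolve(registry, constraints + deps, {**chosen, name: version})
            if result is not None:
                return result
    return None  # no version of `name` works here, so the caller backtracks


print(greedy(REGISTRY, X_REQUIRES))
# {'A': '1.3.0', 'B': '3.6.0', 'C': '1.6.0'}  <- violates X's pin on C
print(resolve(REGISTRY, list(X_REQUIRES.items())))
# {'A': '1.3.0', 'B': '3.5.4', 'C': '1.4.0'}
```

If the registry only had B 3.6.0, `resolve` would return None, which corresponds to a careful tool reporting the conflict instead of silently installing something broken.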
Okay, so the next thing I want to talk about is something called virtual environments. Say you're developing multiple different software projects: I'm writing project X, but I'm also working on some project Y that might have a different, maybe overlapping set of dependencies, and maybe the versions aren't exactly the same. I'll use the same format I used over there, so here are the dependencies for Y. Say Y depends on library A, but here I need version 1.3 or later. This is kind of fine if I'm working on both projects X and Y at the same time, even if I install all these libraries system-wide, because I could install version 1.3 or later of A, and that's still compatible with the set of dependencies for X. But it gets complicated really quickly. Suppose I want to use library B in project Y as well, and here I need an older version of B, say version 2.0. Now this is a problem: there's no set of versions of A, B, and C that I can install to work on both projects X and Y. For A it works out, but for library B I need some version 2.x for project Y and some version 3.x for project X. So maybe I install one version when I want to work on Y, then install a different version whenever I want to work on X, and keep switching versions. That's really annoying, and there's a better solution, which is virtual environments. The idea is that instead of installing these libraries system-wide, you install libraries in a way that's specific to each particular project. I can give a quick demonstration of that with virtualenv, which is a tool for doing this in Python. We do a lot of Python demonstrations because Python is used a lot in the undergrad classes here. I won't do a very complicated demonstration; I'll just show you a simple thing. There's this library called
NumPy, and I've installed it system-wide on my machine. If I just open up Python and look at which version of NumPy I'm using, it's version 1.15.4. One thing I can do is create a virtual environment. What this does is create this folder, env (I can give it any name I want), and I'm going to install the dependencies for whatever program I'm working on into this folder, so they're separate from my system-wide dependencies. Don't worry about the specific Python commands I'm using; you can look them up later, and there's a link in our notes for this section. I can go ahead and pip install NumPy inside this virtual environment, and I can even install a newer version: inside here I'm running version 1.16.0 of NumPy, whereas my system-wide install of NumPy is version 1.15.4. This just shows how I can easily have different sets of dependencies for different projects. I can have a system-wide install and as many virtual environments as I want, and just activate one or the other to swap in a whole different set of dependencies that will be loaded whenever I run my code. So this is virtualenv, a particular piece of software used for Python programs. There are other alternatives to virtualenv for Python, and similar tools for other languages, like Bundler for Ruby, and so on. You'll need to look up the particular tool for whichever language you're using, but this is the general idea: you maintain separate sets of dependencies for separate projects. Some other languages don't really have this problem. Rust, for example, by default compiles a static binary: when you build a program, it pulls in all the dependencies and makes one fat binary that contains all the code from the dependencies stuck into it. So if you're working on two different projects, well, there's
no system-wide install or shared libraries or shared anything between the programs, so you don't really need to worry about this issue. So in certain languages it's not a problem. Any questions about this idea of virtual environments? Okay. The last thing we're going to talk about is something called vendoring. This is a very different approach to dependency management. Rather than bothering with all this complicated stuff, if you're working on a software project and you need to use another library, you can just take all the source code for that library and copy it into your project. Then you have one gigantic tree containing all your code and all the other people's code, and you build everything at once and don't really worry about dependencies in the traditional sense. There are some advantages to doing this: you're not depending on package repositories anymore, you're not relying on dependency resolution anymore, and when you compile your code, it's always the same code every time, so you know exactly what you're building against. Certain people do this. For example, Google, I think, maintains one gigantic repository that has all their code, and they've also vendored all the code they're using from other people, because they don't want to rely on package repositories, and they don't want to worry about things like the author of library A accidentally breaking something in a patch version and then having to deal with it. So vendoring is a pretty good approach in some ways, and it might be something you want to think about if you're working on a software project; it's just a very different approach to dependency management. So that's all we have for package management: the high-level ideas we think you need to know. To actually use any of this stuff, of course, you need to look at the documentation for the particular tool you're using for your particular language, but
that should be easy once you understand the concepts. So, any questions? Okay, let's take a 10-minute break, and after that we'll talk about OS customization, followed by remote machines, covering topics like SSH and SSHFS.