Yeah, libraries. So what's the point here? Well, it's sort of like a big tour, so you can basically sit back and relax while we discuss some of them. Maybe we can even make it a panel discussion among the different instructors.
I'd like to point out this one quote here, which a student once told me when I was asking them about their computing career. They said: when you're a student, it's sort of expected that you make everything yourself, otherwise you're cheating. When you're a researcher, you have to reuse what others have done, and not just as a researcher, but in anything professional. And this is something that you don't often learn in courses. So I'd like you to keep that in mind and think about it. It's good to do things yourself to practice and learn new things, but it also slows you down and spreads your mental effort around. So if we can convince you that it's a good idea to use other people's libraries, then we'll have reached our goal in this course and this lesson.
And I'd say there are two main categories here. Sometimes you find very well-maintained libraries that are used by many others, like NumPy or SciPy or Matplotlib and many other relatively large things. These have many users and many testers, and it's a good idea to use them. You can also find a wide variety of other public code, which might work for some purposes, but it's probably not well-maintained. The author might not even fix bugs, if it's still there at all. It's published as part of an article, and it's like, OK, good luck, use it if you want, but I'm not there. We're going to talk about the first of these, the SciPy ecosystem, and then the second one.
Maybe it's worth pointing out some of the terms we've been using. I think these are according to Wikipedia. A library is a collection of code used by a program, so basically something you can import. A package is a library that has had a little more work put into it and has been made easy to install and reuse.
For example, in Python there's the Python Package Index: once you put something in there and make it a package, people can do a pip install and use it very easily. When I first started, I made libraries, sort of collected code that I would import but not package, and that code never really got much use. And then a dependency is something another program requires but doesn't include itself. So basically, if you make code that uses SciPy, then whoever uses your code has to install SciPy. This is good because you're using something that's professionally managed. It's bad because they have to deal with installing it. Usually the good outweighs the bad.
So what ecosystem do we have here? It's pretty big, and we've been discussing many of these tools already. Of course, Python is the base of it all. NumPy is probably the second most important base. It has arrays, and all of these other packages use it as their main tool for arrays: SciPy, Matplotlib, pandas. I mean, pandas directly uses NumPy arrays, and so does SciPy, and so do many other packages. This was a question on day one: how do all the other ones relate?
If you scroll down, there's a huge list here. It's not even worth trying to mention everything. If you see anything you like, or you think something is missing, can you comment on HackMD, and let's see what other people use? Yeah, these are basically copied from other lists, from the SciPy stack. But as you scroll down, you see all of the different kinds of tools. Say you're in machine learning: you have several things which you're using all the time.
Then there's this point about connecting Python to other languages. SciPy does this where it wraps C or Fortran modules. It's a pretty common thing. I mean, NumPy also does that: in practice, the operations are written in a low-level language.
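As a rough illustration of what wrapping a C library means, here is a minimal sketch that uses Python's standard-library ctypes module to call the cos function from the system C math library. This is not how SciPy or NumPy build their wrappers (they ship compiled extension modules); ctypes is just the shortest way to show the idea. It assumes a Unix-like system where the C math library can be located, and the helper name cos_via_c is made up for this example.

```python
import ctypes
import ctypes.util
import math

# Locate the C math library by name. If that fails, fall back to the symbols
# already loaded into the running Python process (assumption: Unix-like OS).
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)

# Declare the C signature so ctypes converts values correctly:
#     double cos(double);
c_cos = libm.cos
c_cos.argtypes = [ctypes.c_double]
c_cos.restype = ctypes.c_double


def cos_via_c(x):
    """Hypothetical helper: compute cos(x) by calling the C library directly."""
    return c_cos(x)


if __name__ == "__main__":
    # Both values should agree, since math.cos is itself implemented in C.
    print(cos_via_c(1.0), math.cos(1.0))
```

The Python side only describes the function's argument and return types; the actual computation runs in compiled code, which is the same division of labour the larger libraries rely on.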
So I guess many people don't need to go this far, but when you're writing something that needs to be fast, it's really not too difficult. And there's a variety of tools that make it easier than writing directly in C, and that make the interface basically automatically.
So what's next? There's this section on evaluating packages for reuse. So when do you use something? Gerardo, and anyone else who's on the call: how do you tell if you'd want to use a package you see? Are there any things you look for?
Great question. I mean, I'm probably looking for something for a very specific purpose, so I will try to find a bunch of options and figure out the best one.
That's a good point. Oftentimes I find several things, and I try to evaluate which one is currently being developed, which one is better packaged, and so on.
For me, I think one important factor is whether the documentation is good.
Yeah, that's a good point. The first thing I would do is start reading the documentation and figure out whether it does what I want it to do and how to use it. And if I can't figure that out easily enough, then it's kind of not an option.
If there's a public repository for it, on GitHub or GitLab or Bitbucket or something, then I take that as a very good sign, because I know that if there were an issue, people could post it. And I can see the general activity. I would go to the GitHub page and check when the last commits were. If everything is from 2008, then maybe I'm not going to rely on it.
It depends on how big the project is or how much you expect it to do. If it's implementing something published in a particular paper, you probably don't expect a lot of changes. And if it's doing a really simple thing that you know you could write in a week, then again, maybe if it hasn't been changed since 2008, that's not a huge problem. Maybe you bring it up to date. But for anything bigger than that, it's very suspect if there are no changes since 2015. Yeah.
This automated testing is a good point. If every commit gets some tests run on it, then that means they have some built-in error checking, which is actually something you learn in a CodeRefinery workshop, which we'll mention at the end. Oh, actually, that's mentioned here. So if you want your work to be well made, so that both other people and you can reuse it, we have workshops called CodeRefinery where you learn about testing, documentation, making code modular, version control, and all these kinds of things.
Okay. So in the upcoming lesson, we'll talk about dependency management. Once you have all of these different tools you're reusing from Python, how do you record them so that other people can easily get the same environment? This is important for a lot of research, because if you can't reproduce the environment, you can't redo your work.
And here are some exercises, which we don't have time to do, but they're more discussion than anything. If you would like, check out some of these packages and see which ones you think you would and would not use. As a hint, several of these were written by me many years ago, and they're not good. They're basically either code on a web page or no-tests, no-documentation kinds of things. But it's worth opening them with a critical eye and seeing how you would tell them apart. And that's basically it.
So does anyone else have any comments about tools? There's a good comment about the Cython language in HackMD. I'll answer that in the HackMD. I have used Cython before and I quite liked it. I mean, there are other similar options. And perhaps if you're starting now, Numba is something to try out, because then you write your code in pure Python.
Right, yeah. Yeah, maybe Cython. I used it in like 2013-14, that era. And I think there are probably some better things by now.
But if you basically want to write in C or make an interface to something, well, yeah. I mean, if you're comfortable writing C, then Cython is a very good option.
There are questions about how to build a package out of your project. This is actually a topic for tomorrow. I think the very last lesson is a demo of taking Python code and making it a Python package.
Oh, and by the way, it's break time now. Yeah. But I guess that doesn't mean we have to stop looking at HackMD, but don't forget to take a break and walk around before the last hour. Also, can you comment in the HackMD: did you like this lesson? Should we leave it or remove it for next year? And so on.
One interesting question from the HackMD: how do you handle dependency mismatches? Dependency mismatches are one of the pains of your life once you get further. This is one of the reasons why I try to minimize dependencies and use stable ones.
Was the question about two different dependencies requiring different versions of a third one? Or two things that you want to incorporate depending on the same library, but different versions?
So is it bad if my answer is: don't depend on things that have very strict dependency requirements? I actually have a whole lot to say about this. I could talk for an hour about why I think so many things are wrong with the way people do dependencies, pinning them too strictly so that you make your code incompatible with more code, and so on. But yeah, I weigh backwards compatibility and stability very strongly. And I would prefer not to depend on something which always requires the latest version of something else. That does cause a lot of problems.
I guess the answer is environments: virtual environments or conda environments. But that doesn't help you if you have the problem within one.
Okay, yeah. If you are developing a package and you have that problem within the package, then you depend on another version of something.
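To make the point about strict pinning concrete, here is a toy sketch of two packages that both depend on the same third library. It is not how pip or conda actually resolve anything: the version numbers are made up, versions are assumed to be plain dotted integers, and each requirement is written as a half-open range from a lower bound (inclusive) to an upper bound (exclusive).

```python
def parse(version):
    """Turn a dotted version string such as '1.5.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def satisfying(available, constraints):
    """Return the available versions that fall inside every (lower, upper) range."""
    result = []
    for version in available:
        parsed = parse(version)
        if all(parse(lower) <= parsed < parse(upper) for lower, upper in constraints):
            result.append(version)
    return result


# Hypothetical releases of the shared library.
AVAILABLE = ["1.4.0", "1.5.0", "1.6.0", "2.0.0"]

# Two packages that each pin one exact version of the shared library.
STRICT = [("1.4.0", "1.4.1"), ("1.6.0", "1.6.1")]

# The same two packages, each accepting any 1.x release from a minimum version.
LOOSE = [("1.4.0", "2.0.0"), ("1.5.0", "2.0.0")]

print(satisfying(AVAILABLE, STRICT))  # → []
print(satisfying(AVAILABLE, LOOSE))   # → ['1.5.0', '1.6.0']
```

With exact pins there is no version that both packages accept, so they cannot be installed into the same environment. With ranges there are two workable choices, which is the argument for not pinning more tightly than you need to.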
So interestingly, pip, perhaps until recently, didn't even try to resolve dependencies. It was basically first match: if there was something conflicting later, it would just use whatever came first. That had two side effects. First off, there were very rarely major dependency conflicts. And second off, stuff mostly worked. So basically, to me that showed that pinning your dependencies too strictly wasn't even necessary, because people relatively rarely had that problem. And then you have conda, which actually does try to resolve dependencies. And it takes a super long time to do that. Like, it might take half an hour for reasonably complex environments to get resolved. Yeah. But okay, this is a rant for another time.
Maybe we should stop talking to have five minutes of real break. Please keep talking in HackMD, I'll be answering there. Okay. Bye.