Hello, welcome back from the break. We will now move on to the library ecosystem, how the Python libraries are set up, and after that we will move on to dependency management. So Simo, have you introduced yourself already?
Yeah, I introduced myself yesterday.
Yes, so my name is Sabri. Disclaimer: I'm not a Python programmer, but I support users on our high-performance computing systems. My current job is as the manager of the Norwegian AI Cloud, but I also contribute to CodeRefinery, Aalto and all the partners whenever I can. So the idea of this library ecosystem, Simo, is that when you want to achieve a certain task, somebody might already have written code that does what you want. Instead of writing it again, you can reuse others' work. In the first introduction there was this interview with the student. When we study, people tell you not to take others' work; you have to do everything yourself. But when you come to real research, you practically can't code everything you need. For example, if you want a matrix transformation, a Fourier transformation, or some array sorting, it's already out there, so you should use that. So we will talk about how to reuse work. This is not stealing other people's work; this is reuse and collaboration. Do you use a lot of libraries, Simo?
Yes, all the time. I would say that the reason Python is so popular, especially in a scientific context, is this library ecosystem. Python is not the perfect language for scientific computation. As Luca, for example, said at the icebreaker session this morning, there are languages that are designed for scientific computing, and Python is not one of them. But Python is a very general language. It has all kinds of things, and the library ecosystem makes it possible to do all sorts of things.
So like Sabri said, in an interview you might be asked to write something like a quicksort algorithm. But I would never trust my own algorithm over something in a library ecosystem that has been validated by hundreds or thousands of users. So yes, I use a lot of libraries, all the time. I don't want to reinvent the wheel if somebody has already perfected it.
Correct. If you were there on the first day, when Richard and Yano were giving that beautiful introduction to the course, Richard asked Yano, when did you start using Python? I know for a fact that Yano is a very heavy Python user, but he had to think a little: when did I actually start using Python? That relates to the important thing you mentioned. It means that Python itself is not the most useful part; it is what people have made out of Python. Earlier, and I don't know how you learned programming when you were young, they taught us pseudocode. You had to learn programming tactics before you could write anything big, and by the time you ended up writing some code, you had spent about a semester learning what programming is. Python made this more democratic. It doesn't matter whether you come from biology, mathematics, space science or whatever it is; you can just start achieving something. And that achievement comes from the library ecosystem, as you said.
I would also add that my own start with Python was in a summer job at Aalto University, actually. I had to convert code that used Python Numeric, a really old library, to NumPy. The reason is that NumPy, as we have discussed already, is C and Fortran underneath, but nobody wants to touch that part.
Nobody wants to see that part, because it's harder and more laborious to write. It's much easier to work with NumPy. And that's why Python is so popular: you can build things on top of other things. At the bottom you can have the C and Fortran code, but you never necessarily have to touch it. You can just interact with the Python layer, and that's fine. It still gives you most of the speed that C and Fortran would give, with much more usability.
Correct. So Python usage is not the moment you first tried hello world. People really use Python when they take a library and do something with it. That's why people have to think a little about when they actually started using Python. It's not the hello world print; it's the library ecosystem. It is also very nice that we have placed these terms on the same screen, because taken alone, libraries, packages and dependencies can be hard to comprehend. I like to think of a library as a screwdriver. It's not well documented, there's no big description of how to use it, but people who know how to use it can use it. A package is more like a tool set: a set of screwdrivers in different gauges and lengths, with some instructions. So that is the difference. Is that a good way to look at libraries and packages? How do you look at it?
Yeah, when we talk about libraries, we can mean multiple things. Usually everything in Python is a module or so. I'm not completely certain what module means in the context of Python, because the word is used in so many places that it loses its meaning.
If you say a word often enough, you don't know what it means anymore. But libraries and modules can be a bit all over the place unless they have been packaged together into a complete set of tools. Usually when we talk about packages, we mean there's a bunch of code, and you don't necessarily want to look at what's inside as long as it works correctly. You want to interact with that code through the functions, objects or other things it presents to you. For example, NumPy arrays have a lot of hidden attributes and that sort of thing inside, but you don't want to work with those because they are internal to the package. You want to work with the functions the arrays present to you, with what the developers of the package present to you. You want to deal only with the outermost layer of the package, because that's what the developers want to give you. That's how it usually goes: there's an outermost layer of nice things you can use, and inside there's a whole mess of stuff, but you don't need to worry about that because it's the package maintainers' problem.
Yeah, you're presented with an easier interface for what you want to achieve. Dependencies are related to libraries and packages too. If you're building something, let's say a table, you need nails and bolts. These nails and bolts are like dependencies: the table could not be made without them. And the nails have a specific length and type; not all nails fit all joints. So dependencies are the other things you need for your code to work, and they come with specifications that we will go through later on.
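
As a rough illustration of dependencies carrying specifications, the sketch below asks the standard library's importlib.metadata which requirements an installed package declares. The helper name declared_requirements is made up for this example, pandas is only an example package, and the output depends entirely on what is installed in your environment.

```python
from importlib import metadata


def declared_requirements(package_name):
    """Return the requirement strings an installed package declares.

    Returns an empty list if the package is not installed or declares
    no requirements. (Hypothetical helper, for illustration only.)
    """
    try:
        requirements = metadata.requires(package_name)
    except metadata.PackageNotFoundError:
        return []
    # metadata.requires() returns None when nothing is declared
    return list(requirements or [])


# Example: list what pandas says it needs (prints nothing if pandas is absent)
for requirement in declared_requirements("pandas"):
    print(requirement)
```

Each printed line is one "nail of a specific length": a package name, often followed by a version constraint.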
Then we talk about this SciPy, the scientific Python ecosystem. This is probably the ecosystem most of the research community gets to know; many would say they really started using Python when they started using this. We have NumPy, SciPy, Matplotlib, and pandas for data structures. For people whose research involves data, pandas can do wonders; they are really amazed at how much it helps. They can scrape web pages for tables, they can create data frames. And then you have NumPy, with all the optimized array functions and mathematical functions. So this is the ecosystem people first actually interact with; not all people, but the researchers we interact with.
Yeah, and I'll just mention quickly that this ecosystem is also highly influential on other packages. Yesterday, for example, there were questions in the chat about Polars or Dask, which do what pandas does, but more efficiently or in parallel. They usually build upon the syntax and the ideas behind these packages. So there are lots of packages that do the same thing, but they try to reuse as much as they can of the language of NumPy, pandas and Matplotlib. They do similar kinds of things, maybe a bit differently, or better, or more efficiently, or in parallel, but they usually reuse the same language and the same kinds of concepts. So if you learn the core concepts around the NumPy ecosystem, you can transfer them to other ecosystems or other tools that use them.
Yes. In addition, there are other packages like scikit-learn. This is not an exhaustive list here; it is only a part of it.
If you're doing classification or machine learning, even scikit-learn uses these packages. And if you go further, PyTorch has something like an optimized version of NumPy manipulations for machine-learning-related operations. The rest is really similar. This is for people to read, about the other packages, so I will not spend time reading through it, because I think in the next lesson on dependencies we can talk a little more. The only thing I want to mention before I hand over to you is this part on connecting Python to other languages, because we will talk about Conda, for example, next. For connecting to other languages, the library we were talking about, NumPy, is a good example. There are certain things you could do in Python, but other programming languages like C could do better. Initially the code you write can run on your laptop, but if you go to a high-performance computing system or a bigger server to achieve bigger things, you have to use the resources optimally and also understand the underlying hardware, for example how wide your processor's registers are. Those are things we don't worry about when we write a Python program. So Python provides these libraries that use C, Fortran or other languages behind the scenes. Remember that Richard briefly mentioned having one program that you want to feed different data sets. We call this SIMD, single instruction, multiple data. That kind of work is better done in languages other than Python, but you don't have to learn all of them; you only need to learn how to call them. Take vectorization, for example: if you have a loop that goes through 10 things, you could write a Python loop.
But if the loop goes through 100,000 things, you don't want to take 100,000 steps one by one; you want to go 100 or 1,000 steps at a time. That kind of thing, which we call loop vectorization, can be handled by this compiled code, and you can still use it from inside your Python environment. Is there anything else we want to mention here before I hand over to you?
Yeah, I'll quickly mention this last chapter about evaluating Python packages. How do you know what is good? If you go to GitHub, there are something like a million packages, and there is no single way of determining whether something is good or bad. What I usually do, when, let's say, one of our users wants to solve a certain problem and wonders whether they should reimplement an algorithm or whether an implementation already exists, is search with the corresponding keywords for a Git repository, on GitHub for example, that provides such a package. Then I look at various factors. Is there a community around this package? Has it been validated scientifically? Are there papers published using the package? Does it have lots of stars? When was the last commit? You usually need to check what the community around it is and whether it is a trustworthy source. Is it some random guy somewhere? Can I actually validate it myself? Of course I cannot validate, let's say, NumPy; I don't have time to read the whole source code. But if it's one page long, maybe I can see what it does. There are various factors, and here are some questions you can ask yourself when you're trying to validate a package. I would say it's always a good idea to first look at whether somebody else has already done it, and to check whether you can extend it.
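
The loop vectorization point made earlier in this part of the discussion (100,000 steps one by one versus handing the whole array to compiled code) can be sketched as follows. This is a minimal sketch, assuming NumPy is installed; the two function names are made up for illustration, and how much faster the vectorized version runs depends on the machine.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Plain Python loop: one interpreted step per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(values):
    """Same computation as a single NumPy call into compiled code."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


data = np.arange(100_000, dtype=np.float64)
loop_result = sum_of_squares_loop(data)
vector_result = sum_of_squares_vectorized(data)

# Both approaches agree up to floating-point rounding
print(np.isclose(loop_result, vector_result))
```

Both functions are called from ordinary Python; only the second one pushes the loop itself down into the compiled layer underneath NumPy.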
For example, many frameworks, such as scikit-learn, provide a very comprehensive way of extending them, so that if you write your models the way they write theirs, you can reuse all of the other tools they provide. Then you don't have to figure out how to do cross-validation yourself, because you can use their cross-validation with your model. So it's usually a good idea to check what the packages provide and how to extend them. Usually there's a developer document somewhere that explains how to extend the package, with a guide on how to write your own code on top of the already existing library. It's usually not that hard. It's usually much harder to start from scratch. Of course, sometimes that is needed, but a lot of the time you can build upon what other people have done.
Yes. I will only comment on the security part, because you covered the rest of the list. There could be malicious packages as well, so there is the matter of trust. Research is actually based on trust. When you use a library, especially if it concerns something like web security, you can't use it just because it does the one thing you want; you need a proper understanding of what it actually does. There can also be malicious libraries with misspelled names: you have numpy, and then something like "numdie", made to look like the real one, so you have to be a little cautious as well. Then I think we should go to the next lesson, about how we manage dependencies.
Yeah, let's jump to that. So I'll take the screen share and then switch to mine.
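
The misspelled-package risk mentioned above can be partly guarded against by comparing a name against the packages you actually intend to install. The sketch below uses only the standard library's difflib; the KNOWN_PACKAGES list, the function name possible_typosquat and the 0.75 cutoff are all assumptions made for this example, not an established tool or threshold.

```python
import difflib

# Hypothetical allow-list: the packages you really mean to install
KNOWN_PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "scikit-learn"]


def possible_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.75):
    """Return the known package that `name` closely resembles, or None.

    An exact match (ignoring case) is treated as fine and returns None.
    """
    normalized = name.lower()
    if normalized in known:
        return None
    matches = difflib.get_close_matches(normalized, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for candidate in ["numpy", "numpi", "flask"]:
    lookalike = possible_typosquat(candidate)
    if lookalike:
        print(f"{candidate!r} looks suspiciously like {lookalike!r}")
```

A check like this only catches near-miss spellings of names you already know; it says nothing about whether an unfamiliar package is trustworthy.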