And now we have 15 minutes, and we can answer basically any question you might have. There's a good question about data storage: do you usually keep the important files on your workstation or laptop, transfer them to Triton, do the calculations there, and then transfer the results back? That's usually how it starts. But once you get the hang of it, it usually turns out that you move more and more of your work to the clusters. Once you become more accustomed to the system, you can do more and more in the cluster itself, so you need to do less on your own machine. For example, every now and then I just erase my whole laptop. The most important things on my laptop are my configuration settings, which are in version control; everything else I basically have on the cluster, and if it's important, it's usually in version control too. So I don't usually have to do anything with my laptop. If something is only on my laptop, I assume it's bound to get lost. Say you have a glass of water (this is an imaginary glass of water), you pour it over your laptop by mistake, and your PhD was on that laptop and that was the only place you had it. That's not a good idea. So it's usually a good idea to make your workstation or laptop the least important place your data lives.

Yeah, let's see, modules, applications. Yeah, I mean, how many days do you think data storage is worth talking about? I really feel this could easily be a whole day: the options, best practices, things like that. Yeah, it could be a whole day, but at the same time, just talking about something is quite boring.
So I hope the middle of the day, when we did a lot of these hands-on things, was helpful for you. This is important: a lot of your work will involve questions like how do I manage my workload, how do I manage my folders, how do I transfer stuff, and other workflow things. That's actually a big part of your workday. But it's not fun, and it's not why we are here; we are here for the computation. We want to run stuff. Yet that's what people actually spend a lot of time on. And this is a good point: when you're doing courses, you get an assignment, you do it, it's done, and you move on to the next thing. Projects don't last long enough to get really messy and need lots of organization. But once you start working on the cluster, you might still be using some of the data and code you start now four, five, or ten years from now. That's a whole other level of challenge, and there's almost no preparation for it. So yeah. And you will always learn better practices after you have done something. Everybody has experienced this: you do something, forget about it, come back two years later and look at your own code and your own practices, and they look completely horrible. And that's a good thing, because then you know you have moved forward, and that's good to remind yourself of. Of course, the most important part is that you get stuff done. But you can also improve, and you should, because that enables you to do more.
And this is what the course is about: in order to do more, you need to learn new practices and new things you can do. Then when you use them, you become even better at them, and you can do even more.

Yeah. There's a question all the way from the start of the day: "Maybe a naive question, but what I'm hoping to do is transfer my R workflow to the cluster for speed and parallelization of tasks. I see it's possible to run RStudio on the cluster at the University of Helsinki. Would a demo of how to do this be possible?"

Um, we don't have RStudio installed on our cluster. Well, there's Open OnDemand. Yeah. But this is a philosophical question. I agree that RStudio is great, and it's a great place to write R code. But if you want to move beyond a single RStudio session, at some point you need to take a leap of faith. Nobody wants to use the command line... Well, okay, some people do like to use the command line. And a year from now, half of you will want to be using it. Yeah. But the reason these tools are made the way they are is what they make possible. You could basically buy 50 laptops and run RStudio on all of them, but that's an insane amount of work to set them up and run the code on all of them. If you're willing to press Shift+Enter, or whatever it is in your IDE, on every one of them, then of course you can do it. But at a certain point it becomes much more efficient to turn the code into scripts: convert the RStudio stuff into R scripts and then use Rscript to run them on the cluster, because then you don't have to be there watching it.
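To make the advice about turning interactive code into scripts concrete, here is a minimal sketch in Python (swapped in for illustration; the discussion above is about R scripts run with Rscript, and the same pattern applies there). The script takes its parameters from the command line and writes its result to a file, so it can run unattended on the cluster, many times with different arguments. The file name, argument names, and toy computation are hypothetical placeholders.

```python
"""Hypothetical batch-style analysis script: no IDE, no one watching the console."""
import argparse
import json
import random
import statistics


def run_analysis(n_samples: int, seed: int) -> dict:
    """Toy computation standing in for the real analysis: mean of random draws."""
    rng = random.Random(seed)  # fixed seed, so the same arguments give the same result
    draws = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return {"seed": seed, "n_samples": n_samples, "mean": statistics.fmean(draws)}


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Run one analysis non-interactively.")
    parser.add_argument("--n-samples", type=int, default=1000)
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--output", default="result.json")
    args = parser.parse_args(argv)

    result = run_analysis(args.n_samples, args.seed)
    # Save to a file instead of printing to an interactive console,
    # so the results are there when the job finishes.
    with open(args.output, "w") as f:
        json.dump(result, f)


if __name__ == "__main__":
    main()
```

Each run is then a single command, for example `python analysis.py --seed 3 --output result_3.json`, which can go inside a batch job script instead of being typed into an IDE.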
Exactly. Yeah. And you can use RStudio to do the initial development. But when you're running it on the cluster many times, you've copied the files there or mounted them from the cluster, and you're running it from the command line there. That's probably the balance I'd suggest.

Oh, let's see, any new questions. "I want to run a program that needs different Python modules. Should I install them?" There's a page you can read here about installing extra Python modules. We haven't gone into this in detail, but usually it means making a new virtual environment or conda environment, which is like a self-contained box. It gets Python from outside, but everything else is in the box, and you can install whatever you need there. Where you put it is basically your choice: your work directory, or your home directory if it's not too big.

Um, yeah, the feedback. I can hardly believe this feedback: some people think it's too fast and some think it's too slow. Okay, yeah, I agree that some of the theory might be too slow and the exercises too fast. I completely agree. I think the problem is that it's a double-edged sword. We probably need to spend much more time on the exercises, but it's very hard to find exercises that explain the concepts well, and we try very hard to figure them out. It's hard to get a grasp of the situation: okay, now I'm on the login node. But actually I'm in my office, looking at my laptop. On my laptop I have a small screen, and that small screen is now on a different machine, in a different system. Then I write this file, I write the command, and suddenly it runs on a different system. Then I get output, and I read it in.
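A minimal sketch of the "self-contained box" idea, using Python's standard-library `venv` module (a conda environment, the other option mentioned, is created with conda instead). The base interpreter comes from outside, and packages installed later live inside the environment directory. The script name `make_env.py` and the `$WRKDIR/venvs/myproject` path below are hypothetical examples, assuming `WRKDIR` points at your work directory as on Triton.

```python
"""Sketch: create a self-contained Python virtual environment in a chosen directory."""
import sys
import venv
from pathlib import Path


def make_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment at env_dir and return the path of its Python executable."""
    # The base Python comes from outside; packages you install later go inside env_dir.
    venv.create(env_dir, with_pip=with_pip)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    if len(sys.argv) == 2:
        python = make_env(Path(sys.argv[1]))
        print(f"Created {sys.argv[1]}; install packages with: {python} -m pip install PACKAGE")
    else:
        print("usage: python make_env.py ENV_DIR")
```

For example, `python make_env.py $WRKDIR/venvs/myproject` followed by `$WRKDIR/venvs/myproject/bin/python -m pip install numpy` installs into that environment only, leaving the system Python untouched.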
It's very hard to get a grasp of that, or to write exercises that demonstrate the whole complexity of the system. It's like the story of the blind men feeling an elephant: one examines the leg and says it's a tree trunk, and someone else examines another part and says it's something else. So it's very hard to find examples that explain the whole system, and that's why we sometimes get a bit wordy, like this answer itself. And it's super hard to make exercises that work for everyone. A lot of the advanced things would work only on one cluster, and many of the data access things would work only on one operating system and would need site-specific adjustments elsewhere. There have been other times when we did data storage and copying first, and it ended up taking half the day, and it still wasn't really useful. So my philosophy for this course is: we've shown you the basics, and for anything we're not doing here, come to our garage and we'll help you make it work for you.

Yeah, the problem with data storage and things like that is this: there's only one way of submitting a job, right? There's only one way of submitting a serial job, you use sbatch to do it, and it's quite easy to say what sort of exercise you need. But file transfer depends a lot on your workflow. What sort of work are you doing? What operating system do you use? Do you use the terminal a lot? Then maybe SSHFS is good for you. If you use a lot of graphical applications, maybe you want to use the Samba mounts. Suddenly it becomes a lot murkier what the recommended way is. And unfortunately, we also don't want to pick favorites too much and say which one is the way everybody should use, because everybody has their own way of working and their own preferred methods.
And when it comes to file transfers and that sort of thing, it's a lot harder to give one definite answer. Yeah.

Are there any questions about the actual topics of today? Maybe at the end of tomorrow we can try to do some of these full, real examples again and show loading the modules and so on. Or is there a quick example we could do now? Do you want to do a quick demonstration of making a conda environment and installing something? Or should we call it a day? We actually have other videos that go over this kind of stuff that we can link.

Um, well, yeah, sure, I can do it. If you want to share the screen. Oh, yes. So you see my screen, there we go. I'm not going to do this from memory, even though I remember it because I wrote the instructions; the better way is to show where you find the information. So, the Triton documentation. We're here, and over here under applications, we have environments with conda. Let's look at the first-time setup. I've removed all of my settings, so I haven't done the first-time setup. Whoops, the zoom again; this is the hardest part. Um, yeah. So first I'll check that I have the module loaded. So I'll restore this. It says here that by default conda installs packages and environments in my home directory, and I want to put them into my work directory, so I run these commands. These might say that the folders already exist... No, okay. And now I'm all set to create my first environment, so let's use this one.

So you're making an environment file. What Simo's doing, instead of creating an environment and installing stuff ad hoc, is making a text file that defines everything.
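The first-time setup step, keeping conda environments and package downloads in the work directory instead of the home directory, comes down to two conda settings, `envs_dirs` and `pkgs_dirs`. Settings like these are typically applied with commands such as `conda config --prepend envs_dirs DIR`; follow your cluster's own instructions for the exact commands. The sketch below only shows what the resulting `.condarc` entries look like. The `.conda_envs` and `.conda_pkgs` directory names are illustrative, and it assumes `WRKDIR` points at your work directory, as on Triton.

```python
"""Sketch: .condarc entries that keep conda environments and package caches in a work directory."""
import os
from pathlib import Path


def condarc_for_workdir(work_dir: Path) -> str:
    """Return .condarc text listing work_dir locations for environments and package caches."""
    envs = work_dir / ".conda_envs"  # illustrative directory name for environments
    pkgs = work_dir / ".conda_pkgs"  # illustrative directory name for downloaded packages
    return f"envs_dirs:\n  - {envs}\npkgs_dirs:\n  - {pkgs}\n"


if __name__ == "__main__":
    # Assumption: WRKDIR is set (as on Triton); otherwise fall back to a placeholder path.
    work = Path(os.environ.get("WRKDIR", Path.home() / "work"))
    # Print rather than overwrite ~/.condarc, which may already hold other settings.
    print(condarc_for_workdir(work), end="")
```

With the work-directory entries listed first, newly created named environments and the package cache go there, which keeps large environments out of the home directory.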
That way we can delete it and make it again. Yes, and we can specify where we get the packages: we use the open-source conda-forge repository, and we install numpy and pandas. So then we can actually create it.

Okay, so now you're making it from the environment file. Where does it save the environment? Now it saves it into the work directory, because I ran those commands earlier. But again, it depends on the cluster. For example, on a CSC cluster, I would go to the CSC documentation and check the tool they recommend; that's in the chat as well. Not everything is applicable to every cluster. And if I cheat a bit, I can see that below it says... maybe we should make this the default, because I haven't used plain conda for years. There's this tool called mamba, which does these installations a lot faster. Solving this environment with conda would take a while, so I'll cancel this and use mamba instead, because otherwise it will take forever to create.

Yeah, I mean, I guess people watching probably can't follow this at all right now. We're going so fast. Yeah, unfortunately. Maybe we should call it a day; it's after four now. Yeah, maybe we shouldn't have done this. This is just too fast. Yeah. Okay. The point is that there's already way too much information in this course, and there's way too much information when it comes to these kinds of systems, because they're complicated. If you take one step at a time, contact us, and read the documentation, that's usually a good way of getting there. Yeah. I will post a link to a video from a past year, where one of the special topics on day one was using Python environments with conda. Oh yes, that's good. Okay.
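The environment file from the demo can be sketched as follows. This builds the same kind of `environment.yml` (an environment name, the conda-forge channel, and numpy and pandas as dependencies) from Python so the structure is explicit; normally you would just write the YAML by hand in an editor. The environment name `conda-example` is a hypothetical placeholder.

```python
"""Sketch: write a conda environment file like the one in the demo (conda-forge, numpy, pandas)."""
from pathlib import Path


def environment_yml(name: str, channels: list[str], dependencies: list[str]) -> str:
    """Build environment.yml text: environment name, channels to search, packages to install."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = environment_yml("conda-example", ["conda-forge"], ["numpy", "pandas"])
    Path("environment.yml").write_text(text)  # the file fully defines the environment
    print(text, end="")
```

The environment is then built with `mamba env create --file environment.yml` (or `conda env create --file environment.yml`, which can be much slower to solve). Because everything is defined in the file, `conda env remove --name conda-example` followed by the same create command rebuilds it from scratch.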
Um, but yeah, I think we're all really tired by now, and it's my cat's feeding time, so my ability to do much else is really limited. So let's call it a day and hopefully see you tomorrow. Tomorrow we start running on multiple processors in many different ways, which is really what we came for, so make sure you come. Okay. Thanks a lot. Yeah, thanks a lot. See you later. Bye.