Yeah, so next up is Enrico, who will talk about the scientific computing workflow.

Hello.

Yeah, I'll join Enrico in commenting on the talk as well, so that we get some rapport: multiple views, multiple eyes on the same issue.

Yeah, so Enrico's screen is shared. There you go. So next up are Enrico and Simo, and they're talking about the different tools and strategies for doing things, basically from your laptop to the largest clusters, to frame this in the big picture.

Okay. Just briefly, as a reminder for those who registered: in the email you also have the schedule, and there you find, for example, the clickable links to the slides. They're not really slides. It's a web page where we collected some useful info, as well as the HackMD link and whatever else is useful. We will have a break at 12:50, so you have 10 minutes for stretching your legs.

Today might feel a bit boring, because there's not much coding or running scripts on a cluster. But we noticed in previous years that this kind of introduction, setting the context, is really important for some people, because they just never had this basic training on the topic. So these slides, which you find on the page that I was just showing, are the minimum that one should know about scientific computing, or even just about computing.

There is this funny xkcd comic. This one is about machine learning. The guy is saying: this is your machine learning system. You put the data in here, there are some formulas, and then some output comes out. And what if the answers are wrong? You just stir it. You can replace machine learning with any method used in any scientific field, and basically that's it. At least that was my first approach: I have a black box, I put some numbers inside, I get something.
I hope that they are fine and I can get them published. This type of workflow applies in any quantitative scientific field. We run some experiment or we make some simulations, so that we produce some numbers, the raw data. Then we might have some models, because we invent data-driven models, or maybe we have models or hypotheses from the literature. We crunch the numbers and we get some tables, p-values, t-values, maps, whatever is significant in your field. Then we put them in figures, into posters, into papers. We publish, we repeat, and that's it.

Enrico, quick question. Often, when people talk about scientific computing, I get the feeling that they are only referring to the models, only talking about how you, let's say, calculate some matrix inverse or something like that. But isn't everything here related to the computing, all the parts of this flowchart?

Yeah, maybe. We were discussing this earlier. I found one definition: scientific computing is the collection of tools, techniques and theories to solve on a computer mathematical models of problems. But in recent years people also talk about data science. So at least with the tools we use, and with how people use the tools, it's not just about inverting matrices or optimizing some existing algorithm to make it faster. I would dare say that even people using Excel to compute a t-test on a table are doing scientific computing.

Yes, it's more about the process than necessarily the tools used to do the process.
Yeah, the process that allows you to process some data that is, in this case, used in a scientific context.

So this simple schematic here tries to answer the question: how does computing happen? I'm sure most of you have come across something like this. You have the box, the piece of hardware, that has the CPU and the RAM and disks and other things. This can be your laptop or your workstation. On top of it you have the operating system, and you interact with the operating system through, let's say, MATLAB, R, etc.

When we try to help people, we shouldn't take for granted that everyone knows what a CPU is, what a GPU is, what RAM is. So here I collected a glossary from the amazing Wikipedia, which is the source of knowledge for all the students and scientists on the planet. But already, you can think of the CPU as the place where the computing happens, where the operations, the sums and so on, happen. And the RAM is the memory.
It is where you can keep, in a physical space, the numbers that you're working with. So the CPU and the RAM are often talking to each other. Then I'm sure some of you are already using GPUs and some of you would like to start learning them. Basically the idea is that the hardware architecture is more parallel. GPU stands for graphics processing unit, because they first came out for processing fast graphics, but now they are also used a lot for computing. And what you have on top of the hardware is basically what you are already familiar with, because most likely you're using a laptop or a desktop to do some computing, some data science.

There's also the question of the SSD over there. In Finnish, people, especially in the olden days, like in the 80s, were talking about "muisti", memory, and there's a difference between this RAM memory and the hard drive space. Well, hard drives are going away on laptops. But what is the difference? How would you formulate it?

Yeah, if you think of the RAM, it's literally the numbers that you need to access fast, because they need to be computed by the CPU or visualized in a plot. The SSD is where you store the files. I understand that with the newer generation the concept of file and file system starts to be difficult, because if all you were brought up with is iPads and things like that, you don't see the files anymore.
You don't see the file system. But this is more like persistent storage that is still within the same hardware. Here I didn't mention cloud storage. This can be a box that you can physically touch, but you might have data that is stored somewhere else, in your university, on Google Drive and so on, so you can plug in external data and work on it on your physical box. But as I will show you later, sometimes the physical box is limiting, and this is why we're here.

Yeah, one analogy I would give is your phone. You have a CPU that executes things: let's say you run Firefox, Facebook, Messenger, WhatsApp. You run different applications on your phone, and they run on the CPUs. The RAM is what is usually running out when your phone starts to get slow and you need to reboot it or close some applications, because you're running too much stuff at the same time. And the SSD is what runs out when you get another operating system update, or you have too many photos on your phone, and then you need to sync them to the cloud. That is the storage for the applications: when you need to install a new application and it says that you don't have enough space, the SSD is that space. The RAM, in contrast, is used when the application is actually running and it needs to store the variables of the application.

So what do I need to make computing happen?
Most likely you are right now touching a box, whether it's a laptop or a workstation, where you could already get something done. You are able to run Python or MATLAB or whatever. But sometimes you get the feeling that you need to scale up to a bigger system. Often the perception that people have is: okay, now I get access to an HPC cluster, so suddenly I'm not limited by my laptop that only has two CPUs and whatever amount of RAM, because now I can run my code on N CPUs, maybe 100 CPUs.

The issue that people don't realize is this. I give you the example of pasta, not just because I'm Italian: if it takes 10 minutes to cook pasta and I have 10 pots, it's not going to take me one minute to make the same pasta. Some things will always take 10 minutes, even though I have 10 pots. So in practice some tasks are not ready to be parallelized as they are. Simo will cover this more deeply, with practical examples, on Friday in the parallelization part. But basically the most difficult question is understanding your computational needs. Do you really need more CPUs? Do you really need more RAM?
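
The pasta point can be put in numbers with Amdahl's law, which the talk itself does not name. This is a minimal sketch, assuming the work splits into a part that cannot be divided and a part that divides perfectly over the workers; the function name and the example fractions are made up for illustration.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work can be split."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part always takes its full time; only the rest is divided.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Boiling one portion of pasta: nothing can be split, so 10 pots do not help.
    print(amdahl_speedup(0.0, 10))
    # Hypothetical job where 90% of the work can be spread over 100 CPUs.
    print(round(amdahl_speedup(0.9, 100), 2))
```

Even with 100 CPUs, the hypothetical 90% case stays below a tenfold speedup, because the part that cannot be split still takes its full time.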
So in this part of the course it is basically about knowing already that maybe what you need to do doesn't require parallelization. It just requires a long time, because it cannot be parallelized. HPC can still be helpful, because it means that you can leave something running for five days on a remote system, and you don't need to leave your laptop open for five days and nights, hoping that it doesn't melt in the meantime.

Yeah, I'd also add to this. Especially if you have recently bought a laptop or a phone, you see that they're advertised with numbers: this has a 2.8 gigahertz processor, this has that amount of RAM, and so on. You see the higher numbers and you think that higher numbers mean that everything is better. But that's not really what HPC is about. Of course the numbers might be higher, but usually it's more about how we make a lot of these things talk to each other: how do we get more CPUs communicating so that they can collaborate and do something together? Of course sometimes you just want bigger numbers. But in many cases, if you want speed, or you want to, let's say, run your own model a hundred times, you can use a hundred pots. If you need to cook for the whole family, you can use 10 pots and you get the stuff done faster. So there are various ways of working with a lot more resources, but it's not only that you get a faster machine. It's more about this communication, and this brings the problem that usually the users themselves need to know how the program behaves. Can the program communicate? Can the program understand that, okay,
I need to do this part and the other part of the program needs to do the other part? There's no one way of solving this. Usually it's trial and error, or reading a lot of documentation. But this is the way that HPC works.

Yeah. So basically you are here most likely because you reached this stage: your computer is not enough for your computing needs and you need to scale. So where can you go for that? I'll skip the text; let's just focus on the picture. This is the picture that you can save in your head to understand where the computing happens, with HPC or with any remote computing. You are here, somewhere on the internet, with your home connection and your laptop or desktop machine, doing whatever you're doing there. But sometimes it's not enough, because you don't have enough resources, or because you don't want to leave your laptop on for seven days running MATLAB or Python, hoping that it doesn't crash.

Then, maybe you belong to some department where you have an office workstation. Some departments and some universities allow you to connect remotely to the workstation, and then maybe you have more power there and you can run things better. There are other tools where, even through a web browser, you can connect to some other boxes that are often called nodes: some physical machines, and sometimes also virtual machines, that allow you to run your code. A very famous one is mybinder.org. There's a collection of links here at the bottom where you can try Jupyter, you can try Binder, Kaggle has its kernels.
So there are many ways of running computing without using your laptop, so that the computing can happen elsewhere. But these services might also have limitations. At Aalto and at Helsinki University we have a service called VDI, the virtual desktop infrastructure. I can give a demo if there's time at the end. Basically, from the browser you can see a workstation in your university, at Aalto or Helsinki University, and you can do the same things that you would do on your remote workstation. But there are limitations there too. They're not too powerful, and they usually log you out after 24 hours.

Then comes the HPC cluster, where there's usually an entry point, the login node, and you connect to this entry point. This is what we will test today at 3 p.m.: that you're able to actually enter this login node. But the login node is just an entry point to the actual cluster, because then there are many, many other machines, other pieces of hardware. You basically need to say: okay, I need a computer with 16 CPUs and, I don't know, 20 gigabytes of RAM for five days. You ask this from the login node, and the login node sends you to some of these nodes here. And now for five days you have this machine just for you, where you can run your computing. In practice, on day two, tomorrow, and on Friday we will cover exactly this: how do you access the login node, how can you start accessing the individual nodes, and all the differences between these systems.
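
To make the request described above concrete, here is a small sketch of what it amounts to. It is not the interface of any real scheduler: the `ResourceRequest` class and its fields are hypothetical, and the numbers are simply the ones from the spoken example (16 CPUs, 20 gigabytes of RAM, five days).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRequest:
    """Hypothetical description of what you ask the cluster for."""
    cpus: int
    memory_gb: int
    days: int

    def cpu_hours(self) -> int:
        # Total CPU time reserved: number of CPUs times wall-clock hours.
        return self.cpus * self.days * 24


if __name__ == "__main__":
    # The example from the talk: 16 CPUs, 20 GB of RAM, five days.
    request = ResourceRequest(cpus=16, memory_gb=20, days=5)
    print(request.cpu_hours())  # 16 * 5 * 24 = 1920
```

The point of the arithmetic is only that a request reserves resources for the whole period, so it pays to know what the code actually uses before asking.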
Yeah, I'd give this kind of analogy. Let's say you want to buy a custom-made table. You can order it from the nearby carpenter, which in this analogy would be your workstation in your office at Aalto. You go there, you work with the carpenter, and you make the table. But if you want to make a thousand tables, you suddenly realize: okay, I cannot make this on my local computer anymore, I cannot make it with the local carpenter. So you put in an order to some company in China to make your thousand tables. For that you need to give them specifications. You need to give them instructions on what you want them to do, and they will do what you have instructed them to do. If you give them bad instructions, they will give you a table with five legs or something. You need to specify what you want to order.

In a similar way, the login node, or some other system, is a kind of ordering system: you go there and you specify that, okay, I want this amount of resources, I want this computer, run these things for me. It will run them there and, hopefully, you will get your table finished. So you move the process of actually doing the stuff. Of course, the main thing you do is write the instructions; that is what you are actually doing. But somebody has to do the calculations, and that is the computing, running some code, and that can be moved somewhere else. In this kind of situation you want to move it inside this kind of big warehouse of computers
where the computers can just do whatever you have instructed them to do and then give you the answers back.

Yeah, this is a very good point. Another thing that I forgot to mention is that sometimes the issue is with the data. Maybe the data is too big and there's no way that you can store it locally. Or maybe the data is sensitive, and your university, or whoever gave you the data, doesn't allow you to take the data out of this private network, which is basically these black or white circles that you see.

But in general, understanding where the computing happens and what makes it happen can give you a better idea of whether you really need 10 CPUs, or whether you actually need only one CPU but doing the same thing many times over, as Simo was saying, across multiple nodes.

We have a long collection of public resources here where people can immediately get their computing done, not limited by the local machine. But of course each service, especially the free ones, might have other limitations. Luckily, most of you are part of some organization in Finland, or abroad, because we also have people from outside Finland here. So you can always apply for access to the computing cluster in your organization or at CSC, for example.

We still have a couple of minutes. I was briefly checking the HackMD and I'm glad that people like the pasta analogy. It really takes some time to understand parallelization. I've seen many people disappointed when they moved their code from the laptop to one of these HPC systems. They actually felt that their code was slower on the HPC system, and some of them are like: so what is the point?
I went through all this effort. But then you need to understand why the code is slower. Maybe you're not requesting enough resources, or maybe the tools that you're using on the laptop are not exactly the same versions and implementations that you have on the HPC system. So Simo, do you want to add something to close this?

Yeah, I would close by saying that the main thing I hope you have gathered from this is that scientific computing is a multitude of things. Partially it is organizing things that you need to handle yourself: you need to choose how you want to organize your stuff, how to get stuff done, and which tools you use when you're doing your computing. But it's also about what kind of resources you have, especially when you turn to remote usage, like HPC, and about what the code actually uses, what kind of tools it needs. So if you have a recipe and you need to cook pasta for a whole Italian wedding, you need to use the tools available in the kitchen you're going to, and you need to choose the correct kitchen and the correct utensils and so on. If you book a kitchen, you had better be certain that there are enough pots for the whole cooking process. There are lots of specifics that are going to come during the course as well: how do you ask for different resources, and things like that.
But the main thing you hopefully gather is that there is a distinction between the coding itself and what is actually being done with some resource: some actual machine somewhere is calculating it. Once you get this distinction, that, okay, I want to cook pasta, I can choose between different kinds of kitchens, I can choose between different kinds of utensils to do the cooking, then it's easier to choose: okay, this needs a big pot, I will use this one. Separating these layers in your mind helps you a lot in the long run, because that's how we usually perceive things: we try to separate what is happening and where it is happening. And the "where" can be quite nebulous when you're working across the internet. But the cloud isn't in a cloud. It's actually some machine room somewhere, in Hamina, for example, for Google. There's no actual cloud where the data gets uploaded. Somewhere, somebody has to actually store the data or calculate something, and once you get this distinction, it helps you a lot along the way.

Excellent. I think we could have a break now. I don't know if it's 1 p.m., but I see that the people on HackMD got really inspired by this cooking metaphor. Somebody's writing that you prepare one sauce, but then you can cook many different pastas.
Yeah, hopefully you had lunch before the talk.

This is exactly it: you realize that for a part of the process you need just one CPU, because you can't really parallelize it, it's just one sauce. But then maybe you have three types of pasta that you run on three different machines.

All right. If there's nothing to mention from the HackMD (I didn't follow it too much), I think we could have a break, and then we can resume at 1 p.m. sharp. Anything to mention from HackMD, or I guess everything is being answered?

Okay.

Yeah, thanks.