Hello, we're back. Hopefully you had enough time to work on those exercises. If not, well, you have a little bit more: you can think about them during this next talk, and we'll be going over the exercises in about half an hour. Next up we have a talk called "From Laptop to LUMI: an overview of CSC resources". We have Juha Lento here from CSC, who works on the computing environments team, which sits sort of between the cluster admins and the user support. From what I understand, the idea is to make the clusters easier to use and to find better ways for people to use them. But I think the main point of the talk is this: we've been talking about things here, and if you need even more resources than there are on Triton at Aalto, then you have the option of CSC, and it's really pretty similar. With that said, Juha, if you can share your screen, I'll flip it to that. Okay, thank you. So, here you go. Here we are. Okay, I guess I don't need to be here, so I'll hide myself. If you have any questions, let me know. Okay, thanks. All right. So, "From Laptop to LUMI": what kind of services does CSC provide? Yeah, I think that's a good title for this thing. My name is Juha Lento, not Jussi Enkovaara; I inherited this talk just one day ago. So I will use last year's slides, but we will have a look at the, let's say, actual services instead of just the slides. I think the old slides are just fine. What we're going to talk about is: what is CSC, that is, computing services, data services, and other services, and how can you access CSC services. CSC, the IT Center for Science, has been around for maybe 50 years. Maybe not that long. Something like that. And it has grown.
And it's growing very fast nowadays, so we are not only running big clusters but also doing lots of things for the Ministry of Education and Culture and with universities and polytechnics. Nowadays researchers at research institutes also have access to CSC's resources in exactly the same way as university researchers and students. I would say that the first thing to mention here is that you all have the right to use CSC services, and most of them are already paid for. So feel free to use them. Then the second question: can I use the services? Yes, you can. But should I? This is a more difficult question. The obvious and easy answer is to ask your supervisor, let's say your master's thesis or PhD thesis supervisor: should I use CSC services? They will probably tell you if you should. Usually, if your research group is already using CSC services, you can work together with your colleagues, and this is, let's say, the easiest way to learn how to work with CSC resources. And that's also how to get access. But in short, in the traditional sense, if your calculations are starting to take a long time but they can be run in parallel, then you can use a cluster: either your local cluster or, if your research group has been using CSC for a long time, they already know how to use that resource and you can start using it easily. If your calculation needs lots of memory, you probably want to run it in parallel as well. If it needs lots of storage space, that's available, definitely more than on your laptop. And then we have lots of scientific applications that are already installed on the machines, which is nice. That said, I would say that most researchers using CSC services are not running large parallel calculations.
I would say that a large number of our users are actually using CSC services as they would use a large workstation instead of a cluster of workstations. Also, within a research group, you get the synergy of using the same machine for the similar things that you are doing in your research group. Yeah, and then you can also share data using Allas. Allas is an object storage, and you can publish whatever files you have on the internet or share them between different projects. Here, by projects I mean CSC's computing projects. You can also build interfaces that use Allas. Allas is very raw storage, so if you want even to, let's say, list the files in a bucket (a bucket is basically a folder or directory in object storage), even that you need to code yourself. But that's pretty easy. Anyway, you can use it like Dropbox or for publishing. How are CSC's supercomputers different from a university cluster? At Aalto you're lucky, because your local cluster is very similar to Puhti and Mahti. Puhti and Mahti are just larger. But they also have more users, so you don't necessarily get more resources for yourself. It depends on the time of the year and day, how much of a crowd there is, and how many other calculations are running on the machine. Command-line access is very similar. You log into a login node with SSH and then you type stuff in the terminal, and this works very similarly on all the clusters. So basically you log into a login node, which is a Linux machine, and then, using the batch queue system, you spread your calculation across the compute nodes. But you have already discussed that and you are familiar with that stuff. Okay. We have a new service. This slide is a year old, so actually the service is now established, and I think it's very popular. It's accessing CSC services through a web browser.
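Editor's aside on the earlier remark that even listing the files in an Allas bucket is something you code yourself: object storage keeps a flat set of object names, and "folders" are only a naming convention. The sketch below is a minimal illustration of that idea in plain Python, not CSC's or Allas's actual API. The function name `list_folder`, the `/` delimiter, and the example object names are all assumptions made for illustration; in practice the keys would come from whatever client you use to talk to the storage.

```python
from typing import Iterable, List, Tuple


def list_folder(
    keys: Iterable[str], prefix: str = "", delimiter: str = "/"
) -> Tuple[List[str], List[str]]:
    """Emulate a directory listing over flat object names.

    Returns (subfolders, files) that sit directly under `prefix`.
    Hypothetical helper for illustration only.
    """
    subfolders = set()
    files = []
    for key in keys:
        if not key.startswith(prefix):
            continue  # object lives outside the requested "folder"
        rest = key[len(prefix):]
        if not rest:
            continue  # the key is the prefix itself, nothing to list
        head, sep, _tail = rest.partition(delimiter)
        if sep:
            # There is a delimiter further down: this is a pseudo-subfolder.
            subfolders.add(prefix + head + delimiter)
        else:
            # No delimiter left: the object sits directly in this "folder".
            files.append(key)
    return sorted(subfolders), sorted(files)


# Hypothetical object names in one bucket.
example_keys = [
    "results/run1/output.csv",
    "results/run1/log.txt",
    "results/run2/output.csv",
    "results/summary.txt",
    "README.md",
]

folders, files = list_folder(example_keys, prefix="results/")
print(folders)
print(files)
```

Calling `list_folder(example_keys)` with no prefix gives the top-level view of the same bucket.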
Puhti has a web interface, Mahti will also have a web interface very soon, and I hope LUMI will also have one very soon. This web interface is really convenient for exactly this kind of workload, where you use, let's say, one node in the machine as a large workstation. That kind of use is really easy and nice with a web browser. If you need to run a large parallel calculation, then using just the command line is probably still better. There's lots of software already installed, and, the same as on your local cluster, you use the module system to pick which versions of the software you want to use and the batch queue system to submit jobs to the compute nodes. CSC resources consume billing units. Basically this is for giving a fair share of the resources to everybody. You get billing units based on the size of your project, how many researchers there are, what kind of research you do, how many publications you already have, whether you have the money, and whatnot. But I wouldn't worry too much about billing units. Usually the problem is not the resources. Maybe the disk space, if you go crazy, but there are usually plenty of computing resources if you know how to use them efficiently. So you don't need to worry about billing units that much. Also, getting billing units, or getting more of them, for your project is very easy, and I will talk about that a little bit later. Computing services: which machine should you use, Puhti or Mahti? Puhti is our general machine, and it is used a lot like a large workstation, but you can also run pretty large parallel calculations there. I would say that if your parallel calculation starts to be more than, let's say, 128 MPI tasks, then you should probably already change to Mahti, because on Mahti there's less queuing, it's a little bit more responsive machine nowadays, and there's more disk.
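Editor's aside: the rule of thumb just given can be written down as a tiny helper. This is only a sketch of the speaker's informal guidance (more than about 128 MPI tasks: consider Mahti; otherwise Puhti), not an official CSC policy. The function name `suggest_machine` and the constant name are hypothetical, and the threshold is the approximate figure from the talk.

```python
# Approximate threshold quoted in the talk; treat it as a rough guide only.
MPI_TASK_THRESHOLD = 128


def suggest_machine(mpi_tasks: int) -> str:
    """Suggest a CSC cluster from the number of MPI tasks in one job.

    Hypothetical helper encoding the talk's rule of thumb.
    """
    if mpi_tasks < 1:
        raise ValueError("a job needs at least one task")
    # "More than 128 MPI tasks" is the point where Mahti was recommended.
    return "Mahti" if mpi_tasks > MPI_TASK_THRESHOLD else "Puhti"


for tasks in (1, 40, 128, 129, 2048):
    print(tasks, suggest_machine(tasks))
```

Queue lengths and disk needs also matter, so a real decision would weigh more than the task count.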
So if your MPI jobs are more than 128 cores, or you can otherwise utilize that number of CPU cores in one calculation at one time, then Mahti is probably the better machine. This slide is more about the same stuff. Then the web interface: there's a slide about this, but I think it's better if I just show how it works, let's hope it works. CSC has lots of other services in addition to the computing clusters, so we also have cloud computing services. People are sometimes a little bit confused about when to use cloud computing services. I would say that you use cloud computing services when you want to run a server. A server is a program running on a computer that listens to requests from outside instead of running a calculation within the machine, and on supercomputers you can't run servers in that sense. For example, you can't start your own file server on Puhti that is available to everybody on the internet, but in the cloud services you can. If you just want to share some files and not provide any more complicated service, you can use Allas, and then you don't need to have an active cloud instance or virtual machine running. But for special needs you have the cloud services as well. For example, you might need them for sensitive data. If you handle something like patient data, there is legislation and there are strict rules about what kind of machine you can put it on. There are lots of details and I don't remember them all, but if your data is sensitive, you know about that stuff. Then we have a rather new machine, LUMI. This is a pan-European supercomputer, so it is not only for Finnish users; it has maybe around 10 different countries on it. It's an interesting resource because the support is kind of divided: there's an international support team, then there are the admins, who are from CSC, and then of course we other CSC specialists, like
the regular national staff, also help with LUMI things. So it's a little bit interesting in that sense, but it's a nice machine. Even the small CPU partition of LUMI is about the same size as Mahti, so you can already do a lot there, but the main computing power in LUMI is in the GPUs. This is common nowadays: if you want to get lots of flops, you stack GPUs into a machine. You can also get access to LUMI; the process is maybe slightly more complicated, but not really, so it's pretty easy. I have already mentioned Allas many times. I think it is really convenient if you want to share data, and Allas is also the place where you keep project-lifetime data, that is, data that you should have for the duration of the computing project, so maybe two or three years, or until you graduate, or something like that. You should put that kind of data into Allas. The scratch space on Puhti and Mahti is meant for the duration of, let's say, a calculation campaign, so let's say the writing of one article. Then there are the FAIR data services. I must confess that I'm not that familiar with that side, but discoverability and the lifecycle management of data are things that everybody should think about carefully when starting a project: how long do I need to keep the data, with whom do I need to share it, things like that. The FAIR data services help if you need something more complex than just making the data public to the whole world. Other services: training, definitely. I think all of us application specialists wish we had more time to give courses. I have always enjoyed the courses; they are nice, and they are usually in some specific area of science, like chemistry. Then we have expert services. We have, let's say, maybe 30 people who are former researchers, so we know about the research field and we support the
software and stuff related to that. If you are lucky, your research area is already covered, and if you are not lucky, we try to see if we can find somebody who knows something about it, so he or she can also help you a little with the scientific stuff. Mostly we work on running computations efficiently, and of course we help with installing software, because scientific software is sometimes not trivial to install. Okay, a couple of courses are definitely worth mentioning. One is Elements of Supercomputing, and then the Using CSC Environment Efficiently course is something that you should definitely check out. I think it's online nowadays, so you can do it whenever you want, and the lectures are video lectures, and it gets you familiar with the CSC environment very nicely. Then there's the CSC Summer School in High-Performance Computing. That's a great summer school: it's a lot of work, but it's also a lot of fun. It's in Nuuksio, I think it has been in Nuuksio for many years, and it's really nice. I definitely recommend that one. Okay, then I promised to talk about getting access to CSC's supercomputers and services. If you are eligible, it's actually really easy. As university staff and students you have Haka authentication, so you can just go to MyCSC. Maybe I'll just open it up. The front page looks like this, and then you log in or register. I can use Haka, and I already have a CSC account, so I'm already a CSC user. Once I have an account, I need a project. With just an account you don't get access to the machines yet; you get access to Puhti and Mahti through projects. So you need to start a computing project or join some existing computing project. The projects have the resources: you can apply for billing units for a project, and you can say that you want to use Puhti and Mahti and Allas and whatnot in the project, and then as a member you can use those resources in the project. You can manage all kind
of things related to your account in this my.csc.fi service. There is a question that's relevant right now: can I create a personal project, and how many resources can I ask for? Yes, you can. Well, actually, for a project you basically need to be a staff member at the university. If you are a student, you need to ask for permission to join your professor's or supervisor's project. You need to be a senior researcher or professor or something like that to start a project; being a student is not enough yet. Did this answer the question? Yes, thank you. Good, good. So the project owner needs to be some kind of responsible person who is a staff member at a university, polytechnic, or research institute such as FMI or VTT. Other than that, billing units: at least at some point a billing unit was roughly equivalent to one CPU core hour, and you can get 100,000 billing units more or less automatically. So that's about the scale that you can get. Then, let's say, a big project might be five million billing units, and a really big project maybe something like 20 or 30 million billing units, which is roughly equivalent to CPU core hours. Your disk usage will also consume billing units, but don't worry too much about those. If you're starting to run out of billing units, you may be doing something silly, like having left your temporary files on the disk. Anyway, this is the place where you manage your account and projects. And okay, here it says the project manager needs to be an experienced researcher. Let's not go to questions yet; I'll show a couple of these services live. I promised to show this Puhti web interface. If you type puhti.csc.fi into your browser, you come to this kind of page. Could the font size be a lot larger? Okay, let me put it into a little bigger font. Yeah, this could be a bit
bigger, but it's probably okay. Yeah, I'm afraid the layout will be horrible with a larger font, but it's enough just to show roughly what it looks like. Then you log into Puhti. Wakey wakey. Okay, I was already logged in, but you can use your Haka authentication once you have an account and a project. You need to get those first from MyCSC, but when you have those, this is the web interface for Puhti. Let's say that I want to do something on the login node. Okay, login node shell: this is where you would end up if you SSH into puhti.csc.fi. But usually you might also jump right into a compute node, so this is an interactive job on a compute node. Let's see. It doesn't start; obviously, if you're doing a live demo, it will be slow. Now it's waiting for something. Okay. So instead of logging in using SSH and a terminal and then starting an interactive session or interactive job on a compute node, you can go to this web interface. These are basically the same parameters that you could give in your sbatch command or script, so you can specify the number of CPU cores, the memory, how much local disk you would like to get, and stuff like that, and then you can launch it. Okay, I need to specify the project. For some reason I have the project CSC training there, which is not right. I'll pick this one, which is mine. Somebody asked whether you can have your own personal project. I do have one; I don't know if it's recommended or not, but it's convenient to have one. Okay, now it says that it's starting, so it's now queuing to start the interactive job, and then I can just connect to the session. Now I'm basically going right onto the compute node, and this is nice because I can mess up here. If I want to run a heavier calculation, which I don't want to run on the login node, here it's okay: it's only one of the compute nodes that will get stuck. Of course, here too you must not do bad things on the disks, because the parallel disks are a shared resource, so you need to be careful
how you use those. So this is the equivalent of the sinteractive command that we learned yesterday, I guess: you have the job allocation. And when you're done, you can just close the tab, and then you can delete the job. You can also reconnect to the same session if you want, so that's kind of a screen or tmux type of thing. Yeah, we're familiar with those. Okay, but there were lots of other things here as well. Maybe I have one more minute. Desktop is something you might want to use if you want to plot some graphs from the data that you have on Puhti. We used to have a service called NoMachine, which was a remote desktop service, and this is similar. You can open a container within the compute node and run a regular desktop here, and you can open a terminal, which will open on the same compute node. If you want to start some graphical stuff, you can start it here, and this is quicker than doing ssh -X to Puhti and using X11 to transfer the graphics between Puhti and your home machine. I think lots of people are using this one as well. Let's close and delete that job. Then the services that are used a lot: you can start MATLAB and RStudio. Visual Studio Code you probably want to run on your local machine and then use the remote plugin of VS Code to access stuff, but Jupyter is something that a lot of people use, and RStudio, so you can just open those. Yeah, so we've got some questions. Would you like to see them? Yeah, okay, I will take over the screen share. Yep. Yes, this is me. I didn't talk about LUMI; should we go back now? But yeah, I talked briefly about what LUMI is, so I did talk a little bit about it. You can search for "LUMI supercomputer" and it will take you to the documentation. Also, I didn't explicitly mention docs.csc.fi, which is our documentation for all the machines. If you are starting to use CSC resources, docs.csc.fi is the address that you want to go to first. Right. Okay, good. Okay, then the questions. Yeah,
so the first one was: what kind of research projects benefit from the LUMI supercomputer? I would say, if you have a large international project or a project with industrial partners, because lots of LUMI resources are reserved for industrial use. I would say it's really easy to get resources from there if you are an industrial user or you are collaborating with one. And I guess that's not the case at CSC, so industrial users would be paying for the services? Exactly, but on LUMI you can probably get access to resources that are already paid for, or something like that. Okay, that's pretty good. Next: are bachelor's and master's theses allowed use cases for CSC and LUMI? Was that already answered? Is it the case that you need to belong to some larger project? You can use LUMI, but you need to belong to some project that has access to it. Yeah. So could you make a project and say, I'm requesting a project for a student's thesis, or would it need to be phrased as something larger than that? Say, a project for all your students, for example? Maybe, or something like that. We would prefer that projects have an end time as well. In practice it's easier to delete files from the file system when the project ends. Otherwise, as you know, disks will fill up no matter how big they are. We need to clean them somehow, and when a project ends, that's a good point to remove stuff. Okay, another question: when would using LUMI make more sense than using Triton, which is Aalto's cluster, in terms of computation? I guess this was maybe already answered somehow. If you need to use lots of GPUs, especially in parallel, then that would be a good case from the technical side. Right, yeah, that's what we recommend. And I'll point out that we at Science IT have gone through and figured out how to use LUMI with some of the common machine learning frameworks, so we'd
recommend that if you aren't sure, come to us, let us know your problem, and we'll help you get set up on LUMI. Then you can run there and have access to far more GPUs than you could have imagined. Yeah, that's a good service, because for getting started, I would say you always need somebody who has already done that stuff earlier; otherwise it's really slow, even with good documentation. Definitely: if you have a colleague or support person to help you, that's the way to go. Especially for new researchers: ask for help, you will save lots of money. Okay, I added the link to the Using CSC Environment Efficiently course. Personal project: okay, I already asked you that earlier. Are projects officially funded projects? Do they need to be approved? Is there anything that's not already answered here? I guess "project" doesn't mean something like an Academy of Finland project; it basically means a request made to CSC for some research. Yeah, we have the source of the funding, but when I'm talking about a project, I mean CSC's computing project. If you think about the technical side, project members can easily share data between themselves or within the group, because they all belong to the same Unix group. Allas access is similar: everybody in a project has all the rights to the Allas files that the project has. So it's a way of managing groups. We used to have it so that the first-class citizen was the user instead of the group, and then if you wanted to share your files with another user, that was not easy. There are also some drawbacks in this approach, where the group is the first-class citizen and the user is second class. Okay, billing units: I was just writing down here what the very large project size is. I must say that I might be out of date. You can put 20 million there. Okay, let's say this is very large; it could be 50 million nowadays. I might be out of date, but as I said, it really doesn't
I mean, it's really difficult to use that much computing resource efficiently. If you spend, say, a week more thinking, then you can probably do with half of that anyway. So, how to access LUMI: I guess that's answered in these points here. A comment that LUMI versus Amazon Web Services or Azure are completely different. That's good, yeah, totally different. I guess there's a fundamental difference in the idea: on LUMI the main unit of allocation is a job, and all of the computing hardware is shared, whereas on AWS or Azure the core thing would be the servers you're requesting, or a certain amount of hardware, or access to some database or something. Yes, AWS and Azure are cloud services, and you can also use them a little bit like computing services; you can think of it as building your own cluster in Azure or AWS. But if you take AWS or Azure, you may have to basically build your own HPC machine if you want to run there. Of course, then you have it all to yourself, but you're paying for all of it even when it's not being used. And these things are not cheap, obviously. I think CSC and the Finnish policy of having one big computing center has paid off really well. In the other Nordic countries and in many other countries, the national resources are spread around different, let's say, university campuses, completely spread out. In Finland we have one big center, and then the universities have their own clusters. The one big center has more possibility to get just bigger machines than an individual university ever could. Okay, there's a question, I guess for me: is there support for using CSC services in the daily garage, or is it just...? I guess you mean the Aalto daily garage. So yes, we help you with everything. Even if it were AWS or Azure: we've done projects using all of them. That doesn't mean we know the answer to
everything, but we can get you started, we'll do the best we can, and we'll direct you to other resources if needed. Okay, there's a last question coming in here, about a CSC project on LUMI: how do you start a session on it? Maybe for that, well, I guess you can probably answer this in writing later; it's probably in some of the links up here. You need to copy your SSH key, the public part, into the service first, and actually I think there's good documentation on that as well. Yeah, we can just find the link if it's not already there. "Get started, users in Finland": I think it's there already in the first link. Okay, well, with that said, I guess we are pretty much done and we should probably get moving on. So thanks, Juha. Hopefully this was insightful to the other people here, to know that what we're teaching in this course really applies to far more than just what we're doing here; we're teaching you the standard stuff, pretty much. Definitely. All supercomputers are individual, so the details change, but if you know the basics, why you have that kind of system and stuff like that, then you are fine; then you can google or ask for the details. Thank you very much. Thank you. Yeah, once you get accustomed to your local environment, which is usually a good idea, and once that starts to feel small, there are bigger places where you can go and play around. And we all work together, so that everybody's individual needs can be supported by multiple different places. It's not a competition where one being a bigger player would take something away from us; that's not how this works. The main product that all of these services produce is people who manage to do their PhDs and research, that sort of thing, and it doesn't matter which site provides the features. It's all about the end goal. Yeah.