Welcome back. We now have a presentation which we call "Laptops to LUMI". The basic idea is that at the start of this course, on day one, we were talking about small scale, the kind of skills you need to know: you start with your laptop, you go to other university resources, and so on. But you can easily scale beyond where we are now, beyond Triton to what's at CSC, and CSC has much bigger computers than Triton has. And not only that: their largest, well, it's an EU machine, I guess Jussi will tell us exactly how it's organized, but LUMI is going to be one of the largest supercomputers not just in Europe but in the world, and it's managed by CSC. So using the lessons we're learning now, you can basically continue your work up to that scale if you need to. Of course, there's a whole lot of other considerations, but we want to introduce you to what the path can be afterwards. So with that being said, Jussi, can you please tell a bit more about yourself, and we can begin.

Okay, thank you, Richard. Nice to be here. As said, my name is Jussi Enkovaara. I work at CSC in high-performance computing support. My background is in physics; Helsinki University of Technology is my alma mater. I did my PhD there in 2003, using CSC supercomputers for both my diploma thesis and my PhD. Then, after a postdoc, I ended up at CSC, doing various things related to supercomputers, programming supercomputers, et cetera.

In this presentation, I'll first briefly describe what CSC is; in addition to being the national supercomputing center, we also do quite a bit of other stuff. Then I'll discuss a bit what CSC has to offer and how those services relate to the things you have tried on Triton. I'll also discuss in what kinds of situations CSC services might be a good solution for you, related not only to computing but also to data management and such. And finally, I'll briefly discuss how to get started if you need to use CSC services.

Before actually going into what CSC is: if you go to the HackMD, I put there a simple question on whether you have used some CSC services or not. Please mark your answer there, so I know a bit how familiar you are with what we are doing. Okay, I can already see that quite a few people have used some services, or at least know that they have been using some services. In fact, if we take CSC services in a bit broader context, so not only the supercomputers, you may well be using CSC services without necessarily knowing about it.

If we look a bit more at what CSC as a whole is: first of all, we are a nonprofit company owned by the Ministry of Education and the universities, and in a broader perspective we provide various IT services for research and higher education. Supercomputers are of course one thing, but we also produce services like Funet, the main network between the Finnish universities, Haka authentication, and eduroam, just to give some examples. So if you are studying or doing research at any of the Finnish universities, I'm 100% sure, well, let's say 99% sure, that you're actually using at least some of our services. Maybe an important point here, related to the nonprofit aspect and the funding from the Ministry of Education, is that most of the services are free of charge for our end users.
From now on I'm not going to discuss Funet, Haka, et cetera anymore; we will be focusing on the computing and data services, and mostly from the perspective of a user who already has the software that he or she wants to run and maybe wants to scale up the simulations, not so much from the perspective of developing your own applications. So I'm not going to discuss what you actually need to do to get your application to run at full scale on LUMI, because that's a really wide topic.

The first question, of course, is when you might need CSC, and in many respects it starts when you actually need something beyond, let's say, your own workstation or laptop. It really depends on what you are doing, and for some things a laptop might actually be better. If I look at the laptop I'm giving this presentation on, its individual CPU cores are in fact a bit more efficient, at least in terms of clock frequency, than for example the CPUs in the CSC supercomputers Puhti and Mahti. So if I only needed to compute something with a single CPU core, something that takes half an hour and doesn't consume that much memory or disk space, then I would probably reach the end result faster with my laptop than by using the supercomputer. The situation is of course different when the calculation starts to take much longer, or when it's something that can be parallelized.

Another case is a problem that doesn't necessarily parallelize that far, but really needs lots of memory. For example, my laptop has 16 gigabytes of memory, and I think the maximum possible configuration would be 32 gigabytes. If I need more memory than that, I of course need to do something else. As a contrast, at Triton I guess you can get up to two terabytes at maximum; I think CSC doesn't even have that much, only 1.5 terabytes. So that's the other clear use case for either CSC or Triton, and I'll come back a bit later to what the main differences are in the services provided by CSC compared to something like Triton. There's a small sketch below of what requesting that much memory looks like in practice.

A third very likely case is that you work with lots of data: either your input data is large or your simulations produce lots of data. Once again, my laptop has in total something like 256 gigabytes of disk, I guess, and the operating system and everything else take quite a lot of that, so if I really need a lot of data I cannot use the laptop. You get much more storage space at CSC, and also at the supercomputers in general.

One more case, applicable just as much to university clusters as to CSC, is that you want to use some scientific application. If it's a commercial application, it might be quite expensive; you don't want to pay for it, or you don't have the money for it, but it might be that it's already available. CSC, for example, has a very large collection of scientific software, including commercial packages where we pay for the license, and for CSC users they are free for academic research. And even if it's not an expensive application, even for open-source or free applications, it might save you some time if you don't need to install, maintain, and update it yourself when it's already available.

One further thing concerns what you do after the computing, if you produce lots of data that you would like to share.
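As a concrete illustration of the memory case above: on Slurm-based clusters like Triton, Puhti, and Mahti, a job that needs hundreds of gigabytes is just a batch-script request. This is a minimal sketch only; the `hugemem` partition name follows CSC's Puhti documentation, and the project number and program name are made up.

```bash
#!/bin/bash
# Minimal sketch of a Slurm batch script for a large-memory job.
# Partition, account, and program names are illustrative examples;
# check your own cluster's documentation for the real ones.
#SBATCH --account=project_2001234   # hypothetical CSC project number
#SBATCH --partition=hugemem         # large-memory partition (Puhti-style name)
#SBATCH --time=02:00:00             # two hours of wall time
#SBATCH --ntasks=1                  # a single process
#SBATCH --cpus-per-task=1           # on a single core
#SBATCH --mem=500G                  # request half a terabyte of memory

srun ./memory_hungry_analysis       # hypothetical application
```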
Sharing data like that is something where you don't necessarily have similar services at the university level: sharing the data with some group of people, or even more, publishing the data and making it easily findable for other people, so that others might even cite your data.

Okay, then to the big question. In many ways, I would say that the biggest step in your scientific computing career is really going from a local workstation to some HPC cluster or supercomputer, however you want to call them. You will have something that is shared by multiple users, and something that has a batch job system, so your applications won't start immediately. Once you have taken this step, if you then look at how using CSC supercomputers actually differs from using Triton, the differences are quite minor. As Richard already said, the main difference really is in scale, the amount of resources: in terms of CPU power or GPUs, and also in terms of storage. For a single node there are no big differences between what you would get from CSC and from Triton. As an example: I don't remember the maximum number of CPU cores you can use in Triton, but I guess it's on the order of 2,000 or something like that. In contrast, with Mahti you could go up to 25,000 cores, in some cases even more. Or if you're using GPUs: in Triton I think there is a maximum of eight GPUs you can use at the same time, while already with the national supercomputers you can use around 20 GPUs. And on LUMI, we don't know yet what the usage policy will be, but at least hundreds of GPUs at the same time will be available. Storage space is also something you get more of at CSC: the default scratch space on CSC supercomputers is one terabyte, and if needed you can request more.

Command-line access is very similar: you SSH into the cluster, you use the module system, you use the batch queue system; there's a small sketch of this just below. As a new thing, which I'll show briefly in a moment, you can actually use Puhti via a web browser.

One difference in usage is that at CSC there is something called billing units. You can think of them as a sort of virtual money: before you start to use CSC supercomputers or data storage services, you need to apply for billing units, and when you use the resources, they consume these billing units. When you run out of billing units, you need to apply for more. The main reason for this is that we serve a really large customer base; I think at the moment there are something like 3,000 people using CSC resources. With the billing units we try to make sure that the resources are shared somehow equally between users and utilized wisely. For example, when billing units are applied for, you are typically required to show where you have been using them, what kind of scientific output you have produced with them. For the individual user, that's not something you need to worry about that much; it's more the task of the project manager. I'll get back to that when we discuss how to actually get access to the CSC supercomputers. And by the way, if you have any questions and so on, feel free to ask in the HackMD.
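To make the "very similar" point concrete, here is roughly what that shared command-line workflow looks like on either Triton or a CSC machine. A hedged sketch: hostnames and module names vary by system, and `my_job.sh` stands for any batch script like the ones sketched in this section.

```bash
# Roughly the same day-to-day workflow on Triton, Puhti, or Mahti.
# The hostname and module name below are examples; they vary by system.
ssh myusername@puhti.csc.fi        # log in over SSH

module avail                       # list the installed software stacks
module load gcc                    # load a toolchain module (name varies)

sbatch my_job.sh                   # submit a batch script to the queue
squeue -u $USER                    # check where your jobs sit in the queue
```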
I'll try to keep an eye on the HackMD every now and then, and also at the end of the talk.

Let's then discuss what the actual computing services are that CSC provides. I'll try to spare you the technical details; you can go to our user documentation for more details if you are interested. The basic national supercomputers are Puhti and Mahti. Puhti has been around for about three years and Mahti a bit less than two. Puhti is more of a general-purpose computer: you can do more interactive, single-core stuff and medium-scale parallel simulations. It has varying amounts of memory in its nodes: in the basic configuration there are 192 gigabytes of memory per node, but there are also some large-memory nodes. And as I said, there is also the web interface. Mahti is geared more towards medium- and large-scale parallel simulations. Puhti has Intel CPUs with 40 cores per node, and Mahti has AMD CPUs with 128 cores per node.

Which of these two is more suitable really depends on the type of research you are doing. Generally, regarding software selection, Puhti, as I said, is the more general-purpose computer and has more software available. If you are using, let's say, from a few tens of CPU cores up to a few hundred or so, Puhti might be the better choice for you. Some of the CPU nodes also have fast local disks, so if you're doing, let's say, machine learning, or some other analysis where you really need to read and write a lot of data to disk and need a fast disk, Puhti might be good. And for things like Jupyter notebooks, RStudio, and so on, Puhti is probably more suitable.

As I already said, Mahti is aimed more at medium- and large-scale simulations. The minimum unit you can normally request on Mahti is one node, and as a single node contains 128 CPU cores, that is the minimum number of CPU cores your application should be able to use more or less efficiently; there's a sketch of such a whole-node request below. If you really need to do very large-scale simulations, Mahti is really the machine for you. Without additional privileges you can run simulations on up to 20 nodes, that's two and a half thousand CPU cores, and if you want to use more than that, you have to apply for it and do some scalability testing. One practical aspect when choosing which machine to use: with the batch job system, queuing times differ depending on how many other users there are and how much resources they need, and at least lately the CPU partitions on Mahti have had a bit shorter queues. Both machines also have some GPUs available; Mahti has somewhat more recent and more powerful ones, but if you're using GPUs, both machines are quite okay.

Then a very brief demonstration of the web interface. As I said, it's a new service: you can just go to www.puhti.csc.fi with your web browser and sign in with your CSC user account. Once you are logged in, you can do various things with the supercomputer right from your browser. For example, if I would like to use Jupyter notebooks on Puhti, I can start them directly from the browser here; I can choose how many CPU cores I would like to reserve, how much memory, et cetera, and launching here, I actually get the notebook right in my browser. For some use cases that can be convenient and a bit easier to use.
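Picking up the Mahti point above: because whole nodes are the unit of allocation there, a batch request is naturally expressed in nodes rather than cores. A sketch under the assumption of Slurm and CSC-style naming; the `medium` partition name follows Mahti's documentation, and the account and program names are hypothetical.

```bash
#!/bin/bash
# Sketch of a Mahti-style whole-node batch job: the scheduler hands
# out full 128-core nodes, so resources are requested in whole nodes.
# Partition, account, and program names are illustrative.
#SBATCH --account=project_2001234   # hypothetical project
#SBATCH --partition=medium          # Mahti's standard CPU partition
#SBATCH --nodes=2                   # two full nodes...
#SBATCH --ntasks-per-node=128       # ...with one MPI rank per core
#SBATCH --time=06:00:00             # six hours of wall time

srun ./mpi_simulation               # hypothetical MPI application
```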
Previously, if you wanted to use, let's say, Jupyter notebooks on CSC supercomputers, you had to set up SSH tunnels and things like that, which is a bit more complex.

Sometimes the operating system and system libraries you have on the supercomputer limit a bit what kind of software you can install there; they might not be up to date, or for some other reason you need more flexibility than Puhti or Mahti provide. For these kinds of circumstances CSC also has cloud computing services. Of course, using them means more work for you: you need to make the virtual machine image, and you are responsible for all the sysadmin work yourself. CSC offers basically three different types of cloud computing services. We have cPouta, which is a kind of general-purpose computing cloud. Then, for cases where you need to work with sensitive data, personal information, health data, genomic data, that kind of stuff, there is ePouta, which requires a bit more effort to get into use since it is basically isolated from the normal internet; briefly put, it's much more secure and really meant for sensitive data. And if you don't need to work with full virtual machines but with containers, there is also the Rahti container cloud. You can, for example, use it for running web services; actually, if you look at the CSC user documentation, it's run on top of Rahti.

Okay, let's then go to the far end of the resource scale. LUMI, as already mentioned, is hosted by CSC, and it's a joint European supercomputer funded by the European Commission and by the members of the LUMI consortium; Finland is one member, and there are nine other member countries. Maybe the important thing for Finnish users is that the resources are divided according to how the different partners fund the project. In practice, that means about 25% of the whole LUMI will be dedicated to Finnish users, and access is applied for via CSC, so in that way, for Finnish users, it shows up much like the other CSC supercomputers. It's going to have over 10,000 GPUs, so the main processing power really comes from GPUs, and of course, in order to use it, your software needs to be something that can utilize GPUs. It's expected that when it's ready, it should be within the five most powerful computers in the world, most likely number two or three, and hopefully it will be available to users in the forthcoming summer. There is also a supporting CPU partition, "small" in quotation marks, because it's actually a bit larger than Mahti. LUMI will also have some nodes with extremely large amounts of memory for data analytics and other things, up to 32 terabytes. So if you have a problem that can scale up and utilize GPUs, you can really get lots of computing power. If you do not have software that already runs on GPUs, getting it ready for LUMI and getting it to scale up is a whole different story, worth, I don't know, a month of training at least. Well, not quite, but there are training courses organized by CSC on how to program GPUs and how to get ready for LUMI. Okay, that covers pretty much the computing services we have.
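Since GPU partitions came up for Triton, Puhti, Mahti, and LUMI alike, here is roughly what asking the batch system for a GPU looks like. A sketch following the style of CSC's Puhti documentation; LUMI's partition names and GPU types will differ, and the account, module, and script names here are made up.

```bash
#!/bin/bash
# Sketch of a GPU batch job in the style of CSC's Puhti docs.
# Account, partition, module, and script names are illustrative.
#SBATCH --account=project_2001234   # hypothetical project
#SBATCH --partition=gpu             # Puhti-style GPU partition
#SBATCH --gres=gpu:v100:1           # reserve one V100 GPU
#SBATCH --cpus-per-task=10          # CPU cores to feed the GPU
#SBATCH --time=01:00:00             # one hour of wall time

module load tensorflow              # example of a GPU-enabled module
srun python3 train_model.py         # hypothetical training script
```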
As I already said, there are services CSC provides for which you don't directly have similar ones on Triton, for example, or at Aalto or other universities: the data management and storage services. Let's say you run your computations and produce some amount of data, and you would like to use that data both on the supercomputers and on your laptop, and maybe also share it with some coworkers. For these kinds of needs CSC has the Allas object storage service. You can control access to it at different levels, so you can share only with a limited set of people, or with the whole world, or keep it just for yourself or for your project, whatever you think is suitable. You can also access it directly from the supercomputers; it's not as fast as the scratch disks on the supercomputers, so in a typical scenario you might keep some datasets in Allas, download them to the supercomputer file system before you start to process them, and when you finish, upload the results back to Allas. There's a short sketch of this pattern below.

I already mentioned that in some cases you would like to publish your datasets and make them findable for other people. For these kinds of needs there are the so-called Fairdata services, FAIR as in findable, accessible, interoperable, reusable. These give you tools for adding metadata, so you can describe the data in a way that makes it easier for other people to search for it, and some of the Fairdata services you can also use for searching other people's datasets.

Last but hopefully not least, some other services CSC provides in addition to the computing and data services. We have quite a large number of training courses on different aspects of scientific computing every year. You can ask for help with visualization of your scientific data, and if in your particular scientific discipline you need some help, say with which software you might want to use or how to use that software, you can get help from CSC too; you of course also have the local support that you can try to reach. If you have your own application and think you need help getting it to parallelize better or perform better, CSC provides support there as well, in a way similar to what you get locally from your RSEs, your research software engineers; CSC also works in collaboration with them, so if you have these kinds of needs, you can contact CSC or start with your local RSEs first.

Just before finishing, some advertisement of topical training offerings. We have developed a sort of self-study, popular-level course about supercomputing, which you might find useful; it's hosted by Kajaani University of Applied Sciences, you can see the link here, and you can also find it via the CSC pages. If you want to get a bit deeper into how to use CSC supercomputers, we have the "Using CSC Environment Efficiently" course coming in March; the content is in some respects quite similar to what you have in this course. And then, this is not entirely certain yet, it depends a bit on the current situation, but for many years we have had an approximately ten-day summer school on high-performance computing, and we hope we can arrange that in the upcoming summer as well.

Okay, the final thing: how to actually get access to CSC supercomputers and services. First of all, you need to have a CSC user account, like the one I used for logging into Puhti.
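Going back to the Allas staging pattern above, here is a rough sketch using the Allas command-line helpers that CSC's documentation describes (`a-get` and `a-put` from the `allas` module). Treat the exact commands, flags, and the bucket and file names as assumptions to be checked against the current docs.

```bash
# Rough sketch of the Allas staging pattern: keep data in object
# storage, copy it to fast scratch for computing, push results back.
# Command names follow CSC's docs; bucket/file names are made up.
module load allas                   # Allas client tools on Puhti/Mahti
allas-conf                          # authenticate against your project

a-get mybucket/input_data.tar.gz    # stage input data down to scratch
tar xzf input_data.tar.gz

# ... run your batch jobs against the scratch copy here ...

a-put results.tar.gz -b mybucket    # upload the results back to Allas
```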
You can create the account at my.csc.fi, and if you come from a Finnish university, you already have a Haka account, in which case it takes only a couple of mouse clicks. I already mentioned that when you use CSC resources, they consume billing units, so you need to apply for those, and in order to apply for billing units and use them, you need to belong to a CSC project. A project cannot be applied for by just anyone: the project manager should be an experienced researcher, typically postdoc level or higher. But once the project manager, for example your supervisor, has the project and has applied for the billing units, they can easily add users to the project once those users have CSC accounts. Okay, just to finish, here are some links to the CSC user documentation and the services I mentioned. If there are any questions at this point, I would be happy to take them.

There is one question about whether the RSE-like support is free. Well, I would say that mostly the answer is yes, though not for just anybody. Most of the CSC services are funded by the Finnish government, so they are for researchers working at Finnish universities. I would guess that if you want to use our, let's say, optimization service, you most likely already are a CSC user and most likely would like to run your software at CSC. It also depends a bit on how extensive a service you need. Typically, in these free-of-charge cases, it's not that we take your software and make it faster for you; typically we might do some performance analysis and then, together with you, give suggestions on what you could do to improve the performance. So we don't do the whole job for you, but help you along, and hopefully you also learn something on the way. Then in some cases we might be collaborating in, say, Academy of Finland projects and so on, and might do a more extensive job on the software, but that typically requires some additional funding.

Let's see, the next question: are your courses available for people to browse the material without registering, or outside of the course times? The material is typically available, yes. If it's a course given by CSC staff, the material is typically published on the course web page, and most of the material you can also find on GitHub, exercises and so on. If it's produced by CSC, it's typically also under Creative Commons, so you can freely use it, giving credit and so on. Sometimes we have external lecturers, people from elsewhere, and then it's up to them what the license and copyright of the material are. And maybe just to add: there is already the csc-training organization on GitHub, so you can actually find quite a lot of material there.

Okay. Yeah, great talk. I would probably add that all of these kinds of computing are pretty much a spectrum: there are vastly different scales involved, but it's all one spectrum, and you might be at a certain point of your computational life currently, but it's good to be aware of where you might end up later, even if you're currently working with a problem that might not fully utilize, let's say, supercomputers or CSC resources. It might push you in that direction, or you might recognize that your problem actually could benefit from this.
So in four or five years you might be there working on the supercomputers, so it's good to keep that in mind. Another thing to keep in mind is that there are national supercomputers like these all around Europe and all around the world, so if somebody is watching from outside Finland, there might be similar resources available through collaboration or through your local organization. And the information in the training materials and documentation translates well, because many of these systems are very much alike, so you might still benefit from looking through the materials even if they're not completely relevant to your use case.

Yeah, and of course, many things, how to use certain software, some programming courses and so on, are mostly transferable. There might be details, like which modules and which actual compile commands you need on a particular system, but, let's say, how to program GPUs with CUDA and MPI applies more or less to everybody. And I think this point about the spectrum is very good: it really is wide, and typically people progress in phases. I think people who would start out doing simulations with hundreds of GPUs are very rare; typically you start with something a bit smaller, you learn along the way, and then it might be that the problems you want to study, or that are studied within your group, get bigger, and then you also scale up the simulations. And maybe about these resources: for example LUMI, as I said, is funded by both the consortium and the European Commission, so actually half of the resources are dedicated to all European researchers. Of course there is a process for applying for the resources, with both a technical and a scientific review to make sure they will be put to good use, but LUMI is in principle available to all European researchers.

I'd also mention that if you're just starting out, say in a research group, many of these things, like the billing units, depend on your supervisor's approval. So you might also bring this information to them if you're not yet utilizing these resources, because from our point of view here at Aalto, our job is to make our researchers' lives easier and help them get their research done better. If that means they're using CSC resources efficiently, that's of course best for us, because then they get their research done better. So there's no competition between the systems; the main thing is to get the research done, and if your group would benefit from this, I highly recommend starting the discussion in your research group as well.

Yeah, that's exactly true. From CSC's point of view, we are not trying to make money here; we are here to get research done better, with whatever is the best tool. I don't think we would typically push users one way, saying you should not be using CSC, you should go to your local university cluster or your laptop, but of course there are cases when that might be the better solution. It always depends a bit on what you are doing.
Sometimes, and I don't know what the queuing situation on Triton generally is, it might be that in a certain period of some month Triton is fully packed, and for some reason Puhti has a bit more resources free. Of course you then need to do a bit of work, maybe move your data and check that your software is available there, but it's not that you use either one system or the other exclusively. And it's also not that you use only Puhti or Mahti or LUMI; I think in most cases, at least once you go a bit further, you use a mixture of different resources depending on what best fits your needs.

So, I think most of the HackMD questions have basically been answered there, some about where the different things are. Yeah.