But the GPU stuff isn't that different from the parallelization chapter we already went through. GPUs are very specialized accelerators: if your code supports GPUs, that's good for you, because it can actually use them; if it doesn't support GPUs, then it can't use them, and that's how it goes. To use a GPU, your program needs to know how to send its calculations to it, and that's usually done with libraries like the CUDA libraries. Let's say you have a linear algebra operation, like a matrix inversion: there is code in the CUDA libraries that knows how to run that same calculation on a GPU, and your code, or the program you're using, needs to support calling it. For example, MATLAB has GPU arrays, Python has GPU libraries, Julia has GPU libraries, R has many different GPU libraries. It really depends on whether your code supports them. Many physics simulation packages can be compiled to use GPUs, but again, that requires that the code supports it.

If your code does support GPUs, asking for one is very easy: you just add one line. In the queue, GPUs are "generic resources", or gres, and you specify that you want this resource when your job runs, and how many of them. Then you get the GPUs and your code can utilize them.

As with the other parallelization, it's usually not worth the effort of using multiple GPUs unless your problem is such that it can actually utilize them. These cluster GPUs are a bit different from the GPU in your normal workstation: they are much more powerful, so you can often end up in a situation where even a single GPU isn't fully utilized because the problem isn't hard enough. So don't ask for multiple GPUs unless you know your problem is big enough that it actually needs, or can utilize, them.

If you need a certain GPU architecture (we have multiple different architectures, for example here on Triton), you can use a constraint to specify it. Richard, if you want to, run "slurm features": it lists the different features we have available, and over there on the right you can see the gres. Some nodes have these K80s, really old stuff, or P100s, the Pascal generation, or V100s, the last generation. We are getting some Ampere GPUs, the new generation, quite soon; they are delayed currently, but we'll get them at some point. So if you need a certain generation of GPU, you can use --constraint to specify it.

So, yeah, that's about it. If you're using one of the most commonly used machine learning frameworks, such as TensorFlow or Keras or PyTorch, they are all available in the Anaconda module.
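As a sketch, requesting a GPU in a batch script might look something like this. The time, memory, module name, and feature name are assumptions that vary from cluster to cluster:

    #!/bin/bash
    # time and memory values here are just examples
    #SBATCH --time=01:00:00
    #SBATCH --mem=8G
    # the one extra line: request one GPU as a generic resource
    #SBATCH --gres=gpu:1
    # optionally pin a GPU generation; feature names are site-specific
    ##SBATCH --constraint='pascal'

    module load anaconda   # module name differs between clusters
    python train.py        # train.py is a hypothetical GPU-aware program

The same --gres=gpu:1 flag works with srun if you want an interactive session instead.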
You don't need to load any CUDA modules or anything like that, because CUDA is in Anaconda as well. But if your code requires some CUDA libraries, or you need to compile your stuff yourself, you usually need to load one of these CUDA modules and then compile your code so that it can utilize the CUDA libraries. Yeah. So maybe we should quickly run one of these. There are a few examples here of how to run the TensorFlow examples or the PyTorch examples, but they are very specific to the framework. We may as well do one of these demos so we can show something, and you can see there's really not much about Slurm itself here. So let's see, I will get a file. You've typed it twice there. Well, it looked like it still worked anyway. Module load Anaconda. And now we're doing this without a batch script, we're doing this interactively. Oh, and on other sites the names of the GPUs, that is, the names of the generic resources, might be different, and if you want to use these Anaconda modules, you need to load the FGCI common module first. Like this. Who knows, we might have to wait a little while for this to run. But I guess the most important thing here is that there's really not that much difference other than requesting the GPU from Slurm this way; it's a question of how the code itself works.

Let's talk about monitoring GPUs, since I need to wait for this to get done anyway. I've been testing out a better monitoring script that people could use, but it's not in production yet; hopefully I can finalize it during the summer. It would basically give you a bit more output. Also, the system we have here at Aalto might not exist on other clusters. Many GPU frameworks have their own GPU monitoring tools. But while the job is running, you're allowed to SSH to the node, and there you can, for example, look with nvidia-smi at how the job is performing (there's a sketch of this below). Or here at Aalto, our GPU jobs record their GPU utilization, which you can then access with the sacct command. We're working on a better monitoring script to make this information easier to get: currently it's a bit tricky, but in the future it will be better. Yeah.

So there's a question of why GPUs versus CPUs for a job. GPUs are very good at specialized calculations, such as matrix calculations. Basically all of the deep learning stuff that is such a big hit nowadays is big matrix calculations in a row: it calculates matrices over and over again and does operations on them, and GPUs are very fast on that kind of workload. So for specialized workloads and specialized algorithms, GPUs can be much faster than CPUs. CPUs are central processing units: they are generic, they can run whatever code you want. GPUs are usually only good when you're running code that is optimized for the GPU and utilizes the structure of the GPU. A GPU is basically a bunch of simple cores, thousands of these CUDA cores, that can each do simple calculations. If you have a problem that can be parallelized that way, there are GPU algorithms for it, and those run very fast on GPUs. But it's much harder to code for them, because they're not generic; they're meant for these specific kinds of calculations. Yeah.
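Coming back to monitoring for a moment, here is the sketch mentioned above. It assumes a cluster that allows SSH to a node while your job runs there; the node name is made up for illustration:

    # find out which node your job is running on
    squeue -u $USER

    # SSH to that node (gpu23 is just an example name) and look at the GPU
    ssh gpu23
    nvidia-smi              # shows GPU utilization, memory use, and running processes
    watch -n 1 nvidia-smi   # or refresh the view every second

If the utilization sits near zero while the job runs, the GPU is probably waiting for data, or the code isn't using it at all.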
What about this data input and output stuff?

Yeah, that's one of the things that might surprise people when they first start using GPUs on Triton, for example. On your own machine you might have some data. Let's say you download some reference data sets, like ImageNet or MNIST, or, well, MNIST is really small, but CIFAR or something, one of these popular image data sets. You run some program against them on a GPU in the cluster, and you notice: OK, it's not that much faster than on my own computer. What is the problem? Why does it seem slow? The problem usually is that these GPUs are so fast that they are not fully utilized if the data isn't there for them. Nowadays big data is commonplace everywhere, so you might have data sets of hundreds of gigabytes. And if you're running specialized algorithms like machine learning on the GPUs, it can be very hard to keep the GPU busy, because it needs so much data. If the data needs to come all the way from our Lustre storage, it's a very long way: the storage is fast, but the data needs to travel through the network, and then the CPU needs to feed it to the GPU. That might mean the GPU is idling while this is happening.

That is why all of our GPU nodes have local storage. If you want your code to run faster, it's usually the best idea to pack your data into some sort of binary format or a container. If you do analysis on images, you might even want to pack your images into a video file, because that's a good format to read images out of. It depends; don't just pick one format for everything. It really depends on what kind of algorithm you're running, and there are plenty of formats designed for this, so you usually need to choose the correct one for your case. But the best idea is: have the data set in some good format, copy it to the local disk before you start running stuff on the GPUs, and then feed the GPUs from the local disk. That's near and fast: they are fast NVMe disks right next to the GPUs, basically, so the jobs don't have to pull data from the Lustre system through the network. It's much faster to read it locally.

And sometimes these GPU jobs can even cause problems for the file system, if they are very hungry for individual files. Let's say you use the ImageNet data set: it comes as something like 1.2 to 1.4 million very small images, and every epoch of a machine learning training goes through all of those files. If you do something like 100,000 epochs, you might read billions of files during one training process. So it's much better to have the data on the local disk, so that it doesn't bother other users either. There are instructions here, and also links to the instructions on how to use the local disks. A sketch of this staging pattern follows below.
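As a rough sketch of that staging pattern in a batch script, assuming a packed data set on Lustre and /tmp as the node-local disk (the paths and the train.py invocation are made up; check your cluster's documentation for the real local-disk location):

    #!/bin/bash
    #SBATCH --gres=gpu:1
    #SBATCH --time=04:00:00

    # stage the packed data set onto the node-local NVMe disk first
    # (/tmp stands in for whatever local-disk path your cluster provides)
    cp /scratch/work/$USER/dataset.tar /tmp/
    tar xf /tmp/dataset.tar -C /tmp/

    # train reading from the fast local disk, not from Lustre over the network
    python train.py --data-dir=/tmp/dataset

One copy of a single big tar file is also far kinder to Lustre than millions of small-file reads.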
I think we could respond to some of the questions. Oh, look at this, the TensorFlow thing ran. Okay. Yeah, okay, so now it's running the GPU code. Some of our GPU nodes are undergoing an update right now, so that's why there was a bit of a hiccup there. So, do you want to try to run it again? Maybe we can get it through again. Should I try again? Yeah, just like this. Well, do I need to run it multiple times? No, that's fine. But should we look at the history from it? Checking the GPU performance, yeah, that's actually a good idea. Okay, let's see. Let me go back to the GPU part. Just a minute.

There are good questions in the HackMD. Why are there so many GPU software packages in high-level languages, and not packages based on low-level languages? Well, the thing is that it's usually a lot of work to work with, let's say, the CUDA libraries directly. They are available for low-level languages like C, and I think you can call them from Fortran as well, with the C extensions. But it's so much work to write that code that people don't want to write it; it's usually better to just write a wrapper around, let's say, a cuBLAS library call and then call it from Python. It's much easier to do the other stuff that way as well. CSC has courses on GPU programming, if you want to do the low-level CUDA stuff. Yeah.

So we run this, and the job ID is this. This is actually in the Slurm accounting database. Someone (was it Simo or Mikko? Mikko mainly wrote this) added an extra thing that puts the GPU stats into the comment field here: memory; power (is this joules or kilowatt-hours or watt-hours?); the number of CPUs requested; the number of GPUs requested; and the relative GPU efficiency, which was 23%. I think this is polled every minute or half a minute and then averaged when the job finishes. This is Triton-specific, but we'll try to make a better monitoring tool for other sites too, soonish, hopefully.

But what you can notice already from here is that a small example like this doesn't really utilize the GPUs. For GPUs this is child's play; they don't even sweat when you give them these kinds of normal examples. You usually need a big model, lots of data, and lots of utilization to get the GPUs fully utilized. Yeah.
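For reference, a sketch of pulling those recorded stats back out of the accounting database. The job ID is made up, and storing GPU stats in the comment field is specific to this Triton setup, not standard Slurm:

    # query the accounting database for a finished job
    sacct -j 12345678 --format=JobID,Elapsed,AllocTRES%40,Comment%80

    # on this setup the Comment field carries the recorded GPU stats
    # (memory, power, CPU/GPU counts, relative GPU efficiency)

On a cluster without this extra, seff <jobid> at least gives you the CPU and memory efficiency of a finished job.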