So welcome everybody. Mitchell, is it good to start? I think so. Okay, all good. Okay, so welcome everyone. This is the Machine Learning for Australia (ML for AU) community of practice. We are co-led by the ARDC and other institutions like NCI, QCIF, Pawsey and CSIRO, and this particular webinar is also co-hosted by Intersect and QCIF. And yes, we will share the recordings at the end. Today's talk is about a machine learning platform that the ARDC co-invested in, developed in partnership with Monash and UQ as well as QCIF. We take pleasure in bringing you this timely talk, which will hopefully inform you about a very useful asset for machine learning and AI activities. I would also like to add a note that the ML for AU community of practice is a co-led activity, and we welcome participation from you for all things AI and machine learning in the research and research infrastructure sectors. With that, today's talk will be led by Mitchell, who is all set and ready to go. Before I hand over to Mitchell, let me acknowledge the country that we stand, work and live on: we acknowledge and celebrate the first Australians on whose traditional lands we meet, and we pay our respects to the elders past, present and emerging. And so over to Mitchell now.

Hello there. Okay, when we were creating this platform, we were thinking about what kinds of environments people were working with. One of the best things that people have been doing is working with GPU-backed notebooks. These are ideal because they create an interactive environment, which is great for dataset exploration and development, and a lot of tutorials take this form, which makes them very user friendly. This is great for iterating on your code because you can move through very quickly: you can change your function and just rerun it without having to rerun the entire script. It works great. However, it doesn't work great when you're dealing with lots of users, because when you're not actively running your code, the compute is sitting idle, and that is really inefficient. It also means that we can't serve as many users at once, which leads to really long wait times. So in some cases you have clusters where the GPUs are not really being used very much while lots of users are waiting in the queue. The science stops, and that's not great. On the other end of the scale, we have job submission. Job submission is very efficient with the hardware usage when it's done correctly, which makes it very good for mature workloads. Once you've finished writing your script, you can submit it and have it run for 24 hours or so, and once your training is done you can have a look at your results and select the best model or do whatever you need to. This gives you access to however much compute you need, because you can just request whatever you need, which makes these jobs large and flexible. The downside is you now have to wait in the queue, and it's not as easy to get going because now you need to work with Slurm. It also means that you have to refactor all of your work just to use the hardware, because things that work with a direct reservation won't necessarily work with a queue.
It also makes things difficult when you're trying to share things. For example, a common failure mode when sharing code with other people: a batch script that works for one user won't necessarily work for another just because of user permissions. And I'm sure a lot of you have experienced the pain of submitting a job script only to realise, after waiting for three days, that it immediately crashes. This is not great for developing new code. So we wanted to create a middle ground. If you're a new researcher, having a requirement for a percentage of GPU utilisation for your job is not ideal, because you need the flexibility to request however much you need and iterate quickly, and so job submissions just don't work very well. We needed to create something that helps you do that while also serving as many users as we can. And so we've combined a whole bunch of services to create what we're calling the Machine Learning eResearch Platform, MLeRP. The idea that we had is: what if we could have a notebook that can interactively submit to the queue? To be clear, we didn't develop the software, but we are creating an environment that supports it very well. This sacrifices some of the responsiveness that notebooks have. However, the upside is that it gives us the ability to transition quickly between development and training scripts, because the code that you run within a notebook is very similar to what will then work as a script. It also means that it scales up quite easily, because you can change the amount of compute that you need interactively. And because your notebook is submitting to the queue, you can release that compute once you're done, which means that somebody else can use it while you're not. So while you are messing around with your function and trying to work out where the bug is or why your code isn't behaving as expected, that's okay, because somebody else can be using the GPU. This lets us increase the uptime on the hardware and serve more users, which means that you don't have to wait as long in the queue. The way that we're doing this is with Dask. Now, Dask is not a requirement for using our platform, but we do support Dask acceleration so that we can have this kind of interactivity. The idea is that you have a Jupyter notebook which is CPU based, which reaches out to Dask processes as required in order to use the GPU to run your tasks. And this is what it looks like in code. At the top of your notebook you define the requirements of your SLURMCluster; this is essentially a Dask cluster which is backed by Slurm jobs. Using the job_extra_directives argument you can specify whether you want a GPU or not, what size of GPU you want, and things like how many CPUs and how much RAM you want. Which, yes, means that at different stages in your notebook you could be requesting different kinds of jobs, different numbers of jobs, different sizes of jobs. Then, once you've written your function, you just submit it to the client. As you can see in the next couple of cells, you pass the function directly to the Dask client, and you can return the results to the notebook using the .result() method. So essentially what this means is you are running code remotely on another machine, and after the job is complete, the result is returned to your notebook for you to continue processing.
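For reference, the pattern Mitchell describes looks roughly like the following sketch, assuming dask_jobqueue is installed; the walltime, core and memory values here are illustrative, not MLeRP's actual limits:

```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Define the requirements of the Slurm-backed Dask cluster at the top of
# the notebook. job_extra_directives passes raw #SBATCH flags through, so
# this is where you ask for a GPU (or not) and pick its size.
cluster = SLURMCluster(
    cores=4,                                # CPUs per Slurm job
    memory="16GB",                          # RAM per Slurm job
    walltime="01:00:00",
    job_extra_directives=["--gres=gpu:1"],  # request a GPU for the worker
)
cluster.scale(jobs=1)  # submit one worker job to the queue
client = Client(cluster)

def train(epochs):
    # ... your normal training function, written as if it ran locally ...
    return {"epochs": epochs, "loss": 0.0}

# Pass the function directly to the Dask client; it runs remotely on the
# GPU-backed worker. Submitting returns a future immediately, so you can
# queue several runs without blocking the notebook.
future = client.submit(train, 10)
result = future.result()  # block and pull the result back into the notebook
```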
This does lead to some inefficiencies. However, we've run some tests, and I was very pleased to see that the overhead associated with this decoupling actually isn't all that high. As you increase the amount of training that you're doing (we ran this test using CIFAR), the difference between running directly on a GPU with a reservation, which is the local case, and running on a decoupled GPU in a Dask job, which is the remote case, is very small, and the percentage decreases as you increase the number of epochs in the function. So yes, it's not quite as efficient. However, now that we have this kind of decoupling, there are all sorts of new things that you can add.

This is the hardware that we have on the cluster. We have six nodes, each of which has two A100s with 40 gigabytes of VRAM, 52 vCPUs and a whole bunch of RAM. We also have NVMe storage which you can write to directly. So if you are doing a long training job, you can copy your data onto the NVMe and then have highly performant hardware to read from. We have two partitions with different qualities of service: some designed to support Dask jobs, some designed to support GPU reservations, and some designed for direct batch submission. You will also be given a quota. You will have storage of some amount, sized per the amount that you need. So if you have a use case which requires a whole bunch of storage for your data, just talk to us; we can give you more, that's okay. By default you will be given 50 gigabytes, but as I said, you can have more than that; just let us know how much you need. We also support group provisions, in which case you will be given a smaller amount for your individual files, but you'll also have a shared directory that you can all write to, so you can share things like models and code. Currently we have 10 terabytes available for users on the cluster, with an additional 30 terabytes coming very soon.

The cluster is split into two partitions. HouseCats is where we expect most users to start first. This is notebooks backed by GPU reservations, so it's going to be very similar to doing your work on Google Colab, where you have a GPU directly attached to your notebook. This is fantastic for data exploration. If you don't really know what you're doing yet, it's great for visualisation. It's also great if you just want to get started, because you can usually just take a notebook that has been written for any other compute platform and it will just run. It's like having a GPU workstation. However, the larger sizes of GPU are locked behind the BigCats partition, and so we expect that users will start on HouseCats and move over once they get used to the platform and need the bigger hardware. On BigCats we expect users to be using CPU notebooks and then reaching out to Dask whenever they need GPU processing. We also support direct batch submission on the BigCats partition, which is great for data processing or rapid iteration during development, and you can experiment with new techniques very easily now because of the decoupling we were talking about earlier. Batch submission is also very good for your model training and for hyperparameter sweeps, whatever you need to do.
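Going back to the NVMe point above, a hedged sketch of what staging data onto node-local disk might look like; the mount point below is hypothetical, so check the MLeRP documentation for the real path on your node:

```python
import shutil
from pathlib import Path

# Hypothetical node-local NVMe mount point; the real path may differ.
nvme_dir = Path("/mnt/nvme/my_job")
nvme_dir.mkdir(parents=True, exist_ok=True)

src = Path("~/datasets/my_dataset").expanduser()  # dataset on shared storage
dst = nvme_dir / src.name
if not dst.exists():
    shutil.copytree(src, dst)  # one-off copy at the start of the long job

# ... point your dataloader at `dst` so training reads from fast local disk ...
```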
You'll notice in the bottom left there are links to our documentation, so if you have a look at these slides afterwards, you can explore everything we have in more depth. These are the limitations that we have on each of the qualities of service. In the HouseCats partition we have the Tabby quality of service: 12 hour, interactive, GPU-reservation-type jobs. Because it's intended for active development, you don't really need a GPU while you're sleeping; that's why it has the 12 hour limit, and that's also why there's only one job at a time. If you need to be doing something else, like training, that's okay; that's what the BigCats partition is for. So you could, for example, have a batch training run going on the Lion quality of service and continue to develop on the Tabby quality of service, and that's fine. On Lion you have up to four jobs and 24 hours of wall time. We're still being a little bit restrictive; we're not allowing the big seven day jobs. That's not because we don't expect models to want to train for more than 24 hours; it's that clearing out seven days of training time is really awkward sometimes. If you need to train for more than 24 hours, that's okay: just checkpoint your code. If you don't know what that means, talk to us, we'll help you do it, and we also have some tutorials which you can have a look at. We also have the Cheetah quality of service, which is designed for fast, small wall-time jobs. This is ideal for your Dask workers; that's why we allow 20 jobs there. So if you, for example, wanted to spin up a whole bunch of jobs so that you could rapidly pre-process your entire dataset, this is the quality of service for you. Lastly, I want to mention that we want to support those kinds of long jobs for CPU only. We don't have it set up just yet, but it is coming soon, and we're going to call it the Panther quality of service. You'll have seven days with a small CPU interactive notebook, the idea being that you can kick it off at the beginning of the week and then use your Dask jobs as required. So you could be running a long experiment over the course of your week, managed by this long job, and just train as there is availability on the cluster. It also means that you won't have to reload your entire dataset into RAM every time you look at it each day.

We understand that there are a lot of different options here. We hope that everyone will be able to find a quality of service that works for them as they start. We didn't want to put up barriers and force people into doing things in the most efficient way possible, even if that means the hardware isn't used quite as efficiently; we want to meet the users where they're at. If you're not sure where to start, talk to us. If this doesn't work for you and you have some use case that I haven't mentioned, still talk to us, because maybe we can work something out and then maybe there'll be a new quality of service for everyone. We're happy to work with users to create new features for the service.

The hardware that we have is sliced up using Multi-Instance GPU, which NVIDIA calls MIG. This is the diagram that NVIDIA uses to explain what's going on. The notation is not the clearest, but essentially each GPU is sliced up into seven compute fractions, and you can split those up into these chunks of GPU.
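As an illustration, requesting a specific MIG slice would plausibly go through the same dask_jobqueue pattern shown earlier. The exact --gres string below is an assumption based on NVIDIA's MIG profile naming (2g.10gb, 3g.20gb, 7g.40gb), so check the MLeRP documentation for the accepted values:

```python
from dask_jobqueue import SLURMCluster

# A sketch only: the gres syntax for MIG slices is an assumption here.
cluster = SLURMCluster(
    cores=4,
    memory="32GB",
    walltime="04:00:00",
    job_extra_directives=["--gres=gpu:3g.20gb:1"],  # ask for a 20 GB MIG slice
)
```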
We have everything from the 10 gigabyte size in our cluster, which means you can request what is essentially an entire A100, the 7g.40gb GPU, if that's what you want, or you can request a smaller chunk like the 3g.20gb. In the Tabby quality of service, only the 10 gigabyte slices are available. This is what it looks like in practice. In the HouseCats partition we have two GPUs, sliced up so that we make as many 10 GB VRAM GPUs available as possible. On the larger ones, we have three nodes devoted to a one 20 GB and two 10 GB split, and we have one node with two entire A100s which are just not split up at all, so you can request the entire GPU if that's what you need. The implication of this is that you now have the flexibility to write code with lazy execution, and things can be asynchronous. What that means, if you're not familiar with it, is that you can write things in a more optimised way. For example, you don't have to wait until your evaluation finishes before starting your next training run; you can just start that up in a new job. You can also request different compute for different cells. At the beginning of your notebook you could have lots of small CPU jobs to do your pre-processing, spinning up a new job for every sample, for example. You could use a job with lots of memory to load in a big data frame and parallelise it for efficient workloads. And then you can use the GPU-backed processes when you're testing the training runs. The trade-off is that the code is a little bit more complicated. We understand this, and so we've written tutorials for you; more on that later. We also still offer traditional alternatives, which means that you don't have to start here: you can start off using a direct GPU reservation and then move on when you need to. This also results in some new fail states. To our knowledge, this hasn't happened yet, because we don't have enough users on the cluster for it to be a problem, but we're watching it. We know that this will eventually become a problem and we will need to tune it so that it works for the most people. We need more users using the system in this way for us to test this, though, and so we hope that you'll bear with us as this happens.

The cluster runs using Strudel2 as the interface. The idea is you'll log in and then be able to select from a variety of apps. This cluster doesn't require you to use SSH, though we do support it. You can access your JupyterLab instance directly from your browser. As you can see on the left, you can select from whichever apps you'd like. For example, you could select this terminal app, which will launch with however much compute you need: you can specify how many CPUs you want baked into it, how much RAM you want baked into it, and how much time you need for your terminal to run. You can also just run directly on the login node. That isn't ideal if you're doing anything compute heavy, but if you're just managing files, it's perfect. If you're trying to install a conda environment, conda's pretty RAM heavy, so select a whole bunch of RAM. And as you can see, if you do something on the HouseCats partition, you can get a terminal which has a GPU reserved. This isn't an SSH connection; as I said before, you can see this from the web browser. This is our JupyterLab app. We have made it so that you can select whichever conda environment you'd like.
It doesn't have to be from our managed conda environments. We've created some for you to start with. We call it the data science kitchen sink, DSKS. It has a whole bunch of really common data science and machine learning packages. This is where we expect most users to start. It's not where we expect users to finish, because of course, if you're writing your own code, you want to be able to control your Python packages. Something that you'll notice about this dropdown is that the environments are not all from the same conda installation. The service will pick up anything else that you install on the system, so you can launch and point the app towards anything that you install across the cluster, and you can just manage your own environments in user space. And of course, this is the JupyterLab UI. Again, you can do this from the web browser.

For authentication, we use SSH certificates. This is all abstracted away from you if you're using the web browser. However, you can also directly request a certificate from us using single sign-on; we have an app for that. So you can log in through the AAF. We also support Google if you have external collaborators and you want to do a group allocation. But as I said, we also have a tool which means that you can get a certificate that will let you SSH directly into the cluster, say to manage your files. You can also SSH directly into a job; it doesn't have to just be on the login node. We've worked out a way for you to do that too. This is how Strudel works on the backend. Essentially, you'll be using the front end: you sign in and select what you want, the SSO provider gives us a thumbs up or a thumbs down and tells us who you are, and then we provision a certificate that is either downloaded, if you're using the tool, or stored in your browser, so that you can then interface with the cluster. If you don't understand what all this means, don't worry about it; just use the web browser and it'll be fine. And I recorded this GIF of the user flow. I hope this helps to cement everything together and make things clearer. As you can see, you can select whichever compute region you have an allocation for and then just load your job. This then lets you run whatever you need to in the browser. When you're ready, you just clean up the job and close it, and you can do the same with the JupyterLab. I hope that the streaming is actually going okay and this is visible in the slideshow for you. Okay, it's now looped, so I'm going to move on. You can also connect in, once you have an SSH certificate, through the VS Code Remote Explorer. Again, you can also do this by connecting to a job, so you can be using your Python debugger inside of a job that's attached to an A100. We do support this. There are a few gotchas, so if you're going to attempt this, have a look at our documentation first, because it can lead to some unexpected behaviour with batch submission if you follow VS Code's instructions. Just double check what you're doing before you blindly try this. However, it is a fantastic way to work.

All of our documentation is programmatically generated and publicly available on GitHub. We encourage you to clone a copy, and we encourage pull requests. There are tutorials up there, and because the tutorials are run on the service and the code is generated from the cluster, we can be pretty confident that things will work for you if you just clone it.
Of course, there are going to be bugs. This is a beta service, but as you come across things, let us know and we'll work with you to fix them so that we can make things better for the next users. Don't just be quiet; please come to us. We have some tutorials that have been written. At the moment we have been focusing on how to do things with Dask and Slurm, because we expect this to be the primary point of friction with the service. However, as users develop more maturity on the platform, we want to expand this too. If you aren't sure how to do something, let us know, and as we see commonalities in what people are struggling with, we will work with you to create better documentation and better tutorials to help people into the future. The requirements for using this platform are just that you have to be doing some kind of machine learning, and one of your project members needs to be a researcher based in Australia or New Zealand. Beyond that, it's really quite open. We would love to have more beta testers and more people on the platform. If you are interested, just let us know; it will not cost any of your funding. What we do expect from you is that you report bugs, help us to fill in the gaps in the documentation and give us your user feedback, because this helps us to improve the platform and therefore serve more users. We also expect that you help each other as a community. We do have a community Slack channel, so you can reach out for help, and I'm always present to help people. But as the community grows, there is only one of me, unfortunately, so if you can help each other, that would also be fantastic. Beyond that, we just expect that you be patient with us, because we're a small team. If you would like to sign up, there is a QR code, and there is also a link to a Google form. That brings me to the end of my part of the presentation today. Are there any questions?

Thank you, Mitchell. Before we take any questions, maybe let's pause the questions for a little bit. This was a fantastic presentation; thank you for taking us through it. Let us also see if Oliver would want to talk about any upcoming plans for training, and then we will take the questions for both together, if that's okay with you. No worries. Well, I might as well just do my section of the talk and we'll take some questions at the end of all that, if that's all right. That's good. Sure. Actually, maybe we should take questions now, while they're still fresh. So you want to defer your part of the talk till the end, is it? Well, if there are questions now, we might as well start. Yeah, if there are any pressing questions, that's okay. Otherwise, let's finish the talk. I don't see any questions in the chat. Okay, the question that I have is: any plans for training? That is a good segue into the next part, Oliver. There are plans, but they're very fresh plans. I'll touch on that at the end of my section. Sure, sure. And so you want to jump into yours now? Sure.

Okay, so I'm a research analyst at UQ and I've been working in collaboration with Monash for some time now. I was with the project that gave birth to MLeRP last year. My focus is slightly different: I've been looking at optimising very large datasets used to train neural nets. And so I've come at this from the HPC cluster side of things, where we want to put multiple GPUs together to churn through data as fast as possible.
In some of the use cases that I've come across, we were looking at maybe three months of wall time just to see if it was possible to train some neural nets on one high-end GPU, which, if you're a researcher trying to publish a paper, is probably a no-go if you're going to burn that much time just to work out if something's viable. So optimising these processes for speed, if the resources can be allocated for it, because it only means adding more GPUs to the fray, is sometimes essential. Now, in that optimisation and speeding up of training neural nets, you're always going to fall back on profiling to see where things could be improved, and probably 80% of what I've put together as far as profiling on HPC clusters goes is applicable everywhere. So what I'm doing is putting together simple workshops and reference material that should be applicable to a general set of users. Now, a lot of the users that I come across may not be machine learning experts or computer science experts per se, but they'll have expertise in a domain. Typically what you'll do is copy code from elsewhere and try to run it on our HPC. You might be adapting code because the type of data you're dealing with is very close to what other people are doing. A bit of profiling and looking at how efficiently it runs can be a great service when you're having problems, and when you scale up from toy datasets to what's going to be a production dataset, you can also run into brick walls. On the large end, where we are teaming up multiple GPUs, I'd call those users more expert users; they're going to have to have pretty detailed knowledge about how their models work and how clusters are tied together. They probably know more about their particular area of machine learning than I would. So for those users, I'm basically pointing them in the right direction with regard to how to make the bytes flow through a machine quickly; I'm more of a plumber than someone telling them how to implement their particular machine learning code. And finally, we've got our system administrators. They want to ensure that the libraries, OpenMPI and everything are configured properly and that we're using the hardware efficiently. So they're the three types of users, and my particular focus is more on the general users, especially with regard to something like MLeRP. If we go to the next slide, I'll talk about the platforms. Everyone has a laptop or a PC with some sort of GPU in it, and you can do quite a bit with small datasets. When you've got code together on your workstation, you might want to run it on an HPC node with a single GPU, because that GPU is going to be a lot faster than what you'd have available on a workstation or a laptop; you've also got a lot of CPUs you can use and a high performance disk. Then you can move up to the multi-GPU, multi-node clusters. And finally, I'm going to address how the work that I've done can be deployed on this Jupyter and Dask architecture that Mitchell covered. A lot of it is going to be basically combining what you're doing on a workstation or a single HPC node, and like I said earlier, probably 80% of what I've done is going to cover that. So let's go to the next slide. The most simple case is basically the IO pipelines. Oh, I should say, when I say profiling and optimisation, I've got a very simplistic definition of it.
Basically, you look at the total amount of time your job runs and what percentage of that time your GPU is sitting idle. If it's sitting idle for a long time, it means you have to look at what your code is doing. We want high utilisation of the GPU over the total time of the job. Most of the bottlenecks are going to be in feeding data from disk and doing some preprocessing to make it ready to be fed into our models. Fortunately, to solve that issue, all of the frameworks have nice profilers built into them. You can just turn them on programmatically, run the code, get your hands on the profiling data, and it will tell you your GPU is sitting idle because you've been waiting on IO, and it might even have suggestions about what you can do to speed it up. Sometimes, for users, it's just an oversight in the way they've copied the code down or set up their training code, and there'll be simple suggestions given by these profilers as to how they could fix that. So this would be incorporated into a workshop: make sure that you're streaming the data into the GPUs as efficiently as possible. Something that caught my interest with all the talk about Dask is that Dask can be used for staging data, so there's potential there for pre-staging data without tying up a GPU, and then running the GPU code once all the data's been put in place. That might involve some of the preprocessing that would otherwise be done in a single-shot job, where your entire job takes care of getting data, preprocessing it and feeding it through the GPU. So it'd be interesting to break it down into a CPU-only job, leave the expensive GPU resources alone while you're doing that, and then bring those GPU resources online once it's done. Honestly, just the IO side of things, I think, would be at least 70% of what a novice user would be dealing with as far as profiling and optimisation are concerned. And it's quite easy to do in a Jupyter environment: you can turn the profiling on in the code, give it a path to where you want to put the data, and then after the job has run, pick it up and analyse it using the same GUI tools that you use for looking at how your training parameters are shifting as you do a training run. Now, I have got a pretty simplistic view of it; that's necessary to be able to fit this sort of content into workshops, so I'm not being glib. The second thing I look at is the memory footprint of what's being done. I'm saying that I'm not being glib because I know there's a lot more to it than just this, but these are two simple things that I'm cherry picking to say: this is how we go forward in trying to optimise our ML codes. The memory footprint issue is when you either feed in a dataset bigger than what the code you've downloaded was intended for, or you're not paying attention to how your dataset will scale up and fill up a GPU, say as you increase a sample size. With the same profilers in the framework, you can have a look at how full your GPU is getting with a single job. A lot of the time it's just going to stop running and say it's out of memory. Occasionally it'll run, but it'll run very slowly, and there the profiler will actually give you hints. You could cut down some of the data types in some of your layers; if you're using full floating point numbers, cut them down to FP16s and that might help.
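As a concrete illustration of turning a profiler on programmatically, here is a minimal sketch with PyTorch's built-in profiler (other frameworks have equivalents); the log path and toy model are placeholders for your own:

```python
import torch
from torch.profiler import ProfilerActivity, profile, tensorboard_trace_handler

model = torch.nn.Linear(512, 10)                    # stand-in for your model
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    on_trace_ready=tensorboard_trace_handler("./profiler_logs"),  # path you choose
) as prof:
    for step in range(10):                          # a short slice is enough
        x = torch.randn(64, 512)                    # stand-in for a real batch
        y = torch.randint(0, 10, (64,))
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        prof.step()                                 # mark a step boundary

# Afterwards, open the trace in TensorBoard; it will show where the GPU sat
# idle waiting on IO and often suggests fixes.
```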
Also, if you see that the memory footprint is too big, you can move to a larger GPU. Memory can be a bit problematic because of the way these pipelines behave, so this is just a guide as to whether it fits or doesn't fit. In a workshop environment I wouldn't be able to go too deeply into how to fix a problem past "okay, we can just change some data types and see if that will run", as long as the model still converges properly with, say, a reduced precision. Basically it's a go/no-go type of deal with the profiling: how close to the sun can we fly? And when you fly too close to the sun, you need quite good knowledge to fix it. If we go to the next slide, then we have the more complicated stuff, which is when we start putting multiple GPUs together to solve problems. Now what we're concerned with is how much time is being spent communicating from one GPU to another. To do that, you're using a new set of profilers. Sometimes they attach to the frameworks or are supplied by the vendors themselves, because otherwise you don't get the information on how some of this inter-process communication is working. And when you tune it, you have to actually go through and modify your code a little, to change batch sizes and reshape things to keep that inter-process communication down, and then you also have to monitor how your convergence is going, whether your particular model is agreeable to having those parameters changed. Memory visualisation also comes into it, depending on what sort of strategy you're using for the multi-GPU work, whether you're using parameter servers or ring-allreduce-style approaches; things get a little bit complicated. Also, because you're in a multi-GPU context, you're generally dealing with a lot of data, and that generates a hell of a lot of profiling data. So here we have to focus on sub-sampling the data, taking slices of profiles where you think the performance problem is, because you just can't handle that much data being spat out by profilers on multiple nodes and then combine it all and make sense of it. You have to take little sections of code, or run it on one node and try to extrapolate what's happening on the others. So it becomes a logistical game. And finally with that, something that is probably only relevant here is having a close look at how drivers, hardware and operating systems are interacting with each other. I've only been there a couple of times in two years, to make sure that some specific NCCL calls were being used on our generation of GPUs; it wouldn't be something that the general user would do. Let's go to the next slide. So there are going to be various tools. The easy ones, the 80% tools, to solve these problems are the ones that are typically bundled in with your framework, like TensorFlow or PyTorch. They have nice GUI interfaces and they also give you a little list of what you can try to make things run efficiently; that would solve the data IO issues. When you're looking at things like trying to tune memory, you need to combine the data from the profiling tools with the data that's being spat out by your training runs, like the convergence and what's happening in your model. So that needs to be covered: when you go past straight IO, you have to look at whether your optimisation is actually affecting how the model performs itself. But again, here we're using data that typically comes out of the same framework, so it's not difficult to marry up.
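On the earlier reduced-precision suggestion, here is a hedged sketch using PyTorch's automatic mixed precision rather than a wholesale dtype change; as Oliver notes, always re-check that the model still converges after reducing precision:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(512, 10).to(device)         # stand-in for your model
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(64, 512, device=device)             # stand-in batch
y = torch.randint(0, 10, (64,), device=device)

opt.zero_grad()
with torch.autocast(device_type=device, enabled=(device == "cuda")):
    loss = loss_fn(model(x), y)   # forward pass runs largely in FP16
scaler.scale(loss).backward()      # scale the loss to avoid FP16 underflow
scaler.step(opt)
scaler.update()
```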
Then when you move into the multi-GPU side of things, it's a little bit more difficult, because you've got to marry up data from different types of profilers and you have to have a bit of knowledge about what's going on under the hood to bring it all together. Different cases need different remedies, I suppose. But one thing with the multi-GPU stuff is that in most cases, if you've taken care of the IO and the memory, it can kind of just work. And if it works and you're getting a speed-up, that's good enough; you don't need to profile it. When it doesn't, it's usually because the model itself isn't going to support what you're doing to the hyperparameters to ensure that it runs on the GPUs. So it would only really be used in maybe 10 or 20% of the cases, where you're doing multi-GPU work and there are problems; simply put, you solve them with the easiest stuff first. Can we go to the next slide? So yeah, the generalised approach is: just look at your GPU utilisation over the total runtime, which is easy to do. If you're using distributed data, or HPC-type custom file systems, you need to be aware of the lead time of getting data available. You just have to be aware that the profiling is done once the data has been put in place, and discount that staging time from what's being addressed here. The other approach is with the memory. Again, as I mentioned, you can try to get the memory utilisation as close as possible to maximising the GPU, and when it needs more than that, you're going to need quite a good knowledge of what's going on under the hood; again, it's how close to the sun we want to fly. And with the multi-GPU stuff, we're definitely looking at how much inter-GPU communication is going on. Okay, we go to the next slide. With respect to the MLeRP environment, the easy profiling can simply be done from within your GPU notebook environment. You just have to know where to place the data, where to pick it up and how to feed it through the GUI tools after the run. Again, as I've said, I'm only really looking at GPU utilisation, so I'm not looking at how much time is being spent staging data. And I haven't spent any time looking at profiling, say, Dask itself, although there are options for profiling Dask. How I'm seeing Dask is basically as a service to stage the data and kick off a single GPU job, even though Dask can actually perform distributed GPU operations itself. And I'm pretty interested in what happens there, given that you can split the data staging and the GPU processes in Dask; I think that would be very useful. I think I've come to the end of my spiel. But what I'll say is that I've put some workshops together for the multi-GPU stuff, and I'm going to do the same thing for profiling, covering the simple cases. What I want to do is allow a user that isn't, say, a machine learning expert to know when they're going into an area where things are problematic, so that they don't just keep on pushing aimlessly. And that would be when you're bumping into memory utilisation going overboard, where you've got to work out how your data scales in a model and how you think about fitting that into a GPU; if you go over that borderline, then things get quite complicated.
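The "GPU utilisation over total runtime" check Oliver keeps coming back to can also be scripted. A hedged sketch using NVIDIA's management library (installable as nvidia-ml-py); note that full GPUs report this counter, while MIG slices may not:

```python
import time

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)     # first visible GPU

samples = []
for _ in range(60):                               # sample for about a minute
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    samples.append(util.gpu)                      # percent busy since last poll
    time.sleep(1)

pynvml.nvmlShutdown()
print(f"mean GPU utilisation: {sum(samples) / len(samples):.1f}%")
```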
So it's a way of saying we'll put some guard rails down. The first one is the IO; that's an easy one, it's straightforward. And then after that, it's how far you can push it until you really need to know what you're doing, or where you need to go for outside help if you're not an expert user. So there are going to be workshops covering that, and also reference material as well, listing how to get the various profilers going on your particular framework, and the plumbing of moving data around and analysing it. I think that will be quite good, because then you get more efficient use of the resources and some of the machine learning efforts become more tractable, I think. So that's what I'm doing.

Thank you, Oliver. That was good. Actually, you have delivered more than just an introduction to upcoming training; you have delivered the training itself. That was wonderful. No problem. So, this is good; we have a couple of questions. We don't have a lot, which is good in the sense that you have filled that time. Is there any plan to pre-test code before running it, either on the Cats partitions, to make sure it will execute without a risk? I think someone is looking for a test environment. So, we expect that you should be able to test things directly on the platform. You can request a CPU-only job if you're not ready for a GPU, and that's perfectly fine. Dask has an option called the LocalCluster: rather than running on a different machine, you're still writing Dask code, but it's running locally, which means that you can test things without having network dependencies be a problem. You can also just directly run the Python function to test that everything's working first. So what I would suggest as the way to go: start by writing your Python function and run it directly in the notebook in a CPU environment; then, when you're ready to scale up to Dask, test things with a LocalCluster; and finally move on to a SLURMCluster when you're ready to start using GPUs. We do actually have a tutorial which covers how to use local clusters and things for testing; have a look at our documentation.

Mitchell, I have a question; actually, it's for Mitchell and Oliver. Could you outline what would be a prerequisite for a user of this platform? That's a hard one. I've come across quite a few types of users: say, very good physicists with great knowledge of computational maths, but lacking in how to apply machine learning to some of the work they want to do, though they'd be able to get there eventually, all the way to machine learning gurus that know a lot more about that area than, say, what I have. So the users come from a very broad gamut. What we're trying to do with this platform, by making things available over the web browser and providing some amount of direct GPU reservations, is to lower the barriers. We want to make it so that if you're doing some kind of machine learning, even if you're brand new to the field, that's okay. You can start with running things on HouseCats; it's a very user-friendly environment. We've provided you with conda environments so that you don't have to battle through creating one yourself if you don't feel like you're ready. It may not have everything that you want starting out, but when you're ready to start installing your own packages, you can clone it and use that as a basis.
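Going back to Mitchell's suggestion of testing with a local cluster before touching the queue, a minimal sketch of that test-first workflow; the same Dask code runs locally, so you can debug without network dependencies:

```python
from dask.distributed import Client, LocalCluster

def preprocess(sample):
    return sample * 2   # stand-in for your real function

# Step 1: run the function directly first, e.g. preprocess(21).
# Step 2: test it through a LocalCluster, still plain Dask code:
cluster = LocalCluster(n_workers=2, threads_per_worker=1)
client = Client(cluster)
print(client.submit(preprocess, 21).result())   # -> 42
# Step 3: only then swap LocalCluster for SLURMCluster to use the GPUs.
```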
Also, we're able to help you if you are having trouble, rather than you feeling like you have to suffer in silence. I would suggest that the only real prerequisite is that you're doing some kind of machine learning, because that's who the platform is for. I don't think that you need the knowledge of how to do the machine learning as a prerequisite, because we want to make this platform a place where you can learn that skill. Yeah, I agree with that. I've seen it in bioinformatics as well, where people were using machine learning for image analysis and analysing microscopy images, and they certainly didn't need a lot of knowledge of machine learning to get really good use out of it. Something that I'll also point out: if you're a new user and you find that there is a point of friction in the process, let us know what that is, and if there are enough other people with similar points of friction, we will work with you to try to remove it from the system. To give an example, we had a user a few months ago who found that our pre-built environments just didn't work for them, and because they didn't really have much experience in Bash, they didn't really know how to make their own environment either. The way we used to have it set up, you had to point Strudel to the environment that you set up using a JSON file in a specific place, which was just not very user-friendly at all. That is why we came up with this new method of reading the ~/.conda/environments.txt file that conda maintains. So now you just sort of get it automatically, unless you do something funky with your installation, and you just select from the dropdown which environment you want, which should include user environments now. So we've removed that point of friction. I'm sure there are more points, but we won't know what they are unless you tell us about them.

That's a great point to emphasise. So in essence, please come and work with the platform, and as you learn and as you see issues, please let the team know. Mitchell, Oliver and several others are there to help you. And there can be some future courses that Oliver might deliver. There will definitely be some, yes; Oliver will deliver them next year. So keep us informed as to what you need as well; that would help, and it would also help Mitchell to tune the platform and upgrade it as it develops. There is another couple of questions, one from Lorenzo. It says: I just noticed that the documentation is based on Quarto. Since I'm quite interested in Quarto, I've started to introduce it into my daily work routine. I would like to ask which is the typical process that you adopted to write, maintain, develop and deploy the MLeRP docs with Quarto? It is a bit of a tangential question, but nonetheless, Mitchell, you can answer it anyway. Sure. So Quarto: we deploy that using GitHub Pages. It's something that Quarto just supports out of the box. The workflow is that you make whatever changes you need to locally. We have an installation of Quarto on the cluster, so this is where I'm using the VS Code remote server feature: I'm SSHed in using my VS Code, I write whatever changes I need to, I run the preview, and the extension then shows me the rendering of the docs. When I'm happy with it, I run the command to push it up to GitHub Pages, and it just pushes up to the GitHub Pages site. The URL we changed using a Nectar URL.
It's just a DNS entry that we created. In theory, you could CI/CD this so that you don't have to run the command; I've just been lazy about it. As I said before, we do encourage people to make their own copies of the repository and create pull requests, so you should also be able to do that. In the documentation, we point out where the installation of Quarto is and we tell you how to make the adjustments to your VS Code extension to point to it. So if that's something that you want to do, have a look at that page. Excellent, thank you. Thank you, Mitchell. So there is another question about changing the language. One is asking whether MATLAB could run, or could be made to run, on this. Sure. So we don't currently have any plans for MATLAB. One of the big problems with MATLAB is licensing, because unlike Python, you need to have a certain number of licenses in order to run MATLAB code. That being said, though, that doesn't mean it isn't something that we could try to support. We have been having some discussions about other languages, for example, because even if a language isn't the best for machine learning, maybe researchers would want to pre-process their datasets in it. We don't know a good way to isolate our environments there yet, so that's something that we need to talk to users about. And for MATLAB, is Octave viable? Yes, that's the other thing I was thinking: rather than MATLAB, maybe it's Octave that we support, for a start anyway, because with MATLAB licensing, you need to hire a staff member just to do the licensing. And Rachel asks whether there are any alternatives to Jupyter notebooks. Sure. You can connect directly in with your VS Code, or any other IDE for that matter; I know someone tried to do it with PyCharm once. It is something that you can technically do. You could also just run the code as a script directly inside a terminal, if you want to do that. You could also use VS Code's notebooks rather than having to use JupyterLab. If you can name a way of running Python code, you can probably find a way to do it on the platform. If I haven't mentioned it and you're not sure what that would look like, let me know and we'll work with you. Okay, and I think that answer satisfies Rachel. Conan asks if he could install Mallet and use Python. I don't know what Mallet is, so I can't answer this question. Actually, neither do I. What I will say is that if you can install it in user space, there's nothing stopping you from running it. Asterisk: if you think it's something that more than one user could use, then once you work out how to install it, let us know and we can make it available for more people. That's good. That's good to know. Okay, he says it is Java based. I've never tried to run anything Java based on the cluster, but if it can run in Linux, there's no particular reason why it couldn't. Sure. Okay. Another question, from Rabia, is whether this platform exclusively caters to Python, or if it extends its functionality to other programming languages such as C++ and Java. We've been focusing on Python just because that's where we think a lot of researchers are trying to start with their machine learning code. It's not that we don't want to support other languages in the long term. Like I said before with MATLAB, we don't currently have plans for other things, but that doesn't mean that can't change if there's sufficient user demand. What would you be using C++ for directly? Well, I mean, most of my C++ is wrapped in Python functions anyway.
Yes, but I also know researchers who directly write their machine learning code in C, right? Yeah, I mean, you can write directly against TensorFlow's C++ API, but it seems pretty hardcore. Actually, I might just switch back to the other question slide that we have, since there is a QR code on it. Yes, so there is a QR code: please sign up. Some people have asked about the Slack channel, and you cannot join the Slack channel directly; I stand corrected on that. You have to join the platform and, in the process, or to join, send an email to the MLeRP help address. And so with that, I think that's that. Okay, Pauline is explaining what Mallet is about; there is a GitHub page for us to understand what this is. It looks like some members of the community use it. Absolutely, so it's good to know. If there are no other questions, this was really a wonderful session. It's a very complex topic, but you have made it accessible to, I'd say, the vast majority of the users, and we want more people to come to the platform and use it. This would be a good way of further augmenting machine learning in the Australian research community. Look, I've been on the side of the MLeRP project for much more than a year now. I think it's going to a good place; I think there are some good decisions there. Yeah, it's good. And the mission of making Jupyter notebooks available to users in common is a very good one; it solves a lot of problems. Absolutely. And also, please join the Machine Learning for Australia community of practice, the ML for AU CoP. We will feature more of this development in this community, and I hope the community, or at least a sub-community, that develops might be directed towards MLeRP and its usage. We would be very happy for you to join; your invitation should be there, and I will post a link for you for ML for AU. There it is. Okay, it's co-led by a number of organisations, including the ARDC, and we welcome participation from the research and research infrastructure sectors. Before I close, I would like to tell you about the last webinar as well as the ones that are upcoming. We have three more upcoming webinars that we will be co-presenting, so let me post that; the slides for this one and the last webinar will be posted soon, so they will be made available to everyone who signed up. They will receive the slides and the recordings for this webinar, as well as for the last one, which happened last week. There is one next week on the fifth and then another one on the seventh, and it has been proposed that there could be another one either later in December, in early January, or after mid January. So with that, let's show our appreciation for Mitchell and Oliver for taking us through. Thank you so much. Great work done. Until we see you at the next event.