Hello, and welcome to OpenShift Coffee Break. Good morning, Erez. Hi. We have our friend Erez from Run:AI. It's time for a coffee; it's early morning for me in London. Erez, where are you calling from? Erez, you're on mute. Erez, can you hear us? We did a mic test before and it was working, I'm not sure what happened. Erez, why don't you try disconnecting and reconnecting, and we'll see if that works, okay? So today, Erez will be talking about Run:AI technology. Let me see if he's back. Hello. Yeah, can you hear me now? Oh, there you are. There's the guy. Where are you calling from, Erez? I'm in Israel, I'm calling from Israel. The internet is back up and my coffee is ready for the morning — and you can see there's a nice logo on it as well. It's not espresso like mine, but... Yeah. So, happy to see you. I'm very happy to see you. Good to see you. Erez and I used to work in the same team at Red Hat. Okay. So, Erez, tell us a little bit about yourself and what we're going to talk about today. Yeah, perfect. So thank you very much for this opportunity to have this conversation. My name is Erez Kirsten. I'm a principal solution engineer at Run:AI. And I came to talk about some of the challenges in machine learning and AI workloads — a little bit about what's happening around the resource management challenges that we have, and how Run:AI tries to address them and make this world a better place, especially when you use GPUs at large scale. And as I understand it, we're talking about resource optimization, specifically for GPUs, in the context of Kubernetes and OpenShift. So one of the things that has been happening in the last few years is that high performance computing has landed on Kubernetes, and it's slowly becoming the default standard for HPC in the AI/ML community. Hence the need to add these capabilities of managing those resources and optimizing their utilization to OpenShift. Yeah, exactly — it's exactly what you said. We see more and more workloads becoming cloud native. We see NVIDIA with their NGC containers. But we noticed that today's Kubernetes scheduling solutions are not always optimized for GPUs. They cannot reschedule, they cannot have priorities, they don't have fractions. There are a lot of things missing that we wanted to bring to the table and add as features for everybody who's running Kubernetes. And I think the conjunction with OpenShift as an enterprise platform creates a very good, well-rounded solution. And I was wondering — not everybody who joins OpenShift Coffee Break has an AI/ML background. So I was hoping that maybe, with the aid of some slides, you could set the scene and help us understand what the problems are and the sort of solution that you guys are bringing to the table. Do you have something for us? Yeah, I do have some stuff made especially for you. I can share my screen — one second, and I hope this is going to work out. Let's see if we're lucky. Okay, ready to share? Yeah, can you see my screen? Yes, and you can maximize the presentation. Yeah, thank you everybody for joining. Again, I want this to be a really open conversation. You can add things to the chat — we do love open discussions, so feel free. And again, as the host, you can interrupt me. I'll be monitoring the chat. Perfect.
So yeah, in this conversation we'll talk about how Run:AI actually optimizes AI workloads on Red Hat OpenShift, and a little bit about what's going on in the world of high performance computing and machine learning, AI/ML. We'll treat it as more of a journey — although we have a lot of slides, as a technical person I usually prefer to make it a journey and jump from one slide to another if it makes sense. A little bit about myself: I'm a principal solution engineer at Run:AI. I used to work at Red Hat before — amazing people, and I'm still in contact with many of them. My background is more Kubernetes, HPC, cloud — a little bit of both — so I can bring some of that knowledge to Run:AI. So let's start by looking at some of the infrastructure challenges we see in the field. There are a lot of data challenges, but in terms of infrastructure and compute, the question is how you manage a lot of resources, how you make the infrastructure part less complex, and how you utilize all the hardware that you have. Kubernetes helps a lot, of course, but we see this as a very big challenge: AI infrastructure is complex. It's a little different from running regular workloads, because these workloads have different characteristics — different durations and types — and the people who use the system, the AI practitioners, have different needs. So making infrastructure easy is the first challenge we actually see in the field: how do you build the right foundation for your AI? You have the cloud-native transformation — how you make things cloud native, which is a challenge in itself. Then you have the AI/ML data scientists, and when I say machine learning data scientists, they use a lot of data: they prepare data, they have their own way of working with it. And then you combine those two with hardware accelerators, because now people need to use GPUs. With GPUs — graphics processing units — data scientists can accelerate their workloads, mainly with hardware from NVIDIA, so they can shorten the time. So when you build the infrastructure — this is what we call the triangle — you have the cloud-native side with pods and containers, you have the data scientists, and you need to bring them together: how do you make use of your hardware accelerators in the best way? This is the foundation we see many times in the field, how people and companies build their own AI initiatives. And I want to talk a little bit about the hardware accelerators, because at the moment we see many accelerators being used by high-end enterprise customers, and sometimes it's hard to get your hands on those resources. They're becoming more expensive, and there is a shortage. Many companies — it could be Facebook, Tesla — buy a huge amount of NVIDIA GPUs, and the reason is that they want to accelerate their workloads, to accelerate their machine learning. So this is one of the challenges: how can I get my hands on them? And after I've got those GPUs, how can I make sure they're not idle in the system? These are expensive GPUs — how can I make sure they're not just resting, and that they're fully utilized? This is some of what we bring to the table, because we don't like idle GPUs. We want everything working all the time.
And once people have GPUs — this comes from a survey we've done — okay, you have GPUs, how do you share them? If you decided to share those GPUs rather than give each GPU to one researcher, we see that much of it is manual requests. People create Excel files and say, hey, when you're done with your GPUs, can you please call me and hand them over? Maybe they build a tracking system. And the reason we see this is that there's no dynamic allocation. Especially in Kubernetes: once Kubernetes gives you a GPU, you're stuck holding that GPU until you finish your job, or you stop it and give it to somebody else, because that's how the system is designed. We see in the field that this is a very difficult matter — how do you share GPUs, especially when you have many of them? — and manual requests are still very common with data scientists. Now, a quick recap of data science and the new tools. At the moment there are hundreds of tools to choose from when you start working with machine learning and data science. Looking from left to right, this is usually the process we see in companies. The first thing is data gathering: you collect all the data — it can be pictures, it can come from the web, it depends on your business logic and goals. Then you need to clean and validate the data: you clean whatever you need from it — that's the cleaning stage — and then you label it and prepare it for the training model. Now the business says, okay, for example, I'm building an autonomous car, and I want to train it to understand traffic lights while the car is driving. So you give it a thousand pictures of different traffic lights, make sure it can actually understand what a traffic light is, and you try to mix things up. So you're doing experiment tracking. This part is very, very heavy, because it just eats up GPU power — model creation can take hours, sometimes days. That's the training part. Once you're ready, you move to what's called inference. After the model is trained, you say, okay, I trained the model, I want to see if the car can really recognize a traffic light. So you throw pictures of traffic lights at it and it tells you, yes, it is a traffic light, or no, it isn't. But inference is a small portion — a small file compared to the training. We see it, by the way, in mobile devices: when you take photos, you see those square boxes because the phone actually understands what a face is. And this is an ongoing cycle: you check the inference, maybe you find errors, you retrain, and it always loops between training and inference. The thing is, you need to choose a lot of tools, and it's a different way of working compared to DevOps, because MLOps is a different paradigm: you need to handle the models and you need to handle the data. So it's a little different from DevOps in the sense that it's focused on machine learning and AI workloads. Many people who come from the DevOps field now say, hey, we need to learn a bit more, because it's a different set of tools, a different way the system runs, and the jobs are different. It's not like a web service or NGINX or something like that. It's a little bit different.
So there's a new set of best practices in terms of the MLOps mindset. Moving forward — this is something Red Hat actually knows much better than me, the cloud-native transformation. Go ahead. I do have a question — a question from the peanut gallery, as it were. When you were speaking about the different phases, are GPUs mostly used in the machine learning phase, or is there usage also in, let's say, executing the learned algorithm in the field? Or does that not really require that much GPU power — and because they're expensive, it's probably not a good idea to have them run in the field? So the GPU is mostly at the machine learning phase — am I wrong? Is that a fair statement? No, you're right — it's a very good question. After you train the model, it depends where the model runs afterwards. If we talk about edge devices, for example, we see today that even phones and cameras have small GPUs, and there are small GPU boxes as well. So you don't have to use one, but if you do, you can get a bit more performance at inference time. You have a decision to make — but yes, when you run training, you need more GPUs than for inference. However, we want to give you the opportunity, if you run at the edge and you want to use a GPU or a fraction of a GPU, to use it; you can decide. And yes, I do see a lot of small, compact edge devices that use small portions of a GPU just to accelerate the results. Right? Go ahead. Yeah, perfect. So this slide is actually from Red Hat — it's a survey Red Hat did of what workloads are running on Kubernetes today. We see that it's actually happening right now: people are running machine learning on Kubernetes and on OpenShift, as we speak. So it is popular, and it's getting a lot of momentum. In the past it wasn't that feasible, and now Red Hat has a lot of tooling around it — Open Data Hub and OpenShift Data Science. So it is becoming more popular and more companies actually use it. But there is one problem: many people are using it, yet Kubernetes was not designed to run all those workloads — it wasn't designed for AI/ML. For example, there's no mechanism for advanced priorities or policies. What about fairness algorithms? Maybe each team needs a different priority or a different amount of GPUs, or topology awareness. A lot of the management around AI/ML is missing in the current Kubernetes scheduler; it simply wasn't designed for these types of workloads, and we see in the field that this causes problems in managing all those resources. Again, this is a Kubernetes thing: it wasn't built for this new set of workloads. I can show you some examples of the challenges we see in the field. Say you have a small number of GPUs — this is a Kubernetes cluster on the right, and we have two researchers on the left, and we promise each one two GPUs, for example.
It's not much of a challenge when you have only four GPUs, but when you need to scale and you have many more GPUs, you want to say: hey, maybe I need to take the light blue team and give them different resources — maybe I want them to have 10 GPUs and give the other team less. But I don't have any tools today to set priorities and governance around the different teams that I have. I can't get a fairness scheduling algorithm, I can't do gang scheduling or priorities, and maybe I want to run more workloads on the same GPU — that's the fractional GPU example: maybe I want to use not the whole GPU memory but only some of it, because I have a Jupyter notebook or something interactive. Today that's quite challenging; there's not much tooling. So that's the first challenge we noticed in the field, and it's one of the reasons Run:AI was created in the first place: to handle all those resource requests, priorities, scheduling, and fairness algorithms, and make it work. The second challenge is monitoring: how can we know how the system is really being used? Maybe you're running a job but you're not utilizing the whole GPU — maybe you're using 30% of it. What does that mean? Maybe we can put you on a high-end GPU or a low-end GPU; how do we make sure the whole system is utilized? Giving a bird's-eye view of how all the resources are used is super important — understanding how the system is used, who is using what, and the governance around it. Later you can even build billing systems on top; there are API calls for basically everything. So those are some of the challenges we see today when you run AI/ML workloads on Kubernetes, and this is why we created Run:AI. Run:AI was created to abstract all of the GPUs in the cluster into one big GPU pool, and to add our smart scheduler — call it a super scheduler — which, in the case of OpenShift, is an additional scheduler in the system. So you have another scheduler from Run:AI, and by submitting jobs through it you get all the smart scheduling in the system. These are the two main layers we create in Run:AI: we take all the GPUs and create one big pool out of them, and then we hand out those GPU resources however you want to consume them. You can use Run:AI tools to submit jobs, or you can use Kubeflow, Airflow, Jupyter, or OpenShift Jupyter notebooks if you want. For us it's transparent — it's very important for us to be open, truly open, to whatever tools are needed. We don't want to dictate which tool you use; you can use whatever you want, and Run:AI takes care of what's happening underneath. Having this platform where we can actually manage the resources and give you a cloud-like experience is what makes the system quite unique, in the sense of how we manage all those resources in a smart way for the data scientists and data practitioners. Yeah, I'm interested here. When you were talking about Jupyter notebooks — when a Jupyter notebook runs on Kubernetes, it runs in a pod, so the Jupyter notebook server runs in a container. That means as long as that Jupyter notebook server runs, it has an allocated GPU dedicated to it. So if the data scientist is just editing the notebook and not running anything, the GPU is still not available to anyone else.
Is that right, or does it work a different way? Yeah — super good question. I'll talk a little bit about the roadmap as well. When you run a Jupyter notebook in Run:AI, it runs inside Run:AI as what's called an interactive session. Interactive session means we will not preempt it — we won't kill it, because it's interactive. However, if nobody is using the pod for a certain amount of time, we have a way to suspend the job or even delete the pod — if nobody's used it for, say, one or two days; you can decide. What you said is true: when you use a Jupyter notebook with a full GPU, you're not using your resources in a smart way. However, we added two tools for this. The first: maybe you can use a fraction of a GPU — say a third of the GPU's memory — so you can run more workloads on the same GPU and be more productive. Imagine running five or six Jupyter notebooks on one GPU; you're using your hardware in a much smarter way while letting the data scientists work interactively. On the roadmap, to be released very soon, is a fix for exactly the problem you described — a well-known one: somebody's using the Jupyter notebook, writing some code, but the GPU inside isn't doing anything. We have a term called swapping. What will happen is he'll work on the CPU, and the minute he needs the GPU we copy the memory from the CPU to the GPU — so we can allocate GPUs between Jupyter notebooks in a smart way, when they need them, and move them on the fly from one user to another. Again, this is roadmap, it's not there yet; at the moment the way we solve it is with fractional GPUs for Jupyter notebooks. Okay, I think I'm getting the picture. Thank you. No problem. And I'll talk a little bit about our platform. There is actually a question there. Yeah, of course. I guess the question is about the intrinsic capabilities of some NVIDIA cards. They ask: wouldn't the NVIDIA A100 card be able to slice itself into multiple profiles, up to seven, for Jupyter notebooks? I suppose that's a capability of the NVIDIA card, but is Kubernetes going to be able to do that? That's the question for you, of course. Sure, no problem. When we talk about fractions — let me jump to the fractions slide. Our first motivation is to work with any GPU. Imagine you can slice, or fraction, any GPU — it doesn't have to be an A100; you can even use different types of GPUs. Our motivation is to be open: use any GPU and fraction it. When we talk about MIG, you can have MIG profiles, and we can do dynamic allocation of MIG profiles on the fly — you don't need to reboot the server or anything like that. And you can even say, hey, you know what, I don't want to use MIG profiles at all, I want to use a Run:AI fraction: we can slice the A100 using the fractional method instead — up to, say, ten containers on one GPU. So we give you the flexibility: you can use MIG or you can use our solution; it depends on your case. We understand MIG, we understand MIG profiles, so when you run jobs you can select which MIG profile to run.
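To make the whole-GPU / MIG / fraction distinction a bit more concrete, here is a minimal sketch of three ways a pod might ask for GPU capacity. The `nvidia.com/gpu` and `nvidia.com/mig-1g.5gb` resource names come from the NVIDIA GPU operator (the MIG name assumes the operator's "mixed" MIG strategy); the `gpu-fraction` annotation is only an illustration of how a Run:AI-style fraction is expressed, so treat that key and the scheduler name as assumptions rather than documented API.

```yaml
# 1. Default Kubernetes: whole GPUs only -- the extended resource must be an integer.
apiVersion: v1
kind: Pod
metadata:
  name: whole-gpu-job
spec:
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:23.01-py3
      resources:
        limits:
          nvidia.com/gpu: 1          # 0.5 is not allowed here
---
# 2. A100 MIG slice, assuming the GPU operator exposes MIG devices as resources;
#    an A100 can be cut into up to seven such instances.
apiVersion: v1
kind: Pod
metadata:
  name: mig-slice-notebook
spec:
  containers:
    - name: notebook
      image: jupyter/tensorflow-notebook
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1
---
# 3. Fractional GPU in the Run:AI style -- expressed as pod metadata and enforced
#    by the Run:AI scheduler; the annotation key below is illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: fraction-notebook
  annotations:
    gpu-fraction: "0.5"              # assumed key, not authoritative
spec:
  schedulerName: runai-scheduler     # assumed scheduler name
  containers:
    - name: notebook
      image: jupyter/tensorflow-notebook
```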
And we can configure those MIG profiles dynamically, on the fly. So you have the choice — but the first motivation was: what happens if you want fractional GPUs and you don't have an A100? We want to give you the opportunity to do it anyway. Was there actually a question? I can't quite see the chat... I'll check with Andy whether that answers his question; we can come back to it in the meantime. Okay, so now I want to go a bit deeper into how our smart scheduler actually works. The first term we have is called guaranteed quotas. Guaranteed quotas are a way to provide quotas for teams. Let's say I have a system with eight GPUs: I can give four GPUs to each team, for example, but there's also an option called allow over-quota. That means if there are available GPUs in the system and the other team is not using them, I want to hand them, on the fly, to the team that needs GPUs — I don't want to keep idle GPUs in the system. We take care of the queuing and we take care of the GPU pool. And once the other team is back in business — say team A is now using all eight GPUs and team B comes back to the office and wants to run jobs — that's where the fairness algorithm kicks in and says: listen, I need to give four GPUs back to team B. So I'm going to preempt jobs and put them in the queue until there are more resources. This dynamic allocation is what makes system utilization much, much higher. And again, everything is on the fly — zero touch; the researchers don't need to do anything except run their jobs. I can show you a quick example — I hope you like the animation — before the live demo, of how it looks. In this example we have researcher A and researcher B using the system, and this is the Run:AI scheduler. We can see two jobs running for user A and two jobs for user B, and we promised each user a guaranteed quota of two GPUs. Now team A decides they want to run more jobs, but right now those are in the queue. Team B is on vacation, or they came to listen to OpenShift Coffee Break, so they're not running any jobs — which means we now have two additional idle GPUs in the system. Remember, we have two queued jobs for team A. Although we promised them two GPUs, we now give them four, because there were free GPUs in the system and they're allowed to go over quota. By going over quota they're more productive, they can run more workloads — and this is only available with Run:AI; we want to make sure they run as many workloads as possible. Now team B is back in the office: hey, I want to run a job, and you promised I'd always get two GPUs. So we preempt one of team A's over-quota jobs, put it back in the Run:AI scheduling queue, and give that GPU to team B's job. In this scenario everything is dynamic: the researchers always get their guaranteed quota, and they get over-quota when resources are free. Our motivation is that practitioners can run more workloads, be more productive, and finish on time. So that was a quick example of how our scheduler works in terms of preemption.
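For contrast with the dynamic over-quota behaviour described above, this is roughly what the static alternative looks like in plain Kubernetes: a ResourceQuota per team namespace caps GPU requests, but it has no notion of lending idle GPUs to another team and reclaiming them later. A minimal sketch; the namespace name is made up for the example.

```yaml
# Static per-team GPU quota in vanilla Kubernetes (no lending, no reclaim).
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: team-a                  # hypothetical team namespace
spec:
  hard:
    requests.nvidia.com/gpu: "4"     # team A can never exceed 4 GPUs, even if team B is idle
```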
I won't go deeper than that here, but it's important for me to emphasize how our scheduler works and how we provision GPUs between users — and again, this is dynamic; it's the additional layer the Run:AI scheduler adds to the system. Now I want to talk about the architecture. I tried to make it OpenShift friendly by building up the layers the way OpenShift does — and feel free to ask any questions. This is from the Red Hat OpenShift deck: we have the operating system, CoreOS, which can run literally anywhere, and we have Kubernetes, which is the orchestration. What we do in Run:AI is ship an operator — and we're going to release a new one soon — and add an additional scheduler that sits side by side with the OpenShift scheduler, the Kubernetes scheduler. That's the first layer we actually need. We also add another pod that communicates with the GPUs; that's part of the secret of how we can provide fractions of GPUs — a layer that sits in the middle, talking to the GPUs. In terms of prerequisites, Run:AI needs the node feature discovery operator, because we want to make sure we know how to talk to the right node, and the NVIDIA GPU operator must be installed before installing Run:AI, because we talk to the DCGM exporter — a component of the NVIDIA operator — to get all the metrics and data from the GPUs. We can then show the utilization in our GUI based on the NVIDIA GPU operator. Once those two prerequisites are installed, you install Run:AI and our operator. So that's the first layer of the integration — our scheduler — when you install it on OpenShift. The second layer we install is the Atlas platform, which is the whole GUI that gives you a cloud-like experience when working with Run:AI. You can use our GUI to submit jobs, or the CLI — again, we keep it open in terms of integration. And because we're an operator in the system, if you scale and add more worker nodes, we'll see them: when you add a worker node the GPU operator kicks in, installs the driver and whatever is needed, we see it as another worker node, and it shows up in the GUI. The last bit is that you choose how you want to run whatever jobs you need: Jupyter notebooks, interactive tools like VS Code, Kubeflow, inference — we keep all of that open; use whatever you want. And if you're already using an operator that runs Jupyter notebooks, or a lot of Kubeflow — not a problem; you can integrate fully with Run:AI and still use our scheduling. What we see in the field is many people using all those tools, which are great, and then adding Run:AI to help them with the resource management layer. So that's how we work and how we glue things together in terms of the integration. By the way, are there any questions? I cannot see the chat. I'll read a couple of questions — I have one myself first. All those workloads you mentioned — do you need to make any changes to the images, let's say of Jupyter? Or can I use any image that comes with my own tool set? Say I use Kubeflow — do I have to inject something into those containers so Run:AI can run and manage them? This is perfect — a great question.
You can run whatever Jupyter notebook image you want. When you run the job, you just select how many GPUs or what fraction of a GPU you want to use. Then, when the pod is created, it downloads the image and we tell Jupyter how much GPU memory or how many GPUs it will see inside the pod — so when you run nvidia-smi, you actually see that. And you asked a very good question about Kubeflow: all you need to tell Kubeflow, in terms of integration, is to use the Run:AI scheduler. There's a place in Kubeflow — and we have a lot of documentation around it — where you can say: use the Run:AI scheduler when you run the jobs. So you keep enjoying Kubeflow exactly as you use it today, but when you run or submit a job, it goes through the Run:AI scheduling system. The integration is easy, and the data scientists keep working the way they do. At the end, maybe I can put links in the chat on how to glue all of this together. Absolutely, I'll copy them. There was a question from Dwayne; I admit I'm not sure I understand it myself. I'll read it for you. Dwayne says: I thought for HPC, the parallelization of data was a major task before GPU application. Is there a container best practice for this? Personally I don't understand the question; I'm not sure whether to ask Dwayne to rephrase it — can you read it yourself, or shall I? Yeah, I can stop the screen sharing and try to read it... you can see it here on screen. Okay. If I understand the question correctly, there's the data preparation before running the training load, and then, if you want to run multi-node training, there's MPI — multi-process — where each part runs on a different node so the work is distributed. And then there's the training itself. The data scientist should prepare the model and the algorithm inside; we provide the resources for it. If needed, we can take it offline — Dwayne, no problem, we can dig into it after the presentation and understand it better later. Yeah, sure. And, Erez — I don't see your screen sharing anymore. Did you have something like a live demo to show us, or do you want to complete the presentation first? Yeah — okay, can you see my screen? I can now, but you need to go back into the presentation... can you see it? Thank you. Okay, so yes, I'll show you a demo of our system. I prepared some stuff to walk you through, and since we're doing an OpenShift session, this is something I have to show: Run:AI integrates with OpenShift's identity provider, so when you install Run:AI you can log in using your OpenShift credentials. This is just an example of how it looks when you log in — I hope I have the right password. Once you log in with those credentials, you get into the Run:AI GUI and you can start working with the system. Red Hat Cafe is a cluster I created, where you can see the GPUs and the resources and everything. But I want to show you a different demo, because I want to use a more robust system that I created for demo purposes. Tell me — is the screen big enough, or do you want me to increase the font? Probably a bigger font, I think. Better now? Yes. Okay, so what we see here is a live system — let me just make it a bit bigger.
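To tie back to the Kubeflow question from a moment ago: the key piece of "use the Run:AI scheduler" is simply the pod's `schedulerName` field — the images themselves are untouched. A rough sketch, assuming the scheduler is deployed under the name `runai-scheduler` and that a team's project maps to a namespace; the names and label here are illustrative, so check the Run:AI documentation for the exact values.

```yaml
# A notebook/training pod handed to the Run:AI scheduler instead of the default one.
apiVersion: v1
kind: Pod
metadata:
  name: jupyter-notebook
  namespace: runai-team-a            # assumed project-to-namespace mapping
  labels:
    project: team-a                  # illustrative label
spec:
  schedulerName: runai-scheduler     # the only change needed for an existing workload
  containers:
    - name: notebook
      image: jupyter/scipy-notebook  # any image works; nothing is injected into it
      resources:
        limits:
          nvidia.com/gpu: 1
```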
This is an example of a running Run:AI system. I can see that I have two nodes, a total of four GPUs, that all four of my GPUs are currently allocated, and that I have team A and team B. If you look at the projects — a project in Run:AI is like an OpenShift project, but a project on steroids, because we add a bit more annotation inside the namespace to give extra capabilities that Run:AI can read from. What we see here is team A and team B, and we assign each team two GPUs. However, if you look at the project itself, GPU over-quota is enabled, which means: if there are idle or free GPUs in the system, I want to use them — but the guaranteed quota is two for each team. Going back to the overview, we can see that one of the teams is using more than two GPUs: team B is using three, and the reason is that there are free GPUs in the system and team A is only using one. So what I'm going to do now is spin up a job for team A, and we'll see how the system automatically kicks in the fairness algorithm and rebalances so each team has two GPUs. The way to do it — again, we have a CLI and a UI; you choose how to submit jobs — and I just make sure I'm submitting it to the right place: team A. So I'm submitting a new job for team A, and this interface is for the researchers, where I select how many GPUs I want and a name — let's call it "quota-back". This is just an example pod we have that loads the GPU. I can set resource allocation, storage — if I'm using OCS or anything else, I can put in a storage class or NFS — I have a lot of flexibility, everything is scriptable and available as YAML, and you can even pull in a Git repository for your pod if you need to. You have all the tool sets you need to easily train your model. I'm going to submit this job now, and the system will pull the image and try to run it on one of the nodes. You can see here that we have team B with three GPUs and team A with one. First the pod goes into a pending state while the scheduler decides what to do: there's a pod running over quota for one of the teams, and I want the system fully balanced. Let's wait a few seconds for it to kick in. You can see now that one of team B's jobs has been preempted into a pending state — and you can add webhooks so the system keeps track of where it stopped, so the next time the container runs it continues from the same place. This is done automatically; the system preempted the job, and in a few more seconds team A gets the additional GPU and the system is balanced. Here you see the fairness algorithm at work: one job is pending, and now they're balanced again. The researcher didn't need to do anything — he just ran his job and we did the heavy lifting for him. Now team A is more productive; we allocate whatever GPUs are allocatable and make sure no GPU sits idle in the system. That was a quick example of how the system preempts and does dynamic allocation in a very fast way, which today is only available in Run:AI. I'll just leave some jobs running, because I want to show you the Jupyter part in a moment.
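Erez describes a Run:AI project as "an OpenShift project on steroids" — essentially a namespace carrying extra metadata that the scheduler reads for quota and over-quota decisions. A sketch of that idea only; the label and annotation keys below are purely illustrative placeholders, not the documented schema.

```yaml
# A team "project": a namespace plus scheduler-facing metadata (illustrative keys).
apiVersion: v1
kind: Namespace
metadata:
  name: runai-team-a
  labels:
    runai/project: team-a                # assumed label linking namespace to project
  annotations:
    runai/guaranteed-gpu-quota: "2"      # assumed: guaranteed quota read by the scheduler
    runai/allow-over-quota: "true"       # assumed: may borrow idle GPUs from other teams
```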
But before the Jupyter part, we can do the same thing at a larger scale. Imagine a system with many more GPUs, and different projects or organizations, where maybe each organization needs a different type of GPU. You can see here that only organization C, for example, can go over quota — the rest I don't want to go over quota; I want them to stay within their own GPUs. If I just load the system — this is a script that submits jobs; it shows you the syntax — the point of this example is to show how a real system behaves when you schedule a lot of pods. It kicks in over a few minutes, and here you can see how you can govern things and make sure each group doesn't use more than its resources. It takes a little while for everything to get scheduled, so while it fills up, I want to show you the Jupyter part — how a Jupyter notebook looks from the GUI. What we did before was kick off a training job; this time I want to kick off an interactive job, which is a Jupyter notebook. This is an example of a Jupyter notebook — nothing special, just a notebook — and I can select either a fraction of a GPU or a full GPU; I can really decide. When I kick off this job, we understand it's an interactive session. We're deleting some of the earlier jobs because we don't need them anymore, and we'll wait a few seconds for the notebook. You can see it's going to pull the image and search for the right place to run; I can see the logs, and you can see here it's an interactive session, so we treat it as interactive. Let's wait a few more seconds until the pod is actually running — it's pulling the image now, from a private registry; we support generally any registry. So when the job is tagged as interactive, you can see that the job is allocated a fraction of a GPU. If you have a team — let's say ten researchers — and you've given them, I don't know, five GPUs, does that fraction allocation happen dynamically, or does each of them have to choose, when they start their Jupyter notebook, what fraction of a GPU they're allocated? Yeah — when you run the Jupyter notebook, you need to decide what fraction of the GPU memory you want to use. It's not automatic, not on the fly; the researcher needs to decide. However, we're trying to add more capabilities for governing this, with an admission controller and things like that — but at the moment the researcher is responsible for the fractional GPUs. My other question is: I suppose you can extract those sorts of metrics from here. I'm thinking of a project where you have a really large number of teams, many researchers belonging to different parts of the organization — is there a way to extract metrics that would allow for chargeback? Yeah, it's a very good question — I think I need water now, I'm out of coffee. We do support it. Actually, in the next release we're going to add a billing dashboard, which will look much better, but at the moment you can select data in terms of projects and jobs, you have API calls, and you can pull out whatever information you need about a researcher or a project and build decisions and billing metrics out of the system.
It is supported: with API calls you can get all the data — behind the scenes it's Grafana and Prometheus, so you can get everything from there. We're now adding more tooling to the GUI so you don't need to make the API calls yourself, but if you want information that isn't part of our GUI, you can pull it out yourself; some of our customers actually use it for chargeback between departments, so they can see which departments need more GPUs and drill down into it. This is a demo system — I can show you a real live example if needed, but here it's just demo data. So, going back to the system with all of my GPUs, we can see that each organization now has a different amount of GPUs in use, and organization C can grow to more GPUs — I can add more jobs to it, and if I enable over-quota, I can add even more workloads. So I'm giving the organization the ability to decide which teams need which amount of GPUs. In addition, you can select which GPUs go to each team. Imagine you have very expensive GPUs and you also have T4s, which are less expensive. You can decide that the students in the university will use the less expensive GPUs, but the PhDs will use the P100s. So you can govern which GPUs on which nodes each project can use. This is done with node affinity: you can create node affinity and select which GPUs you want to give to each project, just to give you a little more control over the system. And again, everything is scriptable, everything has a CLI — we try to make it as easy as possible for the researchers and for the administrators to provide jobs to users.
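The "cheap GPUs for the students, P100s for the PhDs" idea maps naturally onto node affinity over the GPU product label that GPU Feature Discovery publishes (`nvidia.com/gpu.product`). A minimal sketch — the label value must match what your nodes actually report, and in Run:AI itself this would typically be configured at the project level rather than written by hand on every pod.

```yaml
# Pin a team's workloads to a particular GPU model via node affinity.
apiVersion: v1
kind: Pod
metadata:
  name: student-training-job
spec:
  schedulerName: runai-scheduler       # assumed name; affinity works with any scheduler
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - Tesla-T4           # must match the node label on your cluster
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:23.01-py3
      resources:
        limits:
          nvidia.com/gpu: 1
```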
So we have another question, and I'm actually not sure it's in the scope of Run:AI, but I can try. The question is around management — I'm reading it to you: how intentional is the data provenance recording, whereby a data set can be shared and validated for reproducibility or training? I guess this has to do with how the data sets are used by the workloads, but I don't know if that's in the scope of Run:AI at all. If I understand the question: there are a lot of tools, like Weights & Biases, Jupyter workspaces, and a lot of other third-party or open-source software, that give you that capability of tracking the data, tagging it, and managing the data sets. Those are solutions that are not part of Run:AI. However, because we're fully aware of what's happening in the field, we listen to customers and consider whether it's something we should add to the product, and the integration is super important. If you have data sets you want to track, you can definitely look those tools up and understand what it means in terms of the cycle — some have CI/CD pipelines where they do training, then inference, then go back to training. We do support those pipelines, because we plug in as the scheduler and also for inference, where today we have more capabilities. But usually, today, third-party software is in charge of the data. So essentially it's being able to track which runs are using which data sets, so that you can reproduce the same run — making sure you know exactly what data set was used for a specific run. Yeah — basically, when you run a job in Run:AI, you can describe the jobs, and we have all the information there: you can see what command line was used, what data set, what NFS path or storage path, and you can turn that into a YAML file and get some data out of it — again, only if it helps, for consistency around training. And you can script something if you need more information about what the pod is using. But at the moment it's not part of Run:AI; if it makes sense, we can help you build something around it, but it's not part of the product today. Okay, there are more questions in the chat, and we have another four minutes to go. So I'll give you something as a bonus — because we have to say something about open source, right? We have a tool called runai-top, which gives you information about how your GPUs are currently being utilized. We have a GitHub page where you can download it; you can run it anywhere — inside pods, on machines, whatever — and it does two things. First, in the CLI, it shows you how the GPUs are utilized at the node level, per GPU, so you can watch it on the fly. Second — which I think is a very cool feature — you can export it to a file and look at the total utilization of the cluster over time. So you can see that maybe it was 50% utilized, or maybe server zero was heavily used while server five wasn't. This tool gives you some insight into how your current system is being utilized, and it's open source. We have more tooling coming in the future, so stay tuned — and it's something you can definitely be part of. We'll send the link later in the chat, so you can get a clear understanding of how the system is used today. And it seems we have a lot of open-source tools, actually. If you could copy and paste the GitHub repository URL and put it in the chat when you have a moment... There you go — I'll try to do it live, as we speak. Are you sure it works? Let's see... maybe there's a typo there, because it says page not found. So much for all the testing. This is the right one. Yeah, this one is live — try this one. Okay, there was a character missing — yeah, in the git clone command. It should work now. Yes, I'm going to put it in the chat for everybody to check out. And also, if that's okay with you, I'm going to put your email address in there in case somebody wants to get in touch with you. And I guess that's all we have time for today. Any final comments, Erez? Other than thank you so much for joining us today. No — thank you. I feel at home here, and thank you for helping organize this. You can also put the Run:AI webpage in there if more information is needed. We can open a Slack channel if you have questions — feel free to involve us. We have a big product team who would love to hear about some of your use cases.
And many thanks to those who joined us today, with all those questions — some of which, I admit, I didn't understand; I'm not an AI expert by any stretch. I also wanted to say that Coffee Break goes on holiday for a couple of weeks, and we're back on the 24th of August. See you all on the 24th, then. Thank you, guys. Thank you. Bye.