Welcome, everyone. I know we're closing in on the end of the first day of the conference, and it's been wonderful. In fact, this is my third talk of the day, so it's been a bit tiring, but I'm so excited to be presenting here at MultiTenancyCon. Our topic is embracing multi-tenancy for scaling MLOps.

We're essentially living in a world of large language models, and whether you accept it or not, every major company is looking at adopting machine learning for its use cases. We'll look at some of the biggest issues in the regular DevOps systems that most companies run today, and at how you can leverage multi-tenancy, along with some open source tools I'll be showing, to enable your teams to handle large language models, or even just machine learning infrastructure in general, that your traditional DevOps systems cannot handle. I'm Shivay, a developer engineer at Miliserts. My co-speaker Shivanshu could not make it today because of visa issues, but I'll be presenting our demo through a video he recorded.

For those who might not be aware, since this is MultiTenancyCon and not a machine learning conference (although we do have the AI day), here's a very quick recap of what happens in an end-to-end machine learning life cycle. You start by taking some data and pre-processing it. You then run a training cycle that turns that processed data into model artifacts. Next, you validate whether the model's performance is good, i.e. whether it gives you accurate results, and if it does, you push it into production. From there it's all about scaling it, similar to how you scale microservices.
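The cycle described above can be sketched as a chain of plain Python functions. This is a toy illustration only, with a made-up "model" and validation threshold, not any particular framework's API:

```python
def preprocess(raw):
    # Drop records with missing values (a stand-in for real data cleaning).
    return [r for r in raw if r is not None]

def train(data):
    # Stand-in for a real training loop: the "model" just predicts the mean.
    mean = sum(data) / len(data)
    return {"predict": lambda x: mean}

def validate(model, holdout, tolerance=2.0):
    # The model passes if its average absolute error is within tolerance.
    err = sum(abs(model["predict"](x) - x) for x in holdout) / len(holdout)
    return err <= tolerance

def deploy(model):
    # In a real pipeline this would push the artifact to a registry / serving tier.
    return "deployed"

raw = [1.0, 2.0, None, 3.0, 2.5]
data = preprocess(raw)
model = train(data)
status = deploy(model) if validate(model, data) else "reiterate"
print(status)  # -> deployed
```

Each function hands its output to the next, which is exactly the shape an orchestrator later formalizes: if validation fails, you loop back and reiterate instead of deploying.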
One thing to keep in mind across all of these steps is that ML is not that different from standard software development. The end goal is still to scale up your services, and that is why we can apply the same or similar multi-tenancy concepts we usually employ in standard DevOps systems to MLOps as well; we'll see why in today's talk. Once you scale a model, you'll do QA, meaning you test the model against test data to see whether it performs well, then you deploy it, and then you monitor it. Just as you'd monitor any traditional service with Prometheus and Grafana, you can monitor your model's performance very similarly, and if it doesn't perform well, you reiterate and repeat these steps of your standard machine learning process.

One consideration, though: data and machine learning infrastructure do not always scale well across teams, because machine learning is very compute intensive. There are certain constraints you have to satisfy in order to deploy machine learning models efficiently, and especially to train the large models we typically see today, as most models are gigabytes in size.
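The monitor-and-reiterate step mentioned above can be sketched as a tiny metric tracker in plain Python. It mimics the kind of gauge you would export for Prometheus and chart in Grafana; the window size and the 0.8 alert threshold are made-up values for illustration:

```python
from collections import deque

class ModelMonitor:
    """Tracks rolling prediction accuracy, like a gauge you'd scrape with Prometheus."""

    def __init__(self, window=100, alert_below=0.8):
        self.outcomes = deque(maxlen=window)  # 1 = correct prediction, 0 = wrong
        self.alert_below = alert_below

    def record(self, prediction, actual):
        self.outcomes.append(1 if prediction == actual else 0)

    def accuracy(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 1.0

    def needs_retraining(self):
        # In the talk's terms: "if it does not work well, reiterate."
        return self.accuracy() < self.alert_below

monitor = ModelMonitor(window=4)
for pred, actual in [(1, 1), (0, 1), (1, 1), (0, 0)]:
    monitor.record(pred, actual)
print(monitor.accuracy())          # 3 of 4 correct -> 0.75
print(monitor.needs_retraining())  # 0.75 is below the 0.8 threshold -> True
```

In production you would expose the same rolling accuracy as a metric endpoint and let an alert rule, rather than a boolean method, trigger the retraining loop.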
The architecture you employ for running your machine learning models, which typically requires GPUs, might not work well with other pipelines; it might actually end up breaking other teams' pipelines, because there are special constraints you have to put in place when dealing with ML infrastructure. And because we now deal with a lot of complex models, we need dedicated infrastructure teams: the regular DevOps pipelines you're running might not be able to cope with the large models we typically see today. You need dedicated hardware and a specialized set of folks who can handle training and then deploying these models, because they require more compute, they cost more, and the processes will probably run slower.

So you need dedicated CPU and memory provisioning, and you need multi-tenancy in place so that multiple services can run together on a single piece of infrastructure, independent of each other. You might also have to set up dedicated resource allocation for different teams, because certain ML processes don't require that much compute, whereas something like training requires a lot. In other words, you need to be able to put caps, or dedicated resource allocations, on specific ML tasks.
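On Kubernetes, one common way to put such per-team caps in place is a ResourceQuota in each team's namespace. The sketch below is illustrative only: the namespace name and numbers are made up, and the `nvidia.com/gpu` resource assumes the NVIDIA device plugin is installed in the cluster:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ml-quota
  namespace: team-data-science    # hypothetical per-team namespace
spec:
  hard:
    requests.cpu: "32"            # cap total CPU requested by this team
    requests.memory: 128Gi
    requests.nvidia.com/gpu: "4"  # training jobs draw from a small GPU pool
    pods: "50"
```

With a quota like this per namespace, a training-heavy team can be given a larger GPU budget than a team that only serves models, without either one starving the other.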
When we look at all of these different things as we expand, there is a clear need to go beyond the traditional DevOps architecture we usually follow and adopt some specialized tooling. And the biggest benefit you can achieve by introducing multi-tenancy into MLOps is, I think, cost efficiency. Generally, if you look at a machine learning company, the data scientists and the MLOps engineers work in silos. The data science team is primarily involved with training models and handing off the artifacts; they typically don't care what the model will look like in production. But you need to make certain optimizations to these models before they're pushed into production; you cannot take a very large model, say 200 megabytes in size, and deploy it as-is. So the MLOps, DevOps, or infrastructure teams end up in a catch-22: they cannot directly deploy the model handed to them by the data science team, they have to make certain optimizations first. This division makes it very difficult for the teams to collaborate with each other. So why not gain cost efficiency by having a single platform where the data science folks and the infrastructure teams can actually work together? That would also reduce the complexity, and the cost, of maintaining different stacks and different technologies; why not have one single system where you can handle all of this at the same time? A direct consequence is that you also save time, because there'll be a direct correlation between the teams that are operating for the data
science side, building the models, and the teams deploying those models. You also get resource optimization: you save cost by not having to deploy dedicated services on different types of infrastructure, since you can manage everything through a single platform.

The biggest consideration to keep in mind is that when we deploy these multi-tenant AI/ML systems, we need to be able to properly distribute and orchestrate them at the same time. That is why I want to introduce the concept of orchestrators here. As I showed in the first slide, machine learning is really a series of tasks happening one after another, each transforming your data from one format to another: you start with raw input data, you pre-process it to remove any errors, you train on it to get model artifacts, and finally you deploy that model artifact and use it for making predictions. That is where orchestrators come into the picture: they allow you to coordinate your data flow from one step to another, with every step in the computation logically linked to the next, just as in MLOps. Any machine learning cycle fits very well into an orchestrator. Orchestrators are also great because they help you understand how many units of compute you actually need for each step of your orchestration. For ML in particular we need this urgently, because not every part of the orchestration requires the same compute and the same cost; there will be certain parts of your entire workflow which
might be heavier and require more compute. So orchestrators give you much more data about how you can manage cost and compute and assign it to the dedicated parts of your orchestration. You also get to see how your data flows from one part of the orchestrator to another, and what type of data it is, because whether you're dealing with machine learning data or non-ML workloads, you want strongly typed data as well; what dependencies you'll have to deal with; and how many resources you're putting into each step of the entire process.

That is why I'd like to introduce Flyte. Flyte is an open source, production-grade orchestration tool, similar to Kubeflow if you've heard of it. Flyte's origins trace back to Airflow at Lyft: Lyft found that Airflow had certain issues, especially with machine learning use cases, and wanted to create an open source product, so Flyte was essentially born out of Airflow, and it works well with both machine learning and data use cases. Here's one example where you take your machine learning code and define tasks and workflows (I'll come back to that in a bit). These workflows are logically organized in a sequence, and you can orchestrate them as much as you want, because Flyte is a Kubernetes-native platform, which means you can very easily scale up any part of Flyte. So if there are certain workflows, perhaps for training, where you might want more resources added, you could very
easily spin up new pods, get resources, say more CPUs, allocated to a particular pod in the Kubernetes cluster, and assign them to that training workflow. Essentially, you package and register your workflows: these are simple Python modules that get packaged into images, deployed to, say, an S3 bucket, and then you can just run them, executing inside containers in your pods. Because it's Kubernetes-based, it's very efficient to scale up.

Now, a task is the smallest building block of Flyte. Consider a very simple machine learning example: you could have one task that just fetches your dataset, say from a remote S3 bucket; another task that just does some pre-processing of your data; and another that does just the training. A task is the smallest atomic unit of work. What are workflows, then? Workflows are logical groupings of your tasks; one or more tasks can live inside a single workflow. You could have one workflow governed by, say, the data science team, with multiple tasks: fetching the data, pre-processing the data, and training on the data. And there could be a separate workflow specific to the infra team for deploying the model. So you logically group these different tasks into workflows, and the great thing about Flyte is that you can scale these workflows as you want. Here's an example where the first task takes a pandas DataFrame and applies a multiplier on top of
it, then another task aggregates that result, and the workflow logically groups these tasks together. We'll see how this works in the demo in a moment, with a proper machine learning use case.

Now to the biggest point: how does Flyte embrace multi-tenancy? Flyte has the concepts of projects and domains. Projects are logical groupings of different workflows that exist independently of each other. Say you want two separate teams on this platform: a data science team mainly working on model training, and an MLOps team working on just model deployment. They can work individually, in isolation, without having to worry too much about each other, and that isolation is one of the biggest points of multi-tenancy: projects give you these logical groupings with built-in isolation. Domains then allow you to run your models in any environment, whether development, staging, or production, and you get proper separation and isolation there as well. Now let's take a look at a demo.

Thank you, Shivay. Hi everyone, this is Shivanshu. Thanks a lot for coming; let's get started with the demo. Let's see how resource sharing would work in a multi-tenant setup. In a multi-instance setup there is no task sharing and no data sharing, but if we introduce task or data passing, then we can share the results of different tasks; we can share the tasks, and we can share the database used by different teams. Under the hood we need an architecture that is extensible, meaning we can run different types of ML programs, which might use different libraries such as PyTorch or TensorFlow, so it should be extensible, and there
should be a reliable retry mechanism and a worker pool to re-execute tasks that fail. This is how the architecture looks, but we probably don't want to dive too deep into it. So let's see how resource sharing would work in a setup where we are trying to enforce multi-tenancy and we are running multiple ML models. Say team A is running a workflow with different tasks, and team B is also running some workflows, and they are sharing the same cluster; they could even be running multiple workflows each, so they need to define a project, and again everything is shared within the same cluster. Let's see how we can do that with the help of Flyte.

Let me first show you the workflow for team A. For team A I have defined a simple get-data task, then a process-data task, then a train-model task, and the workflow runs everything sequentially. For team B it's a similar kind of thing: they have their get-data, process-data, and train-model tasks, and their workflow also runs everything sequentially. If I want to run everything in a single cluster and isolate the work of the different teams, I can create two projects. For team A I am creating its project, and for the second team let's also create a project, and then register the individual workflows to the associated teams: team A's workflow goes to team A's project and team B's workflow goes to team B's project. We can see the registration is successful, so if I go to the UI I can see the projects demo data science and demo team mlops. Perfect. Now I want to run the individual workflows. For team A I want to run my workflow with all of its tasks; I can just do a pipeline run and the execution will be available. Team B belongs to demo
team mlops, and team A belongs to demo data science, so we need to go to demo data science, development, and see if the workflow was launched. There's a workflow which is registered and running: get-data has succeeded, process-data has succeeded, and train-model is running. If I take a look into what get-data is doing: get-data has no input, because it's just fetching the data, but after getting the data it dumps its output into an S3 bucket. Similarly, process-data takes an input from get-data, using that same S3 bucket, and dumps its output to another S3 bucket, which my train-model task will then use. This is how all the tasks are linked, and if another team wants to share the output of, say, a task, they can do so by referencing that S3 bucket directly, or they can reference the task itself, which I will show later in the demo. So this is how the flow looks: we start the workflow, and the first, second, and third tasks run in a sequential manner. Similarly, for the second team, team mlops, I can run a similar command, and I can see for demo team mlops a similar workflow which is again running: get-data is running, process-data is running, and train-model is in a waiting state because process-data is running. So this is the simple case where every team executes independently in the same cluster, under different projects and different workflows.

Now, if I want to do task sharing, meaning I want to reuse the results of whatever team A is doing, then as team B, if team A's task ran successfully, I want to use the task itself. To do that, I need to define a separate workflow. Let's say I want to use the get-data task from team A, which we just ran in demo data science. Okay, for the mlops project all the tasks
are complete, so let's go to team A again, which is demo data science, and look at the execution of the workflow. I want to reuse the get-data task: to replicate this scenario, I want to use team A's get-data task in another project, for another team. What I can do is reference the task from team A, essentially like this: team A, demo, development, get-data, and I need to pin the exact version so that I reuse exactly that task. Let's try to run that: register the workflow, okay, it's successfully registered, and then run the workflow. If I now go to team B, which is mlops, and look at the latest run and the execution of the workflow, there's one task: this is team B, but the task it is using is team A's task, because we have referenced it. If I go back to the code: since we have referenced the task from team A, we can reuse everything, the task along with its inputs and outputs. In that sense it's orchestrating tasks between different teams, and it gives us resource optimization.

Another use case is how we would create resource isolation, i.e. restricted access between different teams, by creating some RBAC rules. Before we dive into that, let me show what kind of pods get created. For every task that I've run, an individual pod was created, which is now complete. If I describe a pod, I can see a lot of information attached: the domain, the execution ID, and the project name; the namespace itself contains the project name and the domain (development, staging, and so on). We can use this information to create the RBAC roles and role bindings. For team A I can take its namespace and create the required role, and the corresponding role binding would be that a user
from team A can only access the pods available in the same namespace, and similarly for team B: only users in the same namespace can access the resources of that namespace. With namespace isolation plus RBAC rules, I can create resource isolation between the different teams.

The fourth thing we want to see is how GPU and CPU requirements can be satisfied for different tasks. For example, in an MNIST training example, the get-dataset task doesn't need a GPU, but for training I probably do need one. Per individual task, I can enable and disable GPUs and CPUs, and I can also define how much GPU and CPU that particular task requires: specific resource allocation for individual resources. So, essentially, you can again define how much compute and memory specific workloads require, down to ephemeral storage and what type of runtime you want, CPU or GPU, per individual task, and again specifically per team with the help of projects, so you get native multi-tenancy out of the box for these machine learning systems.

With that I'll conclude my talk. The main point here was to demonstrate how you can use multi-tenancy so that your DevOps teams and your data science teams don't need to work on different infrastructure or use different tools: all of them can work on one single tool and still have proper infrastructure and resource sharing in place, managed very efficiently with the help of an orchestrator tool like Kubeflow or Flyte. Feel free to scan this QR code to share any feedback, and you can connect with us on Twitter as well; we have shared our handles. Thank you so much.