Good morning and thank you, everyone, for joining us today. I am Selvi Kadirvel, I'm the engineering lead at Elotl, and I'm excited to be presenting our work to you. I'll let Chris introduce himself.

Yeah, my name is Christopher Nuland. I've recently switched over to being a technical marketing manager for AI at Red Hat; prior to that, I was a cloud architect working in emerging technologies, specifically AI.

Thanks, Chris. As for my own experience with Kubernetes, I've been a software engineer and a tech lead in the Kubernetes management space at Elotl, Cisco, and ContainerX before this. Prior to that, I worked in AIOps, which is using AI to improve infrastructure management systems.

This session will be something a little different: it's about using Kubernetes infrastructure to run your GenAI pipelines. We'll begin by describing a core component that simplifies orchestration of your GenAI pipelines in a multi-cluster, multi-cloud environment: the fleet manager. We will then focus on what is unique about a healthcare GenAI app and how a fleet manager can help with it, covering data gravity, a hybrid cloud approach, and lift-and-shift placement of your workloads. We'll end with a detailed demo of a drug research Q&A service that we've built across multiple clouds.

Okay, so what is a fleet manager? A fleet manager helps orchestrate your workloads across multi-cluster, multi-cloud Kubernetes by providing a single API endpoint that abstracts away the multiple workload clusters behind it. It can also be called a multi-cluster orchestrator. The key difference between this and a typical cluster management plane, like the ones you use with your cloud providers or your on-prem clusters, is that center stage is your application lifecycle rather than your cluster lifecycle. A simple mental model: a fleet manager is to Kubernetes clusters what Kubernetes is to nodes. When you submit workloads to a single cluster, you don't want to know how many nodes exist, what their resource availability is, what their current capacity is, and so on. This is exactly what a fleet manager aims to do for your group of clusters.

The CNCF landscape here can be quite expansive, and you might have heard of a project called Kubernetes Federation. That project waned in its growth around 2020, but in the past year there has been a resurgence of fleet managers. These include my own company Elotl's product Nova, as well as Karmada, KCP, KubeStellar, Open Cluster Management, and so on. Even within cloud providers you now have fleet managers: if you use the console to create clusters on AKS or GKE, you'll see the option of adding your cluster to a fleet.

Now that we have a high-level overview of what a fleet manager is, let's dive into its details. The capabilities of a fleet manager can be divided into day-zero ops and day-two ops. Within day zero, there are three use cases that fleet managers help satisfy. The first is static placement of workloads to clusters. This is the most obvious use case: if you have a customer-facing front-end application or a web service, you may want to be very specific that it has to be placed, say, on an AWS cluster in US region 1, and fleet managers let you specify this through what is called a placement or schedule policy.
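To make that concrete, here is roughly what a static placement looks like in Karmada, one of the fleet managers listed above; the workload and cluster names are made-up placeholders, and other fleet managers such as Nova express the same intent through their own custom resources with different field names.

```yaml
# Illustrative sketch only: pin a front-end Deployment to one specific member
# cluster. The Deployment name and cluster name are hypothetical.
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: storefront-static-placement
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: storefront
  placement:
    clusterAffinity:
      clusterNames:
        - aws-us-region-1    # the one cluster this workload must land on
```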
So fleet managers extend the Kubernetes API with a custom resource, which in this case is the schedule policy.

The second use case is dynamic placement of workloads to clusters. Say you have a team of data scientists with a bunch of ML training workloads that need to be placed on any one of your platform team's dev clusters, and you have five of them. A data scientist does not want to know or care which specific cluster they should use at any point in time. A schedule policy defined by the platform operator makes this mapping for them through what is called a capacity- or availability-based scheduling policy. It says: place my workload on any one of the dev clusters that has the CPU, memory, and GPU resources I need.

The third use case is standardization of your workload clusters. Your managed clusters have commonalities along some dimensions and differences along others. For example, the monitoring, metrics, and logging stacks might be something you want to unify across your clusters. In that case, a spread scheduling policy lets you capture the same resources and duplicate them on all of your clusters. In other cases you have specialized infra needs: certain application teams within your BU might want to use one service mesh, say Istio, while other products and microservices want to use, say, Linkerd or Skupper as their service mesh of interest. There, the spread policy can be applied to subsets of your fleet, called fleets within fleets, meaning clusters that are labeled together through certain topology keys. Finally, in addition to these infra needs on your clusters, you also have application-specific needs, such as secrets and namespaces that might be common to a number of applications. In our AI pipeline, we'll show you how we use spread policies to make the same resources and prerequisites available on both the source and target clusters before a migration; I'll show a rough sketch of such a policy in a moment.

Moving on to day-two needs. Your application workloads are now placed on the clusters they were intended for, but a number of changes occur during operation. For example, your ML training workload, originally set to run at 10 replicas, might have new data sets coming in, in which case the current target cluster might no longer be sufficient. Nova and other fleet managers can evaluate these conditions and automatically reschedule your workloads to a new target cluster: a change in the location of your workloads based on resource needs. There's also workload migration, in which you, the platform operator, intentionally want to move your workloads to different clusters. Then there's a third use case, on-demand or just-in-time clusters. Say you have a static set of 20 clusters, but a large workload arrives seasonally or only occasionally, at which point you want to bring up a cluster from a golden template you already have in your system. This would be a new cloud cluster that comes up, runs the workload, and then goes back down. Just as you have autoscaling within a cluster, this is autoscaling at the cluster level.
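Before we move on to the app, here is a rough sketch of the spread-style policy just mentioned, again written against Karmada's PropagationPolicy for concreteness; the secret name, namespace, and cluster label are hypothetical, and Nova's spread policy expresses the same idea through its own resource and fields.

```yaml
# Illustrative sketch only: duplicate a prerequisite Secret onto every member
# cluster carrying a (hypothetical) group label, so that the source and target
# clusters of a later migration both already have it.
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: vectordb-credentials-spread
  namespace: drug-research
spec:
  resourceSelectors:
    - apiVersion: v1
      kind: Secret
      name: vectordb-credentials
  placement:
    clusterAffinity:
      labelSelector:
        matchLabels:
          fleet-group: genai-pipeline    # hypothetical topology label on the clusters
```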
Okay, so now we move on to a particular drug research Q&A app that we developed based on some of our engagements in the field.

A typical AI pipeline has a number of components: data preprocessing, model training, model fine-tuning, data embedding into a vector database, model inferencing, model evaluation, and more. We're going to use only the few of these components that were most relevant to our healthcare app. The first piece is an event trigger for data availability. We wanted a system in which the data ingestion part of our pipeline gets invoked as and when new data sets enter the system. We used a tool called KEDA, which stands for Kubernetes Event-Driven Autoscaling, to start and stop our AI pipelines. The second piece is data embedding: converting our input data into embeddings that can then be stored in a vector database. We used Hugging Face and LangChain components to implement this. The third component is the LLM. Instead of using a third-party LLM, we used a self-hosted model, which was more cost-efficient and allowed us to experiment with different pieces of our pipeline. Finally, we used retrieval-augmented generation backed by the vector DB. In our case we used FAISS, Facebook AI Similarity Search, as our vector library, and we also ran a number of experiments with the Weaviate vector DB.

This is an outline of what our pipeline looks like; I'll walk through it by the numbers. Number one is the data team responsible for producing data. In our case, we used clinical reports and research abstracts stored in a document data source. This could be a data lake or a data warehouse; for simplicity, we chose a relational database. We then have a message queue, which is what triggers the entire pipeline into operation. Number three is the KEDA system. Kubernetes Event-Driven Autoscaling uses external triggers; in this case, we integrate with AWS SQS, a messaging service. KEDA is installed on our clusters through a set of operators and a custom resource called a ScaledJob. The ScaledJob comes into operation, as I mentioned, when new data sets arrive. It starts a job that does the data embedding, using our Hugging Face model to convert the data into vectors, and then loads the vector database. This is executed as a Kubernetes Job. The vector database itself is made available through a Deployment and a Service. At the bottom, you have the researcher who is interested in these question-and-answer sessions. They make a request to the RAG service. The RAG service then talks to the vector database as well as the LLM, and provides the best possible answer it can with the information it has.
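To make the trigger piece concrete, here is a minimal sketch of the kind of ScaledJob we're describing. The container image, queue URL, and resource names are hypothetical placeholders, and the credentials would come from a KEDA TriggerAuthentication that references a Secret; the trigger fields follow KEDA's aws-sqs-queue scaler.

```yaml
# Minimal sketch: run an embedding/ingest Job whenever messages appear on an
# SQS queue. Image, queue URL, and resource names are hypothetical.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: document-embedding-ingest
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: embed
            image: example.registry.local/drug-research/embedder:latest
        restartPolicy: Never
  pollingInterval: 30        # check the queue every 30 seconds
  maxReplicaCount: 5         # cap the number of concurrent ingest jobs
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: aws-sqs-credentials    # TriggerAuthentication pointing at a Secret
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/000000000000/new-documents
        queueLength: "5"
        awsRegion: us-east-1
```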
Now that we have an introduction to what a fleet manager is and what the healthcare pipeline of interest looks like, I'll hand over to Chris to talk about the unique needs of these healthcare apps.

Yeah. We talked a little bit about the what; I'm now going to break down the why. I'll be using healthcare as an example, because I think it is the perfect example of the problem that we're trying to solve, but this really applies to any industry. When it comes down to it, it's a conversation about data gravity. Data gravity forces us into these multi-cloud configurations, and that's especially true in healthcare, where things like PII and various national and international regulations push us into potentially multiple different cloud configurations.

Data gravity is the concept of data pulling services and applications close to it. Traditional monolithic databases, for example, are often very difficult to decouple from your services and applications, and this becomes even more challenging when you start moving data into the cloud. Now, I know most of you here at KubeCon have probably gone through a modernization effort, maybe toward event-driven architecture or microservices. But when we're talking about healthcare or finance companies, there are often limitations on going completely cloud-native on AWS or Azure. A lot of the time that's because of government regulation or, as I said before, PII, and data gravity is a big part of it. Data gravity makes it very difficult to be in a fully cloud configuration.

Dealing with data gravity often leads to hybrid cloud models, and that's no different when we're talking about AI. In fact, data gravity impacts AI/ML workloads significantly more than traditional workloads, simply because they are so dependent on the data. So data gravity will pull in your ML even more than your traditional applications. Keep that in mind when you're rolling out your AI/ML operations: very often data gravity is going to force you into a multi-cluster configuration, potentially even a multi-cloud configuration, where you may be working with EKS, AKS, and an on-prem solution. The kind of solution we're talking about here is very much about how we move our AI/ML pipelines effectively between those different environments.

I also thought this was a really good quote. If you check out the link here, there's a video where someone goes into detail about how data gravity is significant for most industries but affects healthcare in a way that other industries just don't see. It usually comes down, as I said, to things like PII. Healthcare companies are more often affected by mergers; there may be different pharmacies you're working with, different regulations; your data just ends up a lot more fragmented.

And then hybrid cloud. Hybrid cloud becomes very critical here, as I was mentioning before: being able to deploy your services close to your data rather than moving your data around. Some of you may be overcoming that through data normalization or different kinds of data migration. But if you are in an industry where you just don't have the option of moving your data around to make it more fluid, this is where multi-cloud and multi-cluster fleet management is critical, so you can deploy your AI/ML solutions close to your data. So for those of you working towards this, and maybe AI/ML is new to you this year as it's become more significant in the industry, just know that data gravity is going to play a very significant role in how you deploy your AI/ML services.

One technology that's really taking off right now is RAG. Funny enough, when Selvi and I submitted this talk, RAG wasn't even a big buzzword, and from last October until now, I think RAG has become the definitive buzzword in AI. RAG is another way of interpreting data: getting your information, and your services, as close as possible to the data. It's a way to add your own flavor to your models.
You don't have to retrain your models. This isn't really meant to be a RAG talk, but just know that RAG plays a significant role in the kind of multi-cluster configuration we're talking about. The ability to deploy RAG vector databases into your clusters, especially in healthcare, is also critical when you're working in a multi-cloud configuration. All of these tools work together, and data gravity, in the end, is the key thing you're trying to overcome when deploying these types of services. Ultimately, what you want to achieve is the ability to move your AI/ML pipeline across different clusters and across different clouds. This is the beauty of Kubernetes: you can have the same footprint across multiple platforms and multiple environments. This is why fleet management and multi-cloud management become such a critical part of the solution, and ultimately of overcoming data gravity. And as I mentioned, in healthcare this is especially challenging, with so many different data stores across different environments and potentially even different technical stacks.

Thank you, Chris. So now we'll map the infra and data needs that Chris outlined onto fleet managers. With respect to infrastructure, we learned that hybrid cloud and multi-cloud become really critical, both because of data gravity and to leverage the availability of on-prem and cloud GPUs. Fleet managers, by definition, are multi-cluster and multi-cloud, which makes them a perfect match for deploying such a pipeline. Secondly, complex data pipelines have different components, each with different resource needs. For example, your LLM and data embedding need GPU cloud clusters, whereas your vector databases can run on CPU or GPU depending on the sophistication of the similarity-search algorithms you choose. And these needs change over time: you might have certain requirements in quarter one, and by quarter two things have changed, so you want to be able to independently define scheduling policies that adapt the type of resources you use. Data gravity, meaning PII data that lives on premises and in specific clouds, is accommodated through flexible scheduling policies: both the capacity-based policies and the spread policies I referred to. And finally, there's the ability to handle scale. Nova as a fleet manager is designed to be easy to use: no changes are needed to your application manifests except the addition of labels, and users can independently define schedule policies as part of the platform engineering team's role. Most importantly, the ability to lift and shift your applications is made possible through workload migration policies within a fleet manager.

Okay, so now let's look at a working example of the healthcare app. Before we start seeing a lot of terminal screens, I'll give you an overview of what we're going to see. On the right is your first cloud, which we call the on-prem set of clusters here. This is what hosts your KEDA operators, your ScaledJobs, your model ingestion pieces, your RAG model, your vector database, as well as your LLM. On the left is your second cloud. This will be used in the day-two deployment, when pieces of the first pipeline will be moved over to it.
At the bottom is the researcher who's interested in this drug research Q&A app, and they will be able to seamlessly continue to use the app as the migrations complete. At the top are the document data store and the message queue, which are cloud instances and not part of the clusters we'll be showing you. I'm not able to see part of the screen, so I'm going to redo that. Okay.

Step one is viewing your managed clusters. Here we talk to the fleet manager API endpoint and ask Nova to get clusters. Cluster is typically not a resource you'll see in a single Kubernetes cluster; it's a custom resource that this API endpoint has been extended to serve. We use a fleet of 10 clusters: four on EKS and six on GKE, spread across different regions in the US. We describe this custom resource, the Cluster, to see what kind of information it contains. The most important piece of information we want is its capacity: we see that this specific cluster has five NVIDIA GPUs. Looking at other pieces of the custom resource, the fleet manager adds labels to your clusters for the Kubernetes version, the region, and the provider, and these pieces of information enable scheduling decisions.

Going on, the second step is deploying your workloads on this fleet. Your fleet manager lets you view all the deployments across your clusters. Here we look at a sample manifest: the KEDA ScaledJob. It has an addition, namely a scaler. The KEDA project provides integrations with a number of scalers; in this case we used the AWS SQS scaler. All it needs is the queue URL, the region it's in, and a secret containing the credentials to access the queue. When such an event arrives, it invokes the model ingestion job, which loads data from the document data source into the vector DB.

We then look at another resource, the LLM that we're self-hosting. We use a MosaicML 7-billion-parameter model; we chose it for simplicity and cost efficiency, since it can run on a not-too-expensive GPU instance on EKS. We then look at an example of a schedule policy, in this case for a secret. The secret is a resource that is needed on both the source and target clusters. It uses the spread policy, which simply says: take this secret and make it available on every one of my clusters, because I expect to migrate across them. There are much more sophisticated policies; we're just showing this one for simplicity.

The next step is deploying the LLM and RAG components. We deploy them and then list the two separate deployments: one is the LLM, the other is the RAG service. They have been placed on different clusters because of their GPU and CPU differences: one is on on-prem one, the other on on-prem two, via the duplicate policy.

Next, we access the Q&A endpoint. We once again talk to the Nova API and find the external IP that was assigned to our service, the second one, the RAG service. We then use a Postman UI to talk to this endpoint and ask a sample question. Since I loaded the documents into the document store, I knew of a particular research report, so I ask for a summary of a study on the mechanism of isocryptomerone. That's the answer provided by the endpoint that is live on our clusters.
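For reference, the shape of the RAG service we're querying is ordinary Kubernetes: a Deployment plus a Service of type LoadBalancer, which is where the external IP comes from. The following is only a minimal sketch; the image, labels, and ports are hypothetical placeholders, and the group label stands in for whatever label your schedule policy matches on.

```yaml
# Minimal sketch of the RAG service: a Deployment and a LoadBalancer Service
# that provides the external IP we query from Postman. Names are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-service
  labels:
    app: rag-service
    app-group: drug-research    # hypothetical label a schedule policy could match on
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rag-service
  template:
    metadata:
      labels:
        app: rag-service
    spec:
      containers:
        - name: rag
          image: example.registry.local/drug-research/rag-service:latest
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: rag-service
spec:
  type: LoadBalancer            # the cloud provider assigns the external IP
  selector:
    app: rag-service
  ports:
    - port: 80
      targetPort: 8080
```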
Day two: this is where we show what we would have to do to migrate your workloads, say from EKS to GKE. We first make sure that the two prerequisites, namely the secrets and namespaces, are available in the target cluster, and we check that they are. We then look at the current policy. This is the policy through which we're going to move one component of the pipeline, the RAG service. We notice that it spreads based on a particular label that it expects to see on the cluster, so we go ahead and edit our Cluster custom resource to add this label so that it becomes a new target for our component. We then reapply the policies and wait for the service endpoint to become available in the new cloud provider. We talk to the Nova API, list the deployments, and see that the second deployment, the RAG service, is now on the KubeCon Cloud C cluster with the same duplicate policy we just modified. We find the new service endpoint; this is a typical GKE external IP. We then open our UI, ask the same question, and see that the RAG Q&A endpoint continues to be available.

Okay, I'll go back to our slides. So what we saw today were the unique needs of healthcare GenAI apps with respect to their data and infrastructure, and we found that fleet managers are able to satisfy a number of these requirements through flexible workload placement as well as migration policies. We'd love for you to try out the Nova fleet manager, and if you have any questions, please contact us. Thank you to our teams who made all of these pieces possible. Happy to answer any questions. Okay, looks like there's no time for questions. Sorry about that; we'll be in the room the whole day, so come talk to us. Thank you.