So, in 1943, some years ago now, Thomas Watson said a really interesting sentence, and if you don't know him, he was the longtime leader of IBM. And what he said back then is: I think there is a world market for maybe five computers. And look at us now. Yeah. And I think that's a really great example of the cyclical nature of technology in general. Why are we talking about the cyclical nature and expectations management in tech? Well, I think everyone has probably heard about this thing called AI hype that could be happening right now around the world. And I actually love this Gartner graph, because it really highlights what we are all feeling around the world: the hype cycle is very strong at the moment. It is very much happening. But as technology professionals, we also know that that is not the only truth, so to say. We know that, OK, soon, likely, it's going to go down into the trough of disillusionment, then the slope of enlightenment begins, and then we actually get to the plateau of productivity. When the hype disperses, we can actually see the benefits of the technology in the long run. But mostly, this is just to set the scene: I think we know that the technology behind these innovations is quite nice, and we can really utilize it.

I'm very happy to welcome you to An Offer You Can't Refuse: Discovering Chicago Film Sets with MLOps and Kubernetes. I'm Annie Talvasto. I'm CMO at VSHN, which is a Swiss DevOps company. And I'm also a CNCF ambassador, as well as an Azure MVP. I've also done things like the Cloud Gossip podcast and hosting Cloud Native Live for a couple of years, so a lot of fun stuff. I'm very excited to be talking to you here today. Fantastic. Great to have you with me today.

So hi, I'm Adi Polak. I'm the author of Scaling Machine Learning with Spark, where I dive into different topologies of how you can do distributed compute, starting with Spark and progressing into TensorFlow and PyTorch distributed approaches. And I'm also a Databricks ambassador, which means I work extensively with open source projects such as Spark, MLflow, and Delta Lake. Perfect.

So a bit about our agenda today. We have about 30 minutes, so we're going to be quite quick. We have the introduction, which is happening right now. Then we have some best practices, then a quick demo, then the open source stack that you can use to do all of these things we're talking about, and then some learn-more resources at the end. So I think we're going to have a fun ride today.

To set ourselves up, I think we can agree that MLOps loves Kubernetes as a whole, because AI really requires a lot of the things that Kubernetes can offer: portability, customizability, performance, consistency, microservices, composability, and security. So Kubernetes offers a great platform, a great orchestration tool, to manage MLOps and AI in production, which is also the reason why we are here talking about Kubernetes and MLOps together. But what does that journey look like from that side?

Yes. According to 2019 research, 70% of all the models that researchers were developing never got to production. That's because a lot of them didn't get enough budget or didn't have enough support from their teams, and these things are starting to change.
According to more recent research from last year, we are seeing that only 50% of the models being developed don't get to production, which means 50% actually do get to production. And that shows that there is a market, there is a need, and we finally understand the benefits that machine learning can bring us as an industry. So our question has shifted. It's not just how do I develop the best model, that's always a big question, but how do I enable the machine learning folks, the machine learning researchers, the data scientists, and so on to get their models into production. And what we're seeing now is that our stack has expanded. As developers, as practitioners, we used to have our dev environment, our staging environment, our production environment, and now many companies are adding another environment: the experiment environment. That's not the A/B testing we're all used to, but it's similar in the sense that we need to give researchers a place where they can run their experiments and scale from the four cores they usually start with to thousands and thousands of machines, looking into distributed training as we go. And those environments are extremely hard to develop, because of challenges like hardware compatibility, different models, image sizes, and so on, which we're going to dive into today. Perfect.

Now, the title and the description did have movies in them, so why is that? Chicago actually is historically very well known as a film city. Back in the day, over 80% of the movies produced in the US were made in Chicago. So it's historically a very strong movie city, and even nowadays Chicago ranks among the top hubs for moviemaking. Obviously Hollywood is quite big as well, but Chicago is out there with the other hubs. Just as an example, there's one studio alone that has over 36 stages, spread over 1.6 million square feet of studio space around the city. So that is very much a testament to the moviemaking capabilities of the city, which is why we have chosen to talk about movies at this particular KubeCon.

Now, some of the famous movies filmed in Chicago that you might or might not know, depending on how deep you are into the topic: Ferris Bueller's Day Off, for example, Transformers 3 and 4, exciting. Then we have Batman Begins, The Dark Knight, Batman v Superman, The Batman. So a lot of Batman movies have actually been filmed here, which is exactly why, as we run through the six (and some extra) best practices for MLOps, it's going to be the special Batman edition. So very happy to go through these six and some extra with you today. We're going to kick off with collaboration.

Yeah, I love that addition. There's always a big conversation about DC vs Marvel, but today we're all leaning into the DC space, so that's fine. OK, so the first one is collaboration. And collaboration speaks about systems of people, right? It's how do we find the language to communicate what it is that we need with different parts of the organization? So a lot of the challenge is how do we develop with the customer in mind, knowing that now maybe one of our customers is our researcher. They care mostly about papers. They want to get a high h-index.
And we need to build a system that is good for them, so they can build state-of-the-art machine learning and later on we can push it to production relatively fast. So we're changing a little bit the way we think about the world and the systems we build. And we also know that we now need a shared language, and with the boom of LLMs, all of us are learning what everything means: what are tokens, how are they different, why is it bytes versus tokens versus data entries, and so on. So this is really the first piece we need to tackle: collaboration and finding a shared language within the organization.

Exactly. And while people might say that developers or data scientists can be a bit of a lone wolf, so can Batman. But just like Batman, developers also have to learn to collaborate, as Batman and the others did in Batman v Superman. You have to learn these skills to slay all the monsters that are attacking you, so that you can team up and become the superhero, or the MLOps team, of your dreams.

But on to more best practices: version control and model management. Yeah, just one sentence about Batman that I want everyone to understand. Batman is the ultimate solopreneur, right? When we think about Batman, he develops his own ships and his own cars, his own things. But even Batman knows that if you run alone, you can run fast, but if you run together, you run far. So this is something to bear in mind, something that even our greatest heroes understand.

Exactly, yeah. So the second practice is something we see very often and are most familiar with. We need version control, but now we're adding another layer of model management. I need to version the models themselves. I need to manage them. I need to know where they are in my development lifecycle. Are they only an experiment? Do we need to scale them? What are the hyperparameters? What are the variations of distributed settings that they need? What are the GPUs that I'm running on top of? Because I might need that for inferencing as well; inferencing is when I deploy my model to production and serve my customers with it. So there are a lot of aspects around version control and model management that we need to add to organizations in order to take this to the next step.

Exactly. So, for example, tools like MLflow or Kubeflow can really help with model versioning (there's a small sketch of that below). And version control is something that Batman has had to learn the hard way as well, because there are so many versions of him. You have the Michael Keaton version, you have the Pattinson version, and so many others. And even there, you really do have to do a lot of version control, because all of these different versions have different costumes as well as different mannerisms and so forth. They need to have distinctive features.

As the next step, we have automated testing and CI/CD. This is super important because it really benefits code reliability, efficient deployment, and continuous improvement. CI/CD pipelines automate the integration, testing, and deployment processes, reducing manual intervention and accelerating the delivery of reliable models to production. Again, this is really critical as we're building a feedback loop. The CI/CD wants to be as close to production as possible.
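To make that model-management idea a bit more concrete, here is a minimal MLflow sketch, assuming a scikit-learn model and an MLflow setup with a model registry behind it; the model name chicago-permits-model is purely illustrative:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Log one experiment run: parameters, metrics, and the model artifact itself.
with mlflow.start_run() as run:
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, artifact_path="model")

# Registering under the same name bumps the version each time, which is the
# "model management" layer on top of plain code version control.
mlflow.register_model(f"runs:/{run.info.run_id}/model", "chicago-permits-model")
```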
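And on the CI/CD side, one hedged example of what an automated gate can look like is a plain pytest check that the pipeline runs before promoting a model; the accuracy floor and the toy dataset here are assumptions, not a recipe:

```python
# test_model_quality.py: a quality gate that a CI system (GitHub Actions,
# Jenkins, Argo, ...) can run before a model gets promoted. The point is
# that promotion is automated, not manual.
import pytest
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

ACCURACY_FLOOR = 0.90  # illustrative threshold, tune to your use case

@pytest.fixture
def holdout():
    X, y = load_iris(return_X_y=True)
    return train_test_split(X, y, test_size=0.3, random_state=42)

def test_model_clears_accuracy_floor(holdout):
    X_train, X_test, y_train, y_test = holdout
    model = LogisticRegression(max_iter=200).fit(X_train, y_train)
    assert model.score(X_test, y_test) >= ACCURACY_FLOOR
```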
So again, bringing fresh data and real production use cases is going to help us a lot in figuring out whether a model actually makes sense. And we'll touch a little bit on the feedback loop and how it's a place where we don't quite know how to measure our models, especially with LLMs now, because we don't know whether the generated text is of high enough quality or not.

Exactly. And I think there's another good point Batman makes about automated testing, or at least a tool we know Batman uses: Alfred is maybe part of this. Here and there he says, can I persuade you to take a sandwich, sir? So he really helps in the background. He helps Batman with his food needs, and he's, for example, testing his equipment, making sure all of these things are happening. And it all happens automatically, easily, and independently of Batman's immediate needs, which is a testament to the importance of automated testing.

Yeah. Then we have containerization as the next best practice. Yes, so for a long, long time now, most of us have used either VMs or Docker, leaning between the two and their different tools. One of the challenges we see here is that if we want to containerize everything together with our model, we're looking at images of about 60 gigabytes. And that can be really, really big, especially if we need to download it and we need to do autoscaling. So now we're changing things a little: it might be that while my pod is coming up, I start downloading the model in the background, and there are different open source tools I'll need in order to download my model that way. And there's also a question about the hardware. VMs are usually faster, because we don't have the extra layer we have with Docker, so they usually get better throughput from the network. But then it means we need to build our image and everything we do to fit a specific VM and a specific operating system, whereas Docker makes our life much easier. So here we're facing a big challenge, especially with inferencing, when we need to scale horizontally and create more pods on the fly to serve more customers. How do we containerize that LLM in order to be able to scale relatively fast? We have another bullet point about scaling, but I want you to think about it, because this is a real bottleneck we're seeing in the industry. It seems like only big companies have found a solution, and some companies are actually leaning into running their own servers again. So it's a bit like what we saw 10 years ago with cloud versus on-prem: an interesting discussion happening now, with each company developing a different solution based on their budgets and resources.

Exactly. And Batman is also an expert in containerization, because he has the Batmobile and he can use it to isolate himself from the world around him. He can even be protected from explosions, lava pits, everything around him. I hope your dev environments are not as dangerous as the explosions and lava pits in Batman's life. They could be as stressful, but hopefully not as dangerous, at least.

But there's also monitoring, logging, security, and documentation, which is a lot to fit into one best practice, but we'll try. Yes, we'll definitely try. So who here has handled security for containers and scaling, things like SQL injection, DoS, DDoS?
Okay, I see a couple of folks, great. So history kind of repeats itself, only in a different flavor. What we're seeing here is very similar to SQL injection, where we can put in any string we want, send it to our servers, and those servers give us information back. Prompt injection is very similar: we're essentially sending a string to the server. So history repeats itself, and it's a good opening for prompt injection and for different things like XSS, or opening a shell on the actual server so attackers can do whatever they want. Plus all the traditional network attacks: OpenAI just announced that they've seen patterns of a DDoS attack on their network, which is why they've been down since yesterday. And that makes it really, really hard for us when we think about how we want to containerize our model, and how we want to create a sandbox for that already heavy model we need to serve, in order to make sure we are protecting our customers' data and protecting ourselves from all of these attacks.

And it also connects to the feedback loop, because I want to know my model is okay. So how do I distinguish between an actual prompt injection attack and a plain bad prompt that got into the system because my user isn't educated enough, where my feedback loop returned an answer it wasn't supposed to return and maybe I need to retrain my model? So we're entering a phase where we need a gateway for our AI that's a little more sophisticated, something similar to a WAF, a web application firewall, where we have filters on what goes in and what goes out (there's a toy sketch of the idea below). That adds another layer of complexity because of the nature of how these models behave. Again, we can put it into a strategy and target it when we need to, but we should be aware of it.

Yeah, and Batman is also a bit of a security fanatic. I think a lot of you probably know that. He really does monitor Gotham with, for example, all of these screens he has in his Batcave. He's really focused on monitoring, as you can see from the number of screens, and he's really focused on logging it all, as well as using that for security. But obviously he might be keeping illegal recordings of all the people of Gotham, which is maybe not a best practice you should be following. So don't do everything Batman does, but it's something to consider. Yeah. You know, at least until the Biden administration makes something new.

Then next up we have scalability. Yes, scalability is a really interesting topic. If we go back to the slide where we look at the different environments, from the experiment all the way to production, we see different places where we need to scale our clusters. From the experiment side we need to go to distributed training topologies, where we need to configure all the hardware, configure how we're distributing, configure MPI, and a lot of different things depending on what we want to achieve. And the goal here is really to abstract all of this away from the researcher, so they won't need to deal with all the ops underneath; they'll only need to deal with what the data looks like and what's the best model to use. That's a really big challenge right now, how to abstract all of these things. The second thing is inferencing. We're actually in production, and when we're serving the model, are we serving it in batch or in streaming?
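Going back to that AI-gateway idea for a moment, here is a toy sketch of WAF-style filtering around an LLM; the patterns are purely illustrative assumptions, and real rule sets are far richer:

```python
import re

# A toy "AI gateway": screen prompts on the way in and responses on the way
# out, in the spirit of a web application firewall. Illustrative rules only.
BLOCKED_INPUT = [
    re.compile(r"ignore (all|previous) instructions", re.I),
    re.compile(r"reveal your system prompt", re.I),
]
BLOCKED_OUTPUT = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # e.g. SSN-shaped strings
]

def screen_prompt(prompt: str) -> str:
    """Reject prompts matching known injection patterns before they reach the model."""
    for pattern in BLOCKED_INPUT:
        if pattern.search(prompt):
            raise ValueError("prompt rejected by gateway filter")
    return prompt

def screen_response(text: str) -> str:
    """Redact sensitive-looking strings from model output before returning it."""
    for pattern in BLOCKED_OUTPUT:
        text = pattern.sub("[redacted]", text)
    return text
```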
On batch versus streaming: if we go into batch, maybe we can have some prediction around the number of requests we get; we have some previous knowledge of what's going on. But if we're thinking about streaming, if we're building a solution such as serverless or SaaS or anything like that, then predicting how to grow, how to scale my cluster, becomes really hard, because adding just another node, another pod, can take something like an hour. So looking at the SLAs, understanding what to scale, understanding what to do with streaming, is a completely different use case from batch, which is a little more predictable. And we also need to add layers such as caching, which is another big question. Like, how do I do multi-tenancy for a specific customer? If I have a customer profile, that means my LLM is slightly different. So how do I know how to cut the base LLM? We have the network; how do I cut the base LLM and keep the edges just for that customer? That's another question the open source world is trying to answer.

And for autoscaling specifically, there's a really cool project named KEDA. Probably most of you are familiar with it. It's a CNCF project that enables us to autoscale based on queues. So instead of looking at CPU cycles or GPU cycles, I'm now looking at queues, and I can configure it to look at specific things that are relevant for my LLM or my machine learning, rather than specific cycles of a machine. So a lot of open questions, and the answers depend on very specific use cases. Yeah, exactly. And KEDA relatively recently graduated as well, so that is very nice for them, congrats.

But Batman knows something about scalability as well, as the last Batman tip of the day, at least here where he's literally using his abilities to extend his body, legs, and arms by becoming a mecha robot. That is one way to do scalability; it is scaling his body quite literally. So that's very nice.

But then, moving on to the best practice rapid fire. These are a few tips that are more condensed than the big ones we dedicated slides to. MLOps loves DevOps. So if there are any DevOps or Kubernetes or cloud professionals here wondering, okay, should I start to focus on MLOps, is it something I should be exploring more? That has been a key theme across this KubeCon as well: we need more resources and ideas from the cloud native scene in MLOps, to solve a lot of the challenges that Adi has been mentioning here. And while we make that transition, I think we have to keep in mind that all of the DevOps practices we have learned and loved in our previous work usually still apply. Obviously there's a lot you need to learn on top in the MLOps space, but let's not forget all the good DevOps tips and tricks we have learned that make our lives easier.

Understanding the bottlenecks is also super important. Really dig deep into where, on that long road from research to production that Adi mentioned in the beginning, the actual bottlenecks in the organization are. Is it, for example, in self-service models for data scientists, or is it when you're actually taking things to production, and so forth? There are a lot of spots to look out for. Up-leveling your skills.
So if you are doing a transition from one area to another, obviously focus on learning the skills you need to master to get there. If you are, for example, a cloud native, KubeCon, Kubernetes kind of person, then it might be worth looking into Python, math, and the basics of data science. And vice versa: if you are more from the data science side and trying to get into MLOps, then it might be worthwhile to learn more about cloud services and cloud native and so forth.

Last but, this time, definitely not least: it's all in the data, for all of these things, because if you put bad data in, you're never going to get good results. So let's talk about the data of our presentation today. We have the Chicago data portal's filming permits from the transportation department. That is a bit of a mouthful, but essentially this is open data from the city of Chicago about how many filming permits, and what kind of filming permits, the city gives out. The data is there; we're going to do a quick run-through of the general parts of it, but anyone can dive deep on their own time as well. As a movie fan, I love this data set so much, it's a lot of fun. And obviously there are a lot of different things you can discover in there.

So, as you can see, I took screenshots of the data set just a few days ago, because I didn't want to trust the internet to be fast, that's always a bit tricky. We have, for example, over 14,000 views and downloads so far, and the data runs all the way from 2015 to the present. So unfortunately you cannot see the filming permits for Ferris Bueller's Day Off, for example, but it still has a good amount of data. It has columns such as application number, application type, work type, application status, and so on. This is how it looks if you start opening up the rows, so you can see, well, it's quite big. You see current milestone, application start date, application processed date, name, comments. And you can see here, for example, that there's a prom doing some filming for their high school prom, which is kind of cute in my opinion, that's very nice. And there are a lot of student films being recorded and so forth, but there are a lot of exciting ones if you look through it. Some movie filming and TV filming and so forth. And if we go here, yes, we have the NHL doing some filming here, so if anyone's a hockey fan, you know where to go on that date. Don't disturb any film sets, obviously, but you know where they would be. So there are a lot of interesting facts here: you get the exact address, what they're doing, even more utility data about what they're actually doing, what kind of camera sets they're using, or even the primary addresses of the people applying for the permits and so forth. There's a lot of fun stuff. Do you want to talk a bit more about it?

Yeah, so I always say, when you do BI or data analytics, you can shake the data as much as you want to confirm whatever it is that you want to see.
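If you want to poke at the permit data yourself, a rough pandas sketch of the kind of slicing we do next might look like this; the file name and column names are assumptions, so check the portal's schema first:

```python
import pandas as pd

# Assumes the film-permit dataset was exported from the Chicago data portal
# as a CSV; the filename and column names below are illustrative guesses.
permits = pd.read_csv("film_permits.csv")
permits["applicationstartdate"] = pd.to_datetime(permits["applicationstartdate"])

# Which streets collect the most permit applications?
by_street = permits.groupby("streetname").size().sort_values(ascending=False)
print(by_street.head(10))

# Which month was the busiest?
monthly = permits.groupby(permits["applicationstartdate"].dt.to_period("M")).size()
print(monthly.idxmax(), monthly.max())
```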
So bear that caveat in mind. But some of the interesting things we wanted to know: which streets are most likely to have a film location? So we looked at which streets request the most permits and which ones get accepted. You can see here that Clark Street, Wabash, Wacker, and a bunch of others get the most, and the one that gets the fewest requests is Paul Cren, but it still gets requests, so it's ahead of the ones that don't get any. And of course we did some more diving in, and then we asked ourselves, okay, so in which month should I go to these streets to find a film location? And we discovered that March 2023 was the most crowded in terms of permit applications. So definitely, if you're a movie fan, if you're a Batman fan, if you know there's a movie coming up and they're filming now, you need to go to Wabash in March next year. And then we put it on the map to see what it looks like, and what the distances are if we need to run from place to place, so we can better plan our next visit to the city.

All right, should we go for the demo? Yeah. Right, so I talked about all the challenges we had with Kubernetes, and one of the biggest challenges Annie and I had was actually getting GPU pools. Apparently those are scarce, and you need to book them in advance and have a very big company logo attached to them, or work for the biggest clouds. So the second-best thing we did, again to demonstrate what you can do, is we actually took pre-configured LLMs and fine-tuned them. So here in Azure, what you can do is create your own deployment. I created my own deployment: I took GPT-3.5 Turbo and fine-tuned it with some data that I put in here. You can see this is the Chicago Q&A JSONL file that we added, and I called my model Annie, because I can. All right, and now that I have my model fine-tuned, I didn't need to deal with any GPUs, lucky me.
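As a hedged sketch of what querying a fine-tuned Azure OpenAI deployment like this can look like, here is roughly the kind of HTTPS request the demo sends; the deployment name annie, the API version, and the environment variables are assumptions to adapt to your own resource:

```python
import os
import requests

# Querying a fine-tuned Azure OpenAI deployment over plain HTTPS.
endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]  # e.g. https://<resource>.openai.azure.com
api_key = os.environ["AZURE_OPENAI_API_KEY"]

url = f"{endpoint}/openai/deployments/annie/chat/completions?api-version=2023-05-15"
payload = {
    "messages": [
        {"role": "system", "content": "You answer questions about Chicago film permits."},
        {"role": "user", "content": "Is there any permit with an application start date in November 2023?"},
    ],
    "temperature": 0,
}

response = requests.post(url, headers={"api-key": api_key}, json=payload, timeout=30)
print(response.json()["choices"][0]["message"]["content"])
```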
I wanted to start experimenting, running some things against it to see what I get back. So I started building instructions. Sometimes you'll see things like a prompt that is more of a template, where you tell the model exactly what it's supposed to do; this is how you focus it in the right direction. So you have examples, you have the question you want to ask, and you want to add a response. This often works, and this is where you can add hard-coded rules to your model, to make sure it doesn't expose any specific data, or doesn't answer some specific requests, or even to change specific terminology that the user puts in, because the user might not be extremely familiar with your product terminology and you want to switch it to something that exists, for example, in your docs. All of that is possible. But then we decided to just play with the prompt, specifically for the case of this demo.

So here I have some input that I'm getting from a user: is there any permit with application start date November 23? And you can see the way we structured the sentence was specifically to fit the data we gave the algorithm. Application start date is a column in my data, and that was really important, because otherwise the model didn't respond well, so that's something to bear in mind. And we're doing just an HTTPS request to Azure, using OpenAI on Azure. We're packaging that request, we're sending it over, and then we're getting some response. And the response is interesting: it says yes, there is a permit with an application start date in November, but it also says that the permit with this specific number, 1850015, has an application start date of August 28. So you can see my model hallucinated a little bit, although I tried really, really hard to tell it what to do. And this means we still have a lot of work to do around fine-tuning, around prompt engineering, around working with our data to make things work. So my Batman is traveling in Chicago, and hopefully we'll find him soon, but this is just the beginning. We still need to do a lot of work to reach a model that actually works for our needs, whether they're internal or external. So the whole process around productionizing, fine-tuning, adding observability, the feedback loop, the quality assurance, is really, really critical. And by the way, for quality assurance there are some known ways to go about that: there's ROUGE and there's BLEU, available through Hugging Face, that help you check that the generated text actually makes sense. And another one is adding a human in the loop: having a dedicated person who looks at the response and says okay, that makes sense, or it doesn't, was really, really critical for this.

Good. Batman is actually traveling from the venue to the filming location as well, so that's useful, that's perfect. Yes, and then let's switch this a bit, perfect. So that was the demo, quickly.

Then, the open source stack for LLMs that we would recommend you look into. Obviously, think about things such as securing GPU pools and hardware; you do need to get that. Then open source models, so Llama, Dolly, Claude, MPT, Falcon and so forth would be at the top of the list, and Hugging Face, for example, as you explore all of these. The LLM starter pack that was mentioned by Priyanka in the keynote this week is a really great place to start and play around with, highly recommend that.
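As a small, hedged example of pulling one of these open models from Hugging Face, here is what loading Falcon-7B-Instruct with transformers might look like; the model choice is just an illustration, device_map="auto" assumes the accelerate package is installed, and older transformers versions may additionally need trust_remote_code=True:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example open model; swap in whatever your hardware can actually hold.
model_id = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Tokenize a question, generate a short completion, and decode it.
inputs = tokenizer("Which Batman movies were filmed in Chicago?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```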
These open models do need specific hardware, so if you're going that route, make sure you have the hardware that's needed. Yeah. So, Hugging Face, I already mentioned it quickly: really good stuff, and there have been a few good demos of Hugging Face at this KubeCon as well, so I'd recommend checking those out. And LangChain, as well as Kubeflow, Airflow, and MLflow, are good to have running there as well. And data, obviously, is needed as well, so make sure that works really well.

But then, what actually is this Kubeflow that I've mentioned a few times in this presentation already? Probably many of you have heard about it, but if you have not: Kubeflow is an open source platform that's designed to make it easier to deploy, manage, and scale machine learning workloads on Kubernetes, and it provides a set of tools, frameworks, and best practices to really streamline the development and deployment of those machine learning models. It covers the end-to-end ML workflow, from data pre-processing to model training to deployment and monitoring. It also helps with scalability and resource management, so you can really leverage Kubernetes orchestration capabilities for easy scaling of machine learning workloads and dynamically allocate resources. And it helps with reproducibility and collaboration as well, and collaboration has been a topic at hand throughout this presentation, also in the Batman tips, which is why this is really important. It helps you capture the metadata, environment details, and code versions associated with each experiment. When you think about building the control plane, it helps you manage a lot of the metrics underneath; combined with MLflow, that gives you another layer of managing the models themselves. Exactly. So, for example, if your bottleneck is around having a self-service model for data scientists in your organization, something that really helps boost their productivity, Kubeflow is a great resource for that.

But here are some learn-more resources. Obviously the linked slides, which you can also find in Sched, and I've added them to GitHub as well. There's the data set link there, if you want to check it out; it's a lot of fun to play around with. There's the Practical MLOps book, if you want to get more hands-on, like okay, this is how you get started on MLOps in Azure, AWS, and so forth; highly recommend that book. There's Adi's book that she has written about scaling machine learning. Then I did a session at KCD Washington DC more focused on the overall picture of how Kubernetes fits into the MLOps world, if you want to tune in there. And then there are also some more resources on Kubeflow for machine learning as well.

Now, what have we actually gone through here today? It's been a tight 30 minutes. We've learned about collaboration, version control, automated testing, containerization, monitoring and logging and so forth, and scalability. And we've learned to be like Batman, that was the theme at the beginning of the presentation, but remember, don't always be like Batman, because we are living in the real world. But thank you so much. I think we have a bit of time for Q&A now, and we'll also be here after the session, so you can come and talk to us afterwards. Thank you. Thank you. Thank you.