Okay, I think we'll get started. Thanks, folks, for joining — there are still some people coming in, so please feel free to sit wherever you like. Thanks for joining this session today about balancing the productivity that large language models and inference APIs give us with the responsibility of accountability, particularly where that intersects with cloud native. I'm going to talk to you from my own experience building and submitting a project to the CNCF, and some of the challenges that's encountered.

So, just very quickly: I am a principal engineer at AWS. I work in region services, so that's building out regions and everything that entails, but that's not really why I'm here today. What I'm really here to talk about is the more existential side of things that interests me, which is the rise of artificial intelligence. You have images generated by DALL-E, you have non-stop conversations about AI all through KubeCon. But I think, really interestingly, if you've ever read Isaac Asimov — for anybody who's a fan — you can sort of see where the world is going, and it's interesting because we've started in the virtual, and AI will eventually be making its way to the physical. What's kind of funny is that the reality right now is that it has very little spatial awareness; it's typically used to regurgitate coding problems and produce bad images. But once you have agents running on top of models, and models running on top of models that are highly tuned to optimize the models beneath them, we'll get to a state where it becomes highly performant and applicable in every walk of life. We've already seen that for the past decade or so, where financial models and insurance models have been guided by ML, and now we're starting to see this coming into the physical world, where that suite of AI/ML is applicable to things like transportation, security, and healthcare. What's interesting about this as well is
that all throughout this KubeCon so far we've heard about this intersection of cloud native and AI, and I would sort of challenge: how much do we actually really know about it? I think it's an interesting set of domains to entwine, but if you think about the crowd of machine learning engineers and data scientists versus folks who are typically involved in cloud native, they're very different communities with very different needs. And so this is Epimetheus — that's his name — the husband of Pandora here, using the key that Zeus provided to unlock Pandora's box, and I think we're getting towards that point fairly soon.

So today I'm going to talk to you about a few things. We're going to discuss the accretion of knowledge: how did we get to where we are today? We'll also be discussing the K8sGPT project — I want to show you what it's about, why I thought it was an interesting idea, and what some of the challenges are, from an ethical point of view but also from a community and technical point of view, with starting to build some of the first practical cloud native applications with AI. And then we'll end by discussing briefly where we're going and what the future looks like.

So to get started, let's talk about the accretion of knowledge. A lot of the projects that you see today have been the beneficiaries of hard work over the past 50 years. I'm not going to go through all of this in super detail, but look at things like DeepMind's convolutional neural networks and the capabilities for image processing that they enabled, or look at the announcement of BERT.
These two key technologies then enabled things such as ChatGPT and other services. And so you can see that over the past decade or so there's been a clear trajectory in terms of how we got to where we are today, and to the services we know such as ChatGPT — I think that's really been the first exposure the public has had. However, many papers have been published, and there's been incremental progress, especially moving over to GPUs both to train and to serve.

So what I wanted to reflect upon here is that there are some really interesting opportunities with generative AI. It's great at being able to produce images and text and sound. It's even better at looking at patterns, recognizing trends within those patterns, and performing some prediction. For people from the ML world this isn't new: you have a dimension with features, you perform a linear regression on it, or you run some other form of regression, and you can start to build fairly accurate models. But what's interesting is that when you apply technology such as transformers, you're suddenly decreasing the amount of time taken to train, you can reuse models rather than starting from scratch, and you're also starting to enable people to play around with hyperparameter tuning and weighting. I won't go too much into that side of the house today, other than to say that all of this came about around the time I was working heavily in Kubernetes, and I was experiencing exactly these kinds of problems: looking at patterns that you find in Kubernetes debugging and failure modes, looking at how to translate those complex error logs and event logs, and figuring out ways to then predict if there would be a future failure. So it felt like there was a natural synergy between some of the capabilities and superpowers of generative AI and some of the things I was looking for.

So I want to talk a little bit about the esotericism of Kubernetes.
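To make that "fit a regression over features" point concrete, here is a minimal sketch of an ordinary least-squares fit over a single feature. It's library-free Go, purely illustrative, and not tied to any of the projects mentioned in this talk:

```go
package main

import "fmt"

// fitLine computes the ordinary least-squares slope and intercept
// for a single feature, so that y ≈ slope*x + intercept.
func fitLine(xs, ys []float64) (slope, intercept float64) {
	n := float64(len(xs))
	var sumX, sumY, sumXY, sumXX float64
	for i := range xs {
		sumX += xs[i]
		sumY += ys[i]
		sumXY += xs[i] * ys[i]
		sumXX += xs[i] * xs[i]
	}
	slope = (n*sumXY - sumX*sumY) / (n*sumXX - sumX*sumX)
	intercept = (sumY - slope*sumX) / n
	return slope, intercept
}

func main() {
	// A toy dataset where y doubles with x: expect slope 2, intercept 0.
	slope, intercept := fitLine([]float64{1, 2, 3, 4}, []float64{2, 4, 6, 8})
	fmt.Printf("slope=%.2f intercept=%.2f\n", slope, intercept)
}
```

The point of the contrast in the talk is that this kind of hand-built model starts from scratch every time, whereas transformer-based models can be reused and fine-tuned.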
It's a bit of a mouthful, that. Effectively, K8s is a really interesting system because it's more than just a substrate these days. This is my little haiku that ChatGPT wrote for me earlier on, and I thought it was an apt observation — fixing is certainly very rare, for me anyway. Anecdotally, I remember working at American Express: we had HAProxy failures and we were dropping packets, and we narrowed it down to several nodes, and it turned out there was a soft IRQ reset being caused by various conntrack issues. What was funny about that was it really had nothing to do with Kubernetes — but when you're starting from that viewpoint, and that's your optic, it's very difficult to see what's actually wrong.

And just in this little illustration here: the signal-to-noise ratio I think we often see is very skewed, because you're looking at the YAML, you're looking at the logs, you've then got your observability on top, and then you might even have something coming out at the Linux kernel dmesg level. So there are a lot of different places we're expected to look, and there are a couple of second-order problems there. The people who know how to do this stuff crystallize that knowledge in their heads and get really good at it; they accrete that knowledge, and over time they get more valuable. But the delta between their knowledge and a junior's becomes ever larger, and this chasm is almost insurmountable, because you have 20 years of sysadmin knowledge on top of the K8s knowledge that you're applying to the problem. So that was one of the areas that was really difficult to surmount.

Also, a lot of K8s issues — I would probably wager 90-odd percent of K8s issues — are effectively Linux issues. I used to work at Canonical, who build the Ubuntu operating system, and many times I'd find I was looking at Ubuntu rather than actually looking at K8s.
For example, ulimits, any kind of sysctl configuration, even iptables rules could often be a problem, as well as systemd itself. I just love this GIF, so I thought I'd put it in there. But effectively what I'm trying to emphasize here is that K8s debugging is getting harder and harder, especially with virtualization, right? When you have a hypervisor layer on top, and we're talking about things like GPU acceleration, there's a lot of intrinsic knowledge there. For example, if you're using CUDA MIG or vGPU, or even if you're doing network acceleration on a SmartNIC, there's an enormous amount of context you need to have to debug that particular problem.

So here's sort of where we are in terms of what I think are the really big challenges with K8s debugging. That top one, the amorphous n-dimensional layers, effectively means the complexity of Kubernetes is not linear, because every time you stack something on top, it comes with its own state of the world. So for example, whether you're using Backstage or Argo, or whether you're using an operator, they're all at odds with the modus operandi of K8s: you're now adding a new set of things that can break — in new and interesting ways — with their own dependencies. You've also got tacit knowledge formation, right? You've got this idea that only people who have actually worked through this before can solve it, because it requires critical thinking that's not linear.
You need to make a jump to be able to say, "ah, okay, I understand that problem, because I've seen it before" — so that's an inherent challenge when you're trying to troubleshoot K8s. And then the last one, just to touch upon it again, is that signal-to-noise ratio. As we move to this world of managing 20-30 clusters with several hundred namespaces and several thousand pods, the signal-to-noise ratio is incredible. Even if you have an observability vendor, you're drinking from the fire hose looking at events; you're purely relying on alarms to help you debug before you then have to jump in and look at the analysis of metrics and whatnot. So it's challenging.

So this is where I was inspired. I was really keen to take back the power — it's getting out of hand. This is my really poor attempt at building GIFs here. And what I thought was interesting was: look, we have an opportunity here. There are a bunch of APIs becoming available — how can we leverage those to solve some of these problems?

So if you don't know K8sGPT — and I assume most people probably don't — I'm just going to run you through it very quickly. Effectively, what it enables you to do is intelligently aggregate different types of signal and return responses on those signals. It has a bunch of different analysis capabilities built into it, and it runs as both a CLI and an operator, so you could run it in your CI/CD pipeline as a CLI, or you could run it continuously as an operator. It produces either CRDs or JSON — or whatever you like, really — to spit out the reports. You can also see there that it can categorize CVEs based on their priority.

So in the beginning, when I thought about this — this was probably last April, prior to the KubeCon then — I was trying to figure out what it should integrate with. I was familiar with Hugging Face and I'd played with some of the generators, but I'm not an AI/ML expert.
As you might have gleaned. So I looked to OpenAI, because they had a public API, and I thought, right, let's start there. So we built against an inference API — the two of us at this point in time; I'd roped a friend into helping me — and we had some really cool results. We had a pod analyzer: we could scan the state of pods, compress down the events, and create a context window with some relevant prompt data. Soon after, Ettore, a colleague and friend from Spectro Cloud, built out LocalAI, which was a C++ wrapper for inference. And this was really interesting, because he was able to go off and build interesting projects with K8sGPT and LocalAI, and that created a natural community — one that I wasn't really aware of at the time, but I kind of looked at it and moved on.

After this came an Azure integration that was contributed by the community. That's not such a big deal, because the Azure integration was effectively the OpenAI API but hosted by Azure. After that we got a Cohere integration, and that was really cool — I think this was about June time. We were working with several key contributors, we were having a great time building up functionality, and this was an inflection point where we started to see people outside of our immediate community wanting to contribute a backend. After that came Bedrock, and today comes SageMaker. So all of these now integrate in a first-party way with K8sGPT, and that's pretty cool, because the project only got built about five months ago.

So how does it actually work? I've explained it, I've kind of bigged it up, and I've made it sound very grandiose. I saw this picture on Twitter the other day and I thought it was kind of funny — it's just a remark about what OpenAI has now said they're going to do to companies that are building on top of their platform. So it's not a GPT wrapper, right?
It doesn't run in a window in Electron and just call OpenAI. The way it actually works is that there is a set of analyzers — I think there are 13 or 15 in total now. It's all based in Golang, and it's all publicly available. These analyzers are just codified SRE knowledge; there's nothing magical behind it. I actually wrote part of this code when I was still a practicing SRE, trying to train up my team. It's unit-testable as well, because you can mock out the Kubernetes API. You can see here that we're simply looking to see whether an ingress class exists. This is packaged in a fairly neat way — this one isn't particularly full of context — but what I'm trying to emphasize is that there's no magic behind it.

The magic happens when you pass the correct type of context and explain to the inference API that what you want back isn't a guess, but a repackaging of the English — or the language that you choose — so that the user can understand it in a more meaningful way. And I think that's an important differentiation from what a lot of people are doing with generative AI: they're asking it to guess at things it doesn't necessarily hold subject matter expertise on. But what it is very good at is linguistics — it's built on NLP. So if we've got a complex error message, let's leverage that to make it a bit simpler. Again, you can see that this is part of the Cohere client; there's no magic here other than that we're sending these things off and returning a simplified version of the message. So that probably takes away a little bit of the magic, but let's see what it actually looks like under the hood. Effectively, you have an analyzer.
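To give a flavour of what "codified SRE knowledge" means here, a heavily simplified analyzer might look like the sketch below. The interface and types are illustrative stand-ins, not K8sGPT's actual API, and the fake lister plays the role of the mocked-out Kubernetes client mentioned above:

```go
package main

import "fmt"

// Result is a simplified stand-in for an analyzer finding.
type Result struct {
	Kind  string
	Name  string
	Error string
}

// IngressLister abstracts the one Kubernetes API call we need,
// so the analyzer can be unit tested with a fake implementation.
type IngressLister interface {
	IngressClasses() ([]string, error)
}

// analyzeIngressClasses flags the cluster when no ingress class exists —
// a common cause of Ingress resources silently doing nothing.
func analyzeIngressClasses(l IngressLister) ([]Result, error) {
	classes, err := l.IngressClasses()
	if err != nil {
		return nil, err
	}
	if len(classes) == 0 {
		return []Result{{
			Kind:  "IngressClass",
			Name:  "(cluster)",
			Error: "no IngressClass found: Ingress resources will not be served",
		}}, nil
	}
	return nil, nil
}

// fakeLister is the kind of mock you'd use in a unit test
// instead of a real API server.
type fakeLister struct{ classes []string }

func (f fakeLister) IngressClasses() ([]string, error) { return f.classes, nil }

func main() {
	results, _ := analyzeIngressClasses(fakeLister{})
	fmt.Println(results[0].Error)
}
```

There really is no magic: it's a deterministic check that emits a structured finding, which can then be handed to an inference backend purely for rephrasing.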
I picked the pod analyzer right here. That pod analyzer interacts with the API server, either directly or through a proxy — and it has a few other modes as well. Effectively, what it does is pick up different events of various types, look at pod statuses and pod health, and then bundle those into a set of pre-analysis results. The pre-analysis results then go through the AI provider parser, so that it can build out what we call a result analysis; that enriches those results, and they come back again. What's interesting as well is that we've built this capability so that you can now cache those results based on a key, because different AI providers might give you back different results. And you can see all of those different types of results — whether it's Bedrock, whether it's Cohere or OpenAI — and you can start to build a broader opinion on what the best way to triage a problem is. So this very quick ability to switch between AI providers is a superpower in itself.

I also mentioned that we have a K8sGPT deployment. Again, you just deploy this into your local or remote cluster, and it provides you custom resources. You can see here the operator running continuously, and it produces these custom resource definitions — custom resources, rather — and you can then link those into your pipeline. They show up in Argo CD, which is quite cool as well, so you can see the relationship between K8sGPT and the custom resources. They can hold anything from errors to CVEs to any other findings that it picks up in the cluster.

What we're seeing now is that people are starting to use K8sGPT in an even more unknown way. Bit of reverberation there.
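The per-provider caching described above can be sketched as keying each enriched result by a digest of the provider name plus the raw analysis text, so switching backends yields separate cache entries. These are hypothetical names, not the project's real implementation:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey derives a stable key from the AI provider and the raw
// pre-analysis text, so each backend caches its own enrichment.
func cacheKey(provider, rawAnalysis string) string {
	sum := sha256.Sum256([]byte(provider + "\x00" + rawAnalysis))
	return hex.EncodeToString(sum[:])
}

// enrichCached returns a cached enrichment if present, otherwise calls
// the (expensive) inference backend and stores its response.
func enrichCached(cache map[string]string, provider, raw string,
	infer func(string) string) string {
	key := cacheKey(provider, raw)
	if v, ok := cache[key]; ok {
		return v
	}
	v := infer(raw)
	cache[key] = v
	return v
}

func main() {
	cache := map[string]string{}
	calls := 0
	infer := func(raw string) string { calls++; return "simplified: " + raw }

	enrichCached(cache, "openai", "CrashLoopBackOff in pod foo", infer)
	enrichCached(cache, "openai", "CrashLoopBackOff in pod foo", infer) // cache hit
	enrichCached(cache, "cohere", "CrashLoopBackOff in pod foo", infer) // new provider, new entry
	fmt.Println("backend calls:", calls)
}
```

Only two backend calls happen for three lookups — and because the provider is part of the key, you can keep each backend's opinion side by side and compare triage suggestions.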
I apologize. We're starting to see it used in an even more unknown way, and that is active training, where you combine two clusters together — typically using Kubeflow — with LangChain feeding in hyperparameter tuning and then serving back into the K8sGPT deployment. This was originally done using LocalAI, but having now brought this to Kubeflow, we're seeing that those workflows are starting to give us the capability to do fine-tuning on top of the existing model being used in the cluster. I'll talk a bit more about this later on, because it's quite interesting to think about what types of models are supported locally. And what's also interesting about that is you're starting to answer some of the ethical questions that will come up in a moment.

So if you're interested in the project, I just wanted to call out some key metrics here. We have three different production use cases active at the moment. We have a community of over 30 key contributors, and, as I say, something like 13 to 15 analyzers with over five different backend providers. So check it out, see what you think. As I say, we're moving towards a model of trying to toe the line and delicately walk between security and privacy on one side, and usefulness and application on the other.

So let's go to a demo and actually see what it's all about. Okay, so here's my screen. I'm just going to show you very quickly what my Kubernetes cluster is running. I've got 36 or so pods, and I have a bunch of different arbitrary programs that I've written.
I've got some stuff that might look familiar as well. Your first experience of K8sGPT is probably going to be looking at the auth list, which gives you a view of the backend providers that you can use. Secondly, you might want to see what kinds of filters are available — oops. So here you can see all the different types of resource that we scan at the moment and can make insights on. This is interesting as well, because we have an integration with tools like Trivy that can bring their own custom resource definitions into the cluster, but also into the scan analysis.

So let's go ahead and analyze without any kind of AI enrichment. This was really key for me: you should be able to get value out of this even if you aren't connecting it to the AI backends. What this is effectively doing is aggregating and simplifying pretty well-known problems in this cluster. When you apply the -e flag, it will actually package these up — it was pretty quick there because I've cached some of these responses. So you can see we're immediately giving some ideas towards how to simplify this, and how to then go about debugging and triaging it. Now, I'll come on to some of the fallacies and shortcomings later on, but immediately, one of the things this is exciting for is that if you're not a systems admin, or you're not an expert and you've only been doing this a few years, this immediately gives you a window — or at least a path to follow — to debug. And so we're finding that a lot of the users of this tool are people who are learning Kubernetes for the first time, and they kind of have it side by side, right? They literally install kind, they run a k8sgpt analyze, they build their VS Code deployment, put it out there, and they run it again, kind of in that flywheel.
So that's quite cool to see. What we can then do is start to think about processing it in different ways — we can put it into JSON — but I think what would be quite cool for this group is to look at one of the error messages in particular. You can see that we've got things such as container back-off, and because we're passing a very well-defined window of context, this is a pod that actually exists in the cluster, and we can go and have a look at what's going on. So again, if I go to look at my foo, we can see that it gave me the correct name of that pod, and we can go off and see what's going on.

Equally, if we want to do some security scanning, we can run the integration list and turn on something like the Trivy integration. Now, you've got a couple of options here: this can either go fetch Trivy and install it into the cluster, or you can use an existing installation. We're also planning to build out a Prometheus integration, and the interfaces are pretty simple as well. What's interesting is that, as you'll see, in my filter list I now get this new vulnerability report integration. That's an analyzer that gets pulled in by the integration, and it knows how to categorize CVEs based on their priority in terms of severity. This is just an example of how you can bring additional functionality into the tooling. We're thinking about other things as well, like having cloud-provider-specific integrations — so if you're running on Azure or on EKS, you could see some stuff specific to that platform.

I'll just end this by showing you, when we actually go back to the cluster, that some scans have been run. Let's go and see if we get any difference in our analysis.
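The severity bucketing that the vulnerability-report analyzer does can be illustrated with a small function mapping CVSS-style base scores to priority bands. The thresholds below follow the common CVSS v3 qualitative rating scale, but the function itself is a hypothetical stand-in, not the actual analyzer code:

```go
package main

import "fmt"

// severityFromScore maps a CVSS v3-style base score (0.0-10.0) to the
// familiar qualitative bands: None, Low, Medium, High, Critical.
func severityFromScore(score float64) string {
	switch {
	case score >= 9.0:
		return "Critical"
	case score >= 7.0:
		return "High"
	case score >= 4.0:
		return "Medium"
	case score > 0.0:
		return "Low"
	default:
		return "None"
	}
}

func main() {
	// Hypothetical findings, as a Trivy-style scan might surface them.
	findings := []struct {
		id    string
		score float64
	}{
		{"CVE-A", 9.8},
		{"CVE-B", 5.3},
	}
	for _, f := range findings {
		fmt.Println(f.id, severityFromScore(f.score))
	}
}
```

Bucketing like this is what lets the tool surface "severity: medium" in the demo output instead of a raw score, which is one more small step in reducing signal-to-noise.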
So I just do another analyze command — it's always good when it's live, because you never know how long it's going to take. At this point you can see that we've got some configuration issues being found, and we can see the severity as medium. And so, even though this is fairly raw, we're starting to reduce that signal-to-noise for the user, and hopefully, as we evolve this as a project, we're going to improve the relevancy and make it so that there's a bit of a gilded path to fix things for people as they go along.

So I want to just recap on how this is actually useful — what's the thing that I think is the USP here? Firstly, it's faster debugging times. As an SRE, a sysadmin, whatever you might be, you can debug quickly and solve issues by looking at stuff that's come up before and seeing it codified. Codifying that knowledge is a really integral part of it, because you can add your own analyzers as well. People can contribute those from the community, saying, "hey, you missed an analyzer for persistent volume claims that are unattached — we see it all the time, it's a waste of time" — so people can contribute that. There's also consistency of diagnostics: having a repeatable process that can be tried several times in a row makes it far easier to get to the root cause, if you're analyzing it the same way each time. And finally, lowering the bar for operators — making it a little bit easier to be effective. As I said, sysadmins and other operators with a lot of experience might be great, but for people who are just learning this stuff, it's a really helpful thing to have.

So you might be asking yourself: what about scanning logs? What about metrics? What about all of these other pieces of the puzzle that we haven't described yet in Kubernetes?
Well, that's where you get to this sort of ethical dilemma: what should I do to increase value and usefulness, versus what's going to open up a lot of existential problems? These are some of the key issues we see on GitHub that come up time and time again: due diligence around the provenance of model data; accountability — who owns my information is another good one; as well as, how can we make sure that things aren't poisoned? If you were to try and summarize that, I think a lot of it tiptoes around security, privacy, and perhaps ownership.

So I want to go back and look at one of the integrations, because this was sort of our immediate answer to it. And kudos to the project — this is another open source project on GitHub — what's really great is that it supports a bunch of different models, whether it's GPT4All or, you know, LLaMA. What's nice about LocalAI is that it was our immediate solution to say: ah, well, if you don't want to run your data through OpenAI, try this — and so that's what we recommended. This was the initial architecture for that: we had people who would run a LocalAI deployment in the cluster; it would then serve through this endpoint here, and the K8sGPT deployment would pick it up. And we had some pretty good success early on. Of course, as you might know, serving that endpoint is kind of heavy, especially in a small edge cluster or something that doesn't have a lot of oomph behind it. Equally, at the storage layer, having these models locally isn't for everyone.
So I wanted to make sure there was another solution, one that was adequate and overcame some of these hurdles. Just about the same time, Titan came out, and Titan is basically a catalog of high-quality foundation models. What's really interesting about Titan is that it's starting to put filters in front of the models — it has responsible-AI controls, so there are certain things it won't return and certain data it will scrub. This is appealing to some companies and individuals, because they can control the kinds of things the model is likely to respond with. And finally, it has fine-tuning built in, and that was something I think we'd been missing early on in this interaction with OpenAI — I'd always felt somewhat frustrated that the only variables I could change were the temperature or the top-k.

So, with that in mind, we came back to the community and said: look, we've got OpenAI, we've got LocalAI, and there are still a lot of people who are really curious about, well, how can we actually start to anonymize our data? How can we start to
tokenize that data, so that it's not so obvious what bank I work at, or what particular applications I have? In response, we added anonymization into the API. And you'll see just from this little code snippet that there's a masking process that allows you to tokenize your prompts and the responses that come back; it then de-obfuscates them and prints them, so that you can act on them with the knowledge that there's not been too much data leakage. It's not a perfect system, mind you, but it certainly moves closer towards that ask.

I just wanted to post this last slide with a few different statements, because these things might look a bit unfamiliar if you've not worked in AI/ML. This is the one I think many of us have seen, right? Temperature is effectively a control on how creative the responses from the model are, and so, again, one of our pieces of advice to the community was to modify the sample size and the temperature of your model. But again, we were learning what the community was looking for, and many people who started using K8sGPT in the early days were saying that OpenAI had a lot of hallucinations on particular types of analyzers. If you think about it, the corpus of data inside OpenAI is about three years out of date, so Gateway API, for example, wouldn't be in there if you wrote an analyzer for it. And this is where you'd have to start thinking about custom training, or having a model that's slightly more up to date. So I'm trying to illustrate, again, that it wasn't that we had all the answers — it's that we were attempting to rally around some pretty existential issues that were starting to crop up quite quickly in the project.

When we look at the project as a whole and where we're taking it, there are things that we haven't solved, and I'll be very clear about that. I'm not entirely sure who is accountable for auto-remediation if there's a failure, or some sort of data breach or vulnerability. Obviously,
it sounds like the person who installed it on the cluster — but then, is it somebody responsible for the model catalog? Is it the operator that enabled it? We're also looking at things like: should we be going more down the route of task-specific AI rather than LLMs? LLMs are appealing to us because we can get them off the shelf from places like Hugging Face, but really we're going to have to rely on the quality of training and data, and I think that's something that's going to hinder us, quite honestly. So I took this to heart, and this is also one of the reasons why we have so many other backend providers. Titan is one example, but there are other companies, like Anthropic with Claude, who are trying to solve this by building high-quality APIs. One of the criticisms of AI you often see in the news is that a lot of these LLMs were trained on public information that they don't necessarily have the right to draw upon, and I think many companies are now reevaluating that — mind you, it does cost a lot of money and takes a long time. So there are some big questions that are still open.

This is where I wanted to get to in the last few minutes: we're entering this period of time where
I think that this was summarized quite well I was looking at the unesco website around ai ethics just before this talk and I thought that gabriella really succinctly kind of Put the world's rights in this in this passage here But this bit I thought was pretty scary is that the world is set to change pace not seen since the deployment of printing press I think that from everything I've taken away here at kubecom We're focusing very much on the initial feedback loop of ai models of inference as I said at the start of this talk There are already projects Such as lang chain such as a gpt agent that are looking at modifying the output of prompts You're going to get several layers of recursion that are occurring and at that point the abstraction layer becomes so complex It's very difficult for humans to intervene And so I'm trying to be very Thoughtful about what we do and don't decide to do with case gpt One of the early suggestions and options was that we could actually enable auto remediation in the cluster It's not that difficult to do because several of the major inference apis take functions As a prompt and can return to you the the output of those functions So there is a way to programmatically get there, but should we do a thing that's up for debate So I think what we really need to do as a collective is to come together to bridge the gap between the ai ml data scientist world and folks in the ai and data lf foundation and the cloud native computing foundation And all the projects that are within we have a great amount of opportunity Specifically with kubernetes to proliferate these projects very very quickly But back to that pandora's box photograph at the beginning sorry illustration at the beginning of the talk I think that once we do that it's going to be almost Impossible to pull it back in you know You think about something there are helm charts that you can install an operator with an lm packaged into it I was just in a conversation yesterday About 
actually making LLMs into OCI artifacts. That gets you to the point where you can't control the distribution anymore, because they'll be out in the wild. So, trying to counter this gap in — I guess — thought leadership, and trying to bring people together, I proposed this TAG a while ago, and I'm really happy to say that some amazing people have stepped up and helped us to build this into a working group. And this is a call to action here: we're spinning up a working group for artificial intelligence, and that's our Slack — sorry, that didn't come out very well — and I would really invite you to come and share your thoughts. You might be far more knowledgeable than myself on the subject, or you might just be starting out, but in every walk of life within the underlying substrate of technology, you'll find that you have influence, or at least can have some thoughts on the subject.

I also wanted to share that we ran the first AI hub meeting yesterday at the conference, and there were some great sessions. You can see — probably, maybe not at the back — people were asking questions about: how do you autoscale LLMs on K8s? How do you build AI for products? How do you train across different geographic locations? There are more questions than there are answers at this point in time, and so I think what would be really powerful is for us to come together as a community to answer some of those.

In terms of the future, what do I see it holding for the K8sGPT project?
Well, I'm going to try and answer some of those ethical questions. I think one of the speakers in the keynotes this morning quoted Solomon Hykes with regard to "no is temporary and yes is forever." That's very much the case with K8sGPT: I want to take it slowly and make sure we're thoughtful and provide value. If all the AI backends were turned off tomorrow, the project still needs to have value.

So what I want to do is try and get this accepted into the CNCF Sandbox, because I'm not a company — I can't maintain it on my own, right? What's great about this process, as you may or may not know, is that you get access to a lot of contributors, maintainers, and community, but it also sets the bar: I have to prove that it's an active project. We've been working towards this over the past few months, and I'm happy to say we're getting really close to that, I feel, and the AI working group will certainly be part of that process.

But aside from that, I couldn't not scratch the itch of trying to do full auto-remediation by closing the loop. And so I started a project recently which has started to yield some pretty exciting results. This project is effectively only for AWS, because I needed a model that didn't hallucinate — at least not beyond a level I could rely on. What this project is starting to show is that if you connect the model to something like the AWS SDK, and you can rely on the data source inside the model, and you start to apply those changes, you can create a full feedback loop. And so what we're seeing is that this is potentially the future direction of a lot of DevOps tooling, where we start to trust the outputs if we can trust the inputs to the model. Isotope is very much in its alpha phase, and I would invite you to come and take a look at it. I was a little bit obsessed with Starship Troopers when I was making this, I apologize. There's a QR code there; the code is all in Rust.
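The closed feedback loop described above — observe a resource, ask the model for a fix, apply it, then verify — can be sketched generically. This is in Go rather than the project's Rust, and every type here is a hypothetical stand-in (including the "model", which is mocked as a plain function) rather than Isotope's actual design:

```go
package main

import "fmt"

// Resource is a stand-in for observed cloud resource state,
// e.g. an S3 bucket's public-access setting.
type Resource struct {
	Name   string
	Public bool
}

// Remediation is what the "model" proposes: which action to take
// against which resource.
type Remediation struct {
	Action string
	Target string
}

// remediateLoop implements observe -> propose -> apply -> verify.
// propose stands in for an inference call; apply stands in for an
// SDK mutation. Success is only reported once a re-check confirms
// the fix actually landed, and retries are bounded so a bad
// proposal can never loop forever.
func remediateLoop(r *Resource,
	propose func(Resource) Remediation,
	apply func(*Resource, Remediation)) bool {
	for i := 0; i < 3; i++ {
		if !r.Public {
			return true // verified: nothing left to remediate
		}
		fix := propose(*r)
		apply(r, fix)
	}
	return !r.Public
}

func main() {
	bucket := &Resource{Name: "logs-bucket", Public: true}

	propose := func(r Resource) Remediation {
		return Remediation{Action: "make-private", Target: r.Name}
	}
	apply := func(r *Resource, fix Remediation) {
		if fix.Action == "make-private" && fix.Target == r.Name {
			r.Public = false
		}
	}

	fmt.Println("remediated:", remediateLoop(bucket, propose, apply))
}
```

The verify step and the bounded retry are the point: trusting the output of the model is only reasonable if the loop independently re-checks the resulting state rather than taking the model's word for it.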
It's pretty quick and pretty straightforward. But what we're seeing already, as I said, is that this is yielding some interesting outcomes. For example, when you have a public S3 bucket and you run this, it makes it private; when you have an SQS queue that's failed and you run this, it fixes the queue. It's scary where this is going, right? And so I think that's also where this idea of thoughtfulness — making sure that we're thinking long and hard about the capabilities we're introducing — is important.

So, with that said, I wanted to thank you for your time and your patience, and I would invite you to think about artificial intelligence inside the CNCF and bringing communities together. If you have any questions, please let me know, and get involved in the projects. Thank you.