Good morning. Just give it a couple more minutes so some more people can join. Okay, thanks for the link to the demo. Please set yourself as an attendee. All right, it's 8:02, almost 8:03, so thank you everyone for joining. Today we have the Seldon Core project, and Clive is here, so thank you for deciding to show us the project and what it's all about. We're excited to learn about it. I think this is probably the first AI/machine learning type of project that has presented in SIG Runtime, so we're pretty excited about it. Go ahead and take it away.

All right, cool. Great. I'll share my screen if I can. Let's see. Cool. Can you see that? Yeah. All right, cool. So great to see you. I'm Clive, I'm the CTO of Seldon, and I'm going to give you an introduction to some of the projects we work on. It would be good to get your feedback and see how it connects to the things you're interested in — that would be interesting for me.

Great. So I'm going to go through some of the rationale for what Seldon is trying to do and set the landscape. One of the things is this paper from Google from 2015, which really set the scene for what we were trying to do. It says that when you do ML code, the data scientists, or even the whole organization, sometimes think you just have to write your little algorithm and that's it. But actually there's a whole set of other things surrounding it. This was a paper by Google based on their own internal analysis, and as some of you might know it got quite a lot of fame afterwards: the size of the boxes is the amount of code they had to write for these other things surrounding the ML code. So there's a lot of technical debt that gets created in an organization, and that's really what inspired us at Seldon to start doing what we're doing: we wanted to solve those parts of the technical debt — the serving infrastructure, analysis tools, monitoring of your machine learning, et cetera — and really help organizations in that area. We've seen a trend in recent years that organizations are looking for best-of-breed parts for the whole ML lifecycle, from initial data analysis through to training and then serving, and that's really the direction we see things taking, obviously also with the cloud native world and tools fitting into that; we'll discuss that a little bit. So that's one of the rationales for Seldon: to help organizations with these things around their ML code, so that data scientists can focus on the core ML code part.
Another way of looking at what we do, and the issues we find in organizations, is that you have the data scientists on one side with their own set of tools that they know a lot about — all the training tools, TensorFlow, Nvidia, Spark, et cetera — and on the other side you've got the DevOps folks with their own tools in the cloud native world that they work with well. There really is quite a hard divide between the two: the data scientists sometimes don't care too much about what the DevOps people are doing, and the DevOps people are scared of the machine learning and all this stuff they don't really understand — it's not as simple as a normal app they can just deploy onto their Kubernetes cluster. So you get this issue of these two teams working together, and that's also what we're trying to help with: put some DevOps into data science, and help the data scientists move this stuff out so they can get it into production, to really speed up the time to get those machine learning projects from just being a project out into production and making a difference for companies and organizations.

So that's another rationale; this one should make a lot of sense to you, and we're saying nothing new. A data scientist has their toolkits, and they want to deploy their model, scale it, and update it on a compute cluster with CPUs, GPUs, and TPUs. We all know about the stack that's been built up over the years: container runtimes, then Kubernetes coming in to orchestrate more complex projects on top of those runtimes, then projects such as Istio for service meshes, which handle that network management, and then interesting projects like Knative and others building on top of that for serverless. And there's a gap between that and what the data scientists need to do. If you want to use that stack, you've got all those things that I think we all know about but data scientists are probably less interested in, so I'm not going to go through the big list — I think it will all be familiar to the Kubernetes aficionados here — but it's quite a challenge for data scientists, or for companies, to handle all of that when they're trying to get their machine learning out onto the stack. So really what we're trying to do with our projects, Seldon Core and KFServing, is build on this stack and bridge that gap with a data-scientist-focused tool which is cloud native and lets them get their machine learning into production. So that's another rationale for us.

And then one of the last things that's also very important to us is the whole ethical side. Obviously there's a lot of interest in society about how AI is going to be used when it's put into production, and that has built up a lot of momentum. It's obviously key for all the projects and companies that have started to use AI and apply it to their customer bases. And there's now beginning to be a lot of regulation in certain areas — GDPR in Europe and other regulations coming in — which needs to be applied, trying to build up some rules for how AI can be applied.
But there's still a space around how companies actually need to apply those regulations, and what best practice is when you put your machine learning into production: how do you know it's not harming your user base, that it's really doing what it says, that it can be audited, et cetera. There's a gap there, and that's also what we're trying to fill with some of our tech. So this space is really emerging: at the base, how society reacts to AI as it affects them more; in the middle layer, companies and the regulations being applied, where how that will work is still being figured out; and then the tools at the top. It's a space we're in and we're trying to help out there as well.

And then finally, I just wanted to bring up the academic side. Obviously there are really big conferences every year on machine learning, and one of the biggest is ICML. For the first time, ICML had a workshop on challenges in deploying and monitoring machine learning systems, so the academic world is also looking at what the challenges are when you try to put machine learning into production. We were actually lucky enough to get two talks into that workshop: one on serverless inferencing on Kubernetes by myself, and another on the work we're doing in Alibi for monitoring and explainability of models in production. There's a lot of research being applied to the challenges of actually deploying machine learning models, so I wanted to highlight that as an emerging area coming from academia as well.

Okay, that's the background, so you understand what we're trying to solve at Seldon. What are the projects we actually work on to help achieve some of those goals? This is our stack of projects. Starting with the layer in the middle, which is all open source: Seldon Core, which is probably the most mature project — we've had it four or five years now, maybe, I can't remember — and it's got a lot of traction. I'll go into more detail on it; it provides an abstraction, a custom resource, to define how you want to put your machine learning model, or a whole inference graph, out into production, and then manage it, update it, scale it, et cetera. More recently, in the last year or so, we've been working with a group of companies on KFServing, which has very similar aims but builds on the Knative stack — looking at serverless, and how serverless can help in the area of machine learning deployment. So that's the middle layer.

Then at the bottom, feeding into that middle layer, is a suite of projects focused on the things you need to do once you've got your model out. Creating a data science model and putting it out there is obviously the core concern for a company, but then there are the things that surround it. Things like explanations: why is the model giving the response it's actually given? Trying to understand that for various stakeholders, be it customers, auditors, or the data scientists themselves who want to understand how the model is behaving. I'll discuss a little bit about that.
Then we have another project called Alibi Detect, which focuses on the ways you need to monitor models when you put them into production: things like drift detection, outlier detection, adversarial detection and so on. These are also open source — I'll give links to them on GitHub — and they feed into the layer above, which lets you actually deploy those models on Kubernetes and add these techniques around your model. And then at the top, obviously, as a company we need to make money, so we have our core enterprise product, which brings this all together to provide a full enterprise stack for machine learning — we're an open core company, so we feed the open source in and provide a full solution so companies can scale and manage their models. I'll give a quick glimpse of that at the end, but I'll focus more on the open source.

Going into a bit more detail on some of those projects: Seldon Core. What is it trying to do? It basically allows you to build up a whole graph of containerized components that are doing inference, put them together, make them reusable across different projects, and then manage that for you. The graph is defined by a custom resource, so we have our own operator running, and in that custom resource you define the various components, the customizations you want in terms of pod customizations, what the model is, what type of model it is, and how you want it all to connect together. You might define just a single model in your inference graph, or you might have much more complex things. We have customers, for instance, doing large inference graphs with things like multi-armed bandits, which decide in real time, based on the input traffic, which of the underlying models a particular request goes to. So you might add that in with a suite of different models, and then tie in further things earlier in the inference chain — maybe some feature transformation that needs to happen before the request gets to the model, maybe some transformations after the response comes back from the model — and then other things like I discussed, like outlier detection and explanations. So basically we allow you to define this whole graph in YAML (or JSON) as a custom resource, and then deploy it, manage it, and update it. That handles the core thing data scientists want to do — deploy, scale, and update their model — and they do that by updating that custom resource with the various definitions.

Then the next question is how they get the core components, these containerized boxes in their inference graph. There are really two ways that we provide. One is out-of-the-box machine learning servers: all they need to do is say, okay, here's my artifact on S3 or Google Storage, and we'll fire up, say, a TensorFlow Serving or Triton server for that artifact, manage it, and tie it into their inference graph so the actual API can be used through it. That's one way, and it's very popular — it's obviously the simplest way they can use Seldon once they've trained a model and got that artifact into some location, either on the cluster or in some bucket in the cloud.
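To make that concrete, here is a rough sketch of the kind of custom resource being described, written as a Python dict purely to keep the examples in one language (in practice you would write the YAML directly, as mentioned above); the deployment name, bucket path, and the choice of the pre-packaged scikit-learn server are illustrative, not taken from the talk.

```python
# A minimal sketch of a SeldonDeployment custom resource for a pre-packaged model server.
# Names and the modelUri are hypothetical; check the Seldon Core docs for the full schema.
import yaml

seldon_deployment = {
    "apiVersion": "machinelearning.seldon.io/v1",
    "kind": "SeldonDeployment",
    "metadata": {"name": "income-classifier"},            # hypothetical deployment name
    "spec": {
        "predictors": [
            {
                "name": "default",
                "replicas": 1,
                "graph": {
                    "name": "classifier",
                    "implementation": "SKLEARN_SERVER",   # pre-packaged model server
                    "modelUri": "gs://my-bucket/income/model",  # trained artifact location
                },
            }
        ]
    },
}

print(yaml.safe_dump(seldon_deployment, sort_keys=False))
```

Applying this (or the equivalent YAML) is, roughly, what triggers the operator to create the pods, wire up the graph, and expose the prediction API; pointing `modelUri` at a new artifact is how a model would typically get updated.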
The other way, which is surprisingly popular as well with a lot of organizations, is that we have particular language SDKs. In different languages, if you have custom code, we let the data scientist focus just on the prediction code. Say in Python, we have a Python wrapper, as we call it, and they can focus on the predict code and maybe some code to set up the model at the start. We then manage wrapping that up into a microservice, allow them to containerize it, and add in the metrics, tracing, and other parts so it can easily be slotted in as part of the inference graph. That's quite popular with a lot of our customers who have custom code in different languages — people using Java, some people using R; there was actually a talk at the last KubeCon Europe from folks at Nasdaq using Seldon Core with some models in R. So we have various language wrappers that let you wrap up your code, so the data scientist can focus on the machine learning code and then put it into the graph.

Yeah, I have a question. You mentioned the models on the right side, and they can be artifacts in S3. But once those models are created, are they loaded into memory or some other place where the inference can actually happen faster? Or is it still in S3, or maybe EBS storage? Typically some of these models can be kind of large, right?

Yeah, that's quite a good point. What we let people do is define a location, as you say S3, and then the model server downloads the model from S3 onto a local volume and runs it in memory. There is still work to be done for very large organizations that maybe use the same model from lots of different locations — how you can use caching — and we're looking into a caching layer that sits between S3 and the local model server, which holds the model in memory. But at present we provide standard downloaders: an init container that runs when the model server pod starts up. The init container is given the location of the model, understands how to talk to S3 or GCS, et cetera, and downloads it locally. Then the server starts in the main pod, reads from that local volume, and loads the model into memory. Got it. Cool. Thank you.

Cool. One other thing we add as part of what we do is a service orchestrator. You define how the graph connects together, but we manage the request and response flow through it. We add in a component which you don't see here, the service orchestrator, which takes the request and manages that flow. It says, okay, first I need to call, say, the feature transformer; once the response comes back from that, they've defined this to go to a multi-armed bandit, so I'll send it on to that; the multi-armed bandit says, okay, I want to send this to model A; the request goes to model A, model A responds, and the response gets sent back out through the graph. So that's an extra component, a sort of sidecar, that we add into the graph.
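As a rough illustration of the Python wrapper idea, the sketch below shows the shape of the class a data scientist would write; the class name, artifact path, and use of joblib are illustrative, and the exact method signature should be checked against the Seldon Core docs.

```python
# A hedged sketch of a Seldon Python-wrapper style model class; the wrapper turns this
# into a microservice with metrics and tracing once it is containerized.
import joblib
import numpy as np

class IncomeClassifier:
    def __init__(self):
        # One-off setup: load the artifact the init container downloaded locally (illustrative path).
        self.model = joblib.load("/mnt/models/model.joblib")

    def predict(self, X, features_names=None):
        # Called for every inference request; return predictions or class probabilities.
        return self.model.predict_proba(np.asarray(X))
```

This class is then containerized, with s2i or a Dockerfile as discussed below, and referenced from the inference graph in place of a pre-packaged server.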
But apart from that, they've got complete flexibility, so they can define parts of the graph to sit in separate pods that scale in different ways. Say the feature transformation could have an HPA that scales one way, and the models could be in another pod that scales on a different metric. You've got a lot of flexibility in how you define it.

In terms of how people use it: in our life cycle we've focused initially more on RPC use cases of real-time machine learning inference, and that's probably how most of our customers come to us and use us at present. But what we're also seeing is that customers want a unified solution. Once they've created the model, they don't just want to expose it via RPC; they also want to send batch requests to it, or use streaming, say with Kafka or Knative. So we allow them to do all three, and it's very easy to use the same components irrespective of how they're going to consume that model in their organization. So that's Seldon Core, and I'll just go on if there are no questions.

Sorry, I don't know if you can hear me — it's a bit low volume, but yeah. You mentioned something about containerization of their model initially. Is that part of what you provide as well, or are they assumed to have used S2I or whatever? Yeah, that's also a good question. We provide examples in our docs of how to use S2I, and we actually have an S2I builder that pulls in the appropriate dependencies, say in Python, to wrap the model easily. But we also provide docs for how they can just use a standard Dockerfile to create their container, and we have people using different methods, because obviously not everybody wants to use S2I — some people like it, some people don't, and there are many ways of building that container. We want to be quite agnostic there, so there's not a single mandated way; we give them different resources to create that container the way they want to. Okay, I was just trying to figure out the scope of it. Okay, great.

Okay, cool. So as I said, there's another area we work on, which is the things you need to surround your model with, and that's two projects: Alibi Explain and Alibi Detect. Alibi Explain is looking at explanations: once your model is out there and you get a particular prediction back, why did the model give that prediction? We have different techniques for this, both black box and white box. Black box has the advantage that it purely treats the model, as it says, as a black box, and you just talk to the model over an API. The big advantage of that is you don't care how it was created or with what technique — it could be a deep neural network, it could be a tree-based model, it could be a simple linear regression. It doesn't matter, because all the technique does is query the model many, many times, normally by changing some of the inputs from the initial input, and try to understand how the model responds to those slight changes. From that it builds up a picture of what the model is taking into consideration, and then it can give a human-understandable response.
So that has a lot of interest, especially in organizations who want to keep things quite separate: the team that built the model can be kept completely separate from the team that needs to explain it. It is obviously also more challenging, though, because you're treating the model purely as a black box — you're just talking to it over an API. So we also have white-box explanations, which apply if you know how the model was created, so you actually have access to the model weights. If it's a neural network, you can load the saved Keras model, look at the weights, and do analysis of that, applying various techniques to understand how it's working; or if you've got a tree-based model, you can load that and try to understand it and give an explanation for a particular prediction. Each approach has its pros and cons, and there isn't a single way to do it.

Just to quickly jump into the Alibi project to illustrate the different ways of looking at it: we have a large number of state-of-the-art techniques, and the key thing is that those techniques give different ways of viewing the explanations. They also have different focuses on the type of input data. Some techniques work with classification — actually most of them here work with classification — but some are more focused on regression, and some are more focused on certain types of input data. If your data is tabular, fine, and if it has categorical variables, some work better or worse with that; some focus more on text, some on images, and so on. So there isn't really one way of solving the explanation question for machine learning models, and the way these techniques present their responses is also quite different: some are suited more to data scientists, whereas something like counterfactuals is maybe suited more to actual customers. For example, counterfactuals tell you what you would need to change. If you have a loan system, say an automated system that rejected your loan, a counterfactual explanation could tell you what you as a customer would need to change for the system to change its mind and actually give you the loan, so it's easier to understand that way, whereas other techniques might help the data scientists understand which features are being taken into account in how the model behaves. They all have their pros and cons, and we need to work with each organization to figure out which one is most appropriate for the models being put out.

This is also quite an active research area, and the other challenge is that these techniques are quite complex — you need to train some of them on the actual training data. Even though some of them are black box, so they don't care what technique you used to train your model, some of them need to understand the training data: if they're going to perturb the input, they need to know, say, that if a feature is an age, it only makes sense to perturb it between, say, 1 and 110. There's no point putting in a value of 1000 and seeing how the model behaves, because that's going to give you strange results. So there are certain things you need to look at there.
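As a rough sketch of the anchors technique that comes up in the demo next, this is approximately what the black-box tabular explainer in Alibi looks like in use; the tiny synthetic model, data, and feature names are placeholders, and the exact fields of the result can vary between Alibi versions.

```python
# A hedged sketch of a black-box anchors explanation with Alibi's AnchorTabular:
# only a predict function is needed, plus training data so perturbations stay in sensible ranges.
import numpy as np
from sklearn.linear_model import LogisticRegression
from alibi.explainers import AnchorTabular

# Placeholder training data: [age, marital_status_code, sex_code] with an income label.
rng = np.random.default_rng(0)
X_train = np.column_stack([rng.integers(18, 90, 500),
                           rng.integers(0, 3, 500),
                           rng.integers(0, 2, 500)]).astype(float)
y_train = rng.integers(0, 2, 500)
model = LogisticRegression().fit(X_train, y_train)

explainer = AnchorTabular(
    predictor=model.predict,                               # black box: any predict function
    feature_names=["age", "marital_status", "sex"],
    categorical_names={1: ["married", "separated", "never"], 2: ["male", "female"]},
)
explainer.fit(X_train)                                     # learn feature ranges from training data
explanation = explainer.explain(X_train[0], threshold=0.90)  # anchor should hold >= 90% of the time
print(explanation.anchor, explanation.precision)           # field names may differ across versions
```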
I have a question — this is very interesting. Do you actually integrate with some of the graphing tools? I would imagine some people want to understand how these models actually behave, so for example something like Tableau, right? People want to see how people are using the model, or how it's making its decisions. Yeah, absolutely, I think there's quite a challenge there — it's one of the big challenges with explanations: how you show that data to the user and how you tie it back into the actual models being used. Maybe I can quickly dip into our enterprise product just to show you explanations in action; hopefully I can give a quick demo.

So this is our top-level viewing panel, Seldon Deploy, which shows the different models you've got running. I'm not going to go into too much detail here, since we're focusing more on the open source, but this will hopefully help answer your question. I've got one model here which is an income classification model, making predictions based on demographic features. It's a standard open dataset, actually taken from the US census in 1986. So we've got various demographic features, and it predicts whether a person has high or low income. We can take a particular example and send it to the model — it's a binary classifier — and it's saying this person has an 86% chance of having low income. Because we've added an explainer to this model, it can basically investigate the model and understand why. This is using a technique called anchors. We've got the core demographic features here, and basically it's showing that the model is quite focused on two of them: marital status = Separated and sex = Female. It's saying that 95% of the time, if you just had those two features, the model would predict low income. This is obviously key, because it raises the question: is it acceptable to put this model out when it looks quite biased in terms of gender? It may be that it's completely acceptable, or it may be that you need to get more data for various sections of your dataset.

So this anchors technique really gives you the core anchors the model is using, but you can set a threshold — in this case it's 90% — and it tries to look for features. Obviously there are a lot of features here — occupation, race, whether they're in the United States, their age, and other things like that — and it tries to find the core features from the initial request we're explaining that really made the model go in one direction. Here it seems to be mostly marital status; but because you set the confidence so that this explanation should be true at least 90% of the time, it went further and tried to find one more feature, which in this case was the gender feature.

So what is the underlying model here — is it a random forest, is it linear regression, how do we see this? Yeah, this model is actually very simple, I think it's a linear regression, and this technique is black box, so as I say it doesn't depend on the underlying model. It's a very simple model that just illustrates that it's quite biased, and actually this dataset from the census
is actually very unbalanced — only about one percent of people have high income, so most people in the dataset have low income — and it produces very biased results. We're using it as an illustration of how you can use explanations to understand your model better and get early indicators of some of these things.

Okay, so if you have linear regression and you're asking for a value that's outside the model's range, will it tell you, look, this is outside the range and so the linear regression is invalid, or something like that? Yeah, I suppose that's slightly related. This technique actually needs access to the training data so it can understand the ranges — I'm not sure which one is the age feature here, but in what it was doing it would probably have sent several hundred requests to the model, changing this feature a little bit, 65, 55, and so on. It needs to know there's no point trying 600, because that's going to give strange results and won't help the explanation. So it understands those aspects of the training data and then fires off lots of requests to get this answer. Okay, so it's not really rule-based stuff, like "uh-oh, you're doing linear regression in the wrong way"? No, it's not like that; it's really just, as I say, treating the model as a black box and trying to understand how the model behaves just around this particular value, changing features and seeing if the model changes its opinion, and what the core things are that make it focus on that result. Very cool. Yeah, they're cool techniques.

And down here we give access to the actual training data, so you can see examples which fit that explanation, and there should also be some examples which don't fit it — people who have those features, separated and female, but have high income, as you can see. Okay, maybe I need to expand my training set by getting more of these, labelling them and so on. It just helps the data scientists. With this technique, as I said, you need to be very aware of the stakeholder who needs to look at it: if you gave this to a customer I think it would probably be very confusing, so this is a technique that's probably more for the data scientist or maybe the auditor, as opposed to some of the other techniques.

Just one comment: I think this is a great feature. Some of the things I've been hearing about some of the models is that they're biased, and this is a great way to see that they're biased and then maybe tell the data scientists that they need to change their models, or adjust them in certain ways, so they're less biased. Yeah, absolutely. It can be used at various stages of the machine learning life cycle: in development, in auditing, but also in production. Once this has gone live you might get requests like, hey, why was I rejected for a loan, and then we can say, look, these are the reasons the model gave, so you can get early warning of that. I'll just show you one more; this is a different
model, this time with text-based explanations. This is a model that takes movie reviews and tries to decide whether the sentiment is positive or negative. Hopefully I can find an example, predict it, and then explain it. So this is a movie review: "a visually exquisite but narratively opaque and emotionally vapid experience of style and mystification." Obviously that's negative, and what the explanation is showing here is which words the model was looking at to make this negative — it's highlighting "emotionally vapid" as the two core words. It shows from another angle that you need different explanation techniques for different types of data, and how those explanations should look is different. So this is different from the tabular case, highlighting things in a different way, but it's actually the same sort of technique, just for text data rather than tabular — this is anchor text, whereas the other was the anchor tabular technique.

Cool, so that's Alibi Explain. Then we also have Alibi Detect, which is looking more at the monitoring side. Outliers, obviously, because you don't want to give responses from your model on an actual outlier — it's quite likely the model is going to give very strange results, so you should probably throw that result away. Then other things like adversarial attack detectors, which is also an area we do research on; it's obviously quite niche, but there are particular areas where it's important. This is an example taken from a traffic sign detector — a classic one where you've got a stop sign and the model says stop, great, but if you attack it in certain ways by adding a few pixels, it still pretty much looks like a stop sign to us, but the model gives a completely different answer, and the same for these other traffic signs. I always wondered whether that stop sign was blue — I think it's taken from a German dataset, maybe stop signs are blue in Germany, I'm not sure. Is there a new stop sign in Germany? Yeah, that's true, yeah.

We also look at things like drift detectors, which tell you when you need to retrain your model — when the input distribution the model is seeing is completely different from what it was trained on, because again it's probably going to be giving bad answers and you probably need to retrain it on the new type of data. This is the Alibi Detect GitHub repo, and again there's a suite of different techniques based on the different modalities and decision points: is your data tabular, image, time series, text; does it have categorical features; do you want to do online outlier detection; do you want outliers at the particular feature level — for an image, the feature level would be the level of pixels rather than the whole image — and you need to make decisions based on that, and similarly for the other things, adversarial detection and drift detection. The key is that we do this data science and then bring it back into Seldon Core, roughly along the lines of the sketch below.
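To make the drift detection idea concrete, here is a small hedged sketch using alibi-detect's Kolmogorov-Smirnov drift detector; the reference data and the "live" batch are synthetic stand-ins, not anything from the talk.

```python
# A hedged sketch of drift detection with alibi-detect: compare live traffic against
# the reference (training) distribution, feature by feature, and flag drift.
import numpy as np
from alibi_detect.cd import KSDrift

rng = np.random.default_rng(0)
X_ref = rng.normal(0.0, 1.0, size=(1000, 5))   # what the model was trained on
X_live = rng.normal(0.5, 1.0, size=(200, 5))   # incoming traffic with a shifted mean

detector = KSDrift(X_ref, p_val=0.05)          # feature-wise KS tests against the reference
result = detector.predict(X_live)
if result["data"]["is_drift"]:
    print("Input distribution has drifted; the model probably needs retraining.")
```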
So you can then deploy your model using Seldon Core and add these explainers, outlier detectors, et cetera, around your model, and that's obviously the goal: to allow organizations to do that.

Cool, so I'll go on to another project we work on, which is KFServing. This is focused on using some of the things from Knative, so we're looking at scale-to-zero, because obviously things like GPUs and other aspects of machine learning inference are quite costly, and if you've got a model that's not being used, wouldn't it be great to just get rid of that infrastructure from your Kubernetes cluster. They're doing really great stuff in Knative, and this is building on top of that. We're also looking at GPU autoscaling in this project, because GPU autoscaling is quite a challenge — I'll talk in the next slide about how Knative solves that. We actually founded this project with some very large partners — Google, Bloomberg, Microsoft, and IBM — and we're trying to create some standards for machine learning inference and spread them across the whole industry. One of the things we've done is create a standard protocol for machine learning inference, and we're starting to get people from these organizations to buy into actually using it; that takes time, as all standards do, but it's an interesting direction and part of what the project was created for. This project is part of Kubeflow — you've probably heard of it, it's an ecosystem of machine learning projects covering data analysis, training at scale, hyperparameter optimization, and serving — so it's a great place to join up with the other projects in that ecosystem working on machine learning on top of Kubernetes.

To focus on one thing we solve in KFServing: one thing that's really difficult is GPU autoscaling. Why is it difficult? When you've got models using GPUs, you've got various metrics: metrics from the server using the CPU, and metrics from the GPU itself, and those GPU metrics are sometimes hard to get from your Kubernetes cluster. It's then harder still to combine them all into a single autoscaling rule — okay, if my server's CPU goes over this amount and my GPU stats say this, then scale up — and that's quite hard for people to do. Luckily we can simplify that using the ideas from Knative, which basically looks at the number of in-flight requests going to your pods — in this case machine learning pods, though Knative is quite generic. All you have to do is say, for these pods, how many concurrent requests they can manage: maybe this model server can handle 10 requests, or maybe it's just one. Knative takes that into account — it uses various sidecars, it puts in queue proxies to understand how many requests are in flight — and from those stats and how many requests are coming in, it decides whether to scale up or scale down. So it makes it much easier for the machine learning user who sits at the top and just wants their thing to scale automatically without having to do much.
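As a sketch of how that concurrency-based autoscaling gets expressed, below is roughly what a KFServing InferenceService of that era could look like, again written as a Python dict only to keep the examples in one language; the API version, field names, and annotation follow the KFServing v1alpha2 / Knative conventions of the time and are assumptions to check against current docs rather than exact values from the talk.

```python
# A hedged sketch of a KFServing InferenceService that can scale to zero and autoscales
# on in-flight requests; names and the storageUri are illustrative.
import yaml

inference_service = {
    "apiVersion": "serving.kubeflow.org/v1alpha2",
    "kind": "InferenceService",
    "metadata": {
        "name": "sklearn-income",                      # hypothetical name
        "annotations": {
            # Knative soft concurrency target: roughly "this server handles ~10 in-flight requests"
            "autoscaling.knative.dev/target": "10",
        },
    },
    "spec": {
        "default": {
            "predictor": {
                "minReplicas": 0,                       # allow scale to zero when idle
                "sklearn": {"storageUri": "gs://my-bucket/income/model"},
            }
        }
    },
}

print(yaml.safe_dump(inference_service, sort_keys=False))
```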
All they need to do is say how many requests their model server can serve at the same time, and it really solves that. There are other great things in Knative: if there's a burst of requests, they get stored in a component called the activator before they get pushed on to the replicas once they've scaled up, so you don't get too many requests hitting your model at the same time. So it's really interesting and we're trying to build on top of that technology for machine learning inference.

One question here: when it comes to handling many, many requests — we're talking about many millions of requests — is the standard practice to keep shared storage for these pods, so they can actually serve from that storage? If you have a really big model, spinning up pods for every single request and then creating model storage for every single pod will take a lot of time, right? Yeah, absolutely, that's a very good point. I don't think I have it on these slides, but I gave a talk at ICML on KFServing, and one of the slides was on the present challenges with using it, and that's exactly what you said: once you scale up, you've got that big lag, with all your requests waiting for the replica to start. That is a clear challenge. At the moment there aren't many ways to solve it, but one is to have your model locally so it doesn't have to be downloaded, which gets rid of some of the network time — but even then you still have the time for the model server to start up. The other thing people do is just get rid of scale-to-zero and say, okay, I want at least a certain number of replicas, but that's obviously giving up the whole point of Knative. So yeah, it's definitely an open challenge, and it's something we're looking at as part of KFServing, and with the Knative community, how that can be solved. Got it, thank you. Yes, you put your finger on one of the key points.

Cool. So I've sort of shown you Seldon Deploy, which is our closed source product, and that's really bringing everything together: it allows you to use Core and KFServing and Alibi Explain and Detect, tying it all up with standard components. Things like GitOps, which we're big believers in — if you don't know it, basically everything gets stored in source control before it's put into the cluster, so you've got your declarative representations: as you define a model in Deploy, it's pushed out to GitHub or Bitbucket and comes back in using things like Argo CD, which gives you that full audit trail. And we tie it together with metrics, the Elastic stack, and auth with Dex and Keycloak, where you've got the enterprise APIs. So that's how we tie everything together for enterprise customers, and obviously that's quite key. In terms of the models, I've got a model running here where you get all the stats, but the key thing is just to highlight the GitOps: you can go and see all the actions you've done on the model, because
it's all stored in Git — everything you did. You can see all the canaries you created or how the weights were updated, you can see the differences between versions and any documentation, and obviously if you feel there's a mistake you can go back to a particular point in that chain in your Git repo and just restore to that state if you wish. So it really allows you to do that. Cool, so that's the enterprise product, and that's pretty much my last slide. I just wanted to finish on some things for the future which may interest you.

Yeah, on the last slide, sorry, the one that had Argo — if someone is already using Tekton instead of Argo and they wanted to use Seldon, would they have Argo and Seldon on their system, or can you replace Argo in that picture with Tekton? Yeah, this is actually something slightly different: this is another project in the Argo ecosystem called Argo CD, so it's all about GitOps and syncing things from source control onto the cluster. Okay, okay, that was my point, I got it, thank you. It is actually confusing, because when I created this slide I couldn't find any logo for Argo CD, so I just used the Argo logo, and this is why exactly what you said happens — people think it's just Argo. So I think the Argo CD people need to get a logo.

So some final points, some things we're looking at. GPU sharing comes up quite a lot, because you've got these very costly GPUs, and with some of our partners on KFServing, like Bloomberg, you've got thousands of models which are, say, very slightly different — all scikit-learn models or something — and what they need is to share all those models on a single server to decrease cost, especially as some of them might not be used very much. That's definitely a challenge, and as part of the KFServing project we're looking at an extension to do what we call multi-model serving, to allow you to have multiple models on one server and pack them in. And there are other things we're looking at, like Volcano, which has a GPU scheduler to allow you to share GPUs; it's very early stages, and I think the Volcano GPU scheduler is also very much in alpha, but there's interesting stuff there coming right from our customers that we need to look at.

Then stuff like edge: as a company, and in the code that you've seen, we're not really edge focused at the moment, but that's definitely an area which we all know is growing, and it'd be great to get your feedback on what you're seeing from others who have presented — I saw you had KubeEdge presenting some weeks ago, so it'd be interesting to get your feedback on that point rather than mine. We've certainly seen customers in that area, definitely. And general model optimization is something we're looking at for the future: customers sometimes want to just give us the model — it could just be a TensorFlow model — and then we do various optimizations for various endpoints. It could be for edge, or other ways of optimizing the model — you
know, perhaps take a scikit-learn model and turn it into one that can run on a GPU, and so on — there's lots of interesting stuff there that we're looking at. And then just one final thing, a shout-out about KFServing and the general machine learning data plane work: as I said, we've created a general machine learning inference protocol which is going to be supported by various people in the industry — at present it's supported by NVIDIA's Triton Inference Server, Seldon Core, and KFServing, and hopefully in the near future TorchServe, where Facebook is doing some work. So that's an interesting development. Cool, so hopefully I haven't taken up too much time — it's probably gone longer than I thought — but I'm happy to open up a discussion on any points.

Thank you for the presentation, it's a very interesting space. I think a lot of people are moving more towards having a CI/CD type of system where the data scientists create some of these models and the cluster operators, the DevOps people in an organization, handle some of the serving parts, so I think this kind of fits in there to fill that gap. Yeah, absolutely, thanks — I think there's quite a big need, definitely, hopefully as the slides at the start showed. One question that I have: is the Seldon team looking at maybe having one of the projects join the CNCF, or be part of the CNCF? Yeah, certainly, we are thinking about it for several projects — we can't say which — but we're in talks with the CNCF and the folks at LF AI, so I think it's probably a direction we'll think about for some of the projects, definitely. Great. Does anybody else have questions? I don't want to be the only one asking most of the questions.

I had a question about KFServing — it's part of Kubeflow, yeah? Are other components also part of Kubeflow? I somehow had the idea that all of Seldon was in Kubeflow, or all the open source parts, but I guess that's not right. Yeah, so Seldon is separate from Kubeflow, but we have integrations with Kubeflow, so if people want to use Seldon Core in Kubeflow they can; and the KFServing project is actually inside Kubeflow. So it's a bit confusing: KFServing is part of the Kubeflow domain, and that project lives inside Kubeflow for serving. Of the two projects, one is outside but usable with Kubeflow, and one is inside and developed there. So if Seldon became part of CNCF, then Kubeflow might do that separately, and they'd just interact together — would that be the way to do it? Yeah, I suppose it's quite separate. Kubeflow is under Google, so I think it's probably up to them; there's actually an open discussion in Kubeflow about governance and whether individual projects could move into CNCF or LF AI, but I think Google is interested in keeping it together at the moment. Okay, thanks.

Yeah, I think with KubeEdge there's also a good fit there: maybe KubeEdge can be used to manage some of these workloads, and then Seldon can be used to send these workloads — the models and the serving mechanism — over to the edge, where maybe you want faster response times. I've seen some of the use cases
where you do some of that inference on, say, stop signs or maybe license plates at a toll booth, for example, right? Yeah, absolutely. I don't know too much about KubeEdge, but it's something that's part of our research path, to see how we can get closer to those folks and see how we can collaborate. Yeah. Any other comments or questions? Going once, twice — we've got some people on the call that are very quiet. Well, thank you very much. Okay, thanks for all the insights, and let's keep in touch. Absolutely, absolutely. Thanks for having me here, I appreciate it. All right, thank you all.