Greetings, everyone. Welcome to the Knative Project Maintainer session, where this time around you are not going to hear from maintainers about how great Knative is, but instead hear stories about how Knative fares in the real world. I am Naina Singh. I am a Knative Steering Committee member, and today I have with me four guest speakers who are going to talk about how they are using Knative out in the world.

If you attended one of my previous sessions, you will think that I'm repeating myself, but I think it is worth repeating that Knative is more than serverless. Now, what do I mean by that? In Kubernetes, to create and deploy a service, there are a lot of constructs you need to take care of: to create, to configure, to deploy. What Knative does is take away that cognitive toil, because you run one command on a container and it gives you a ready-to-use URL with auto-TLS, and it also provides autoscaling based on demand, with the famous scale to zero. That makes it a serverless platform for Kubernetes, as well as a simplified Kubernetes for application developers. But that's not all: it is also an event-driven platform for Kubernetes, because it provides the eventing infrastructure for all your event-driven needs. One more thing I want to say: if I can reduce it to a tagline, it is "Knative by default, and Kubernetes when you must." So today we will learn from our guest speakers about their use cases, the scenarios they are using Knative for, and why Knative. And with that, I am going to hand it to my first guest speaker to kick it off.

So I'm Andrew Sinatar, and I work for CoreWeave, which is a GPU-focused cloud provider: HPC, rendering, things like that. We use Knative for a managed service that allows our customers to run serverless-style workloads. Most of our customers are using Knative to serve large language models and image generation, and one of the reasons they like it is the ability to scale up and down, and scale to zero; these are very big features that they want. Some of them deploy a lot of fine-tunes that don't get a lot of usage, so scale to zero is pretty important. Knative Serving provides a very simplified deployment for them, and it handles the scaling, the ingress, everything, all in one. You can set all of this up yourself, and we have seen some customers do it manually with KEDA or another scaler and whatnot, but this is so much easier for them to manage and use when they get it all in one CRD. That's a really big driver for the Knative usage there, and we have a lot of clients using it: thousands of Knative Services, with lots of pods.

So I'm going to talk about a few challenges we've had through our usage. Overall we've actually had a really good experience, but there are a few things we've learned from it, and I just want to highlight some of those. For the Knative Serving activators, we've chosen to scale them manually rather than with the HPA. We found in our large clusters, with thousands of services, that the extra churn of the endpoints can cause less than optimal behavior; we've heard this from other speakers in the past as well. We've also increased our per-activator capacity, because we've found a lot of our services have very cyclical usage, and that helps it churn less on the endpoints. We have a lot of dashboards to monitor the health of the cluster and make sure it's performing well. We run a dual ingress for Knative: both Kourier and Istio. Kourier is great for what we need. Istio was used at the time, before Kourier existed; it's a little heavyweight for our usage, and we've seen better scaling from Kourier.
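To make the "all in one CRD" point concrete, here is a minimal sketch of the kind of Knative Service a customer might deploy for single-GPU LLM inference. The name, image, and values below are assumptions for illustration, not CoreWeave's actual configuration:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: llm-finetune-a                           # hypothetical service name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"    # allow scale to zero
        autoscaling.knative.dev/maxScale: "10"   # cap the scale-out
    spec:
      containerConcurrency: 1                    # one in-flight request per pod
      containers:
        - image: registry.example.com/llm-server:v1  # assumed image
          ports:
            - containerPort: 8080
          resources:
            limits:
              nvidia.com/gpu: "1"                # one GPU per pod
```

Applying this single resource yields a routable URL, revision tracking, and demand-based autoscaling that would otherwise take several Kubernetes objects plus an external autoscaler to assemble.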
We've had some bugs in the past with the activators not detecting pod readiness and whatnot; we've worked through those, and they're patched now. And occasionally, in very large clusters, you can run into some slowness here and there that you have to manage. There are some things we've done to improve the areas where there's slowness, but overall it's been very robust on the control-plane side.

On the side of people using Knative Services, we have a really good experience with Knative, but not all our customers are experts in deploying Knative services. One frequent thing we see, especially with large language models, is that customers tend to have very large containers: they put their models in their containers, or they pull their models from Hugging Face, and they expect all of this to scale up and down quickly. It doesn't. So we've done a lot of training with our customers, helping them go from long container start times, well over five minutes, down to 30-second start times or less. Our infrastructure is set up to optimize some of this, so we steer them toward how we have things in place. Other things: again, not every customer is an expert at Knative or at configuring the service, so we've seen things like incorrect container concurrency. With LLMs, a lot of the time you have one GPU assigned to your container and can handle one request at a time. If you set your container concurrency to zero or 40 in Knative, you will have problems, because requests will queue in the queue-proxy and a lot of them will time out. Then you come to us complaining about all these failures, and we say, well, you kind of misconfigured things. So a lot of the challenges we face with Knative aren't really challenges with Knative itself; it's a lot about our customers' understanding. We've written a lot of docs for our internal support people to help customers through things, as well as public docs on our docs site with best practices for inference and serving. That's where a lot of the focus has been: what I call issues and problems aren't really issues and problems, they're about getting the right understanding of how to use the tool to do the right job. So I'm going to hand it over to the next speaker.

Yeah, thank you. My name is Noris Samosaranko. I work for SVA, a German IT infrastructure and consultancy company, and I'm an architect, currently working mainly on Kubernetes and distributed systems in the public sector. We are using Knative mainly for the eventing component, because we have clients with multiple departments and many monoliths that they are not transforming into cloud native applications, and this creates a big overhead and a communication problem inside the organization. So we have chosen to use Knative Eventing to decouple the applications and provide a platform, a software-as-a-service experience, to the different departments on the client side. One problem we of course face is that we cannot use anything that is SaaS and cloud based, because we have high compliance standards. So we have to choose something that is very standardized, built on open standards, and can last maybe decades, or at least something we can believe will last decades; that's what we hope for.
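To preview the interface the departments end up with (the broker/trigger model discussed next), here is a hedged sketch of the wiring, using the Kafka broker class that backs this setup. All names are assumptions for illustration:

```yaml
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: default                          # hypothetical department broker
  namespace: department-a                # a department's own namespace
  annotations:
    eventing.knative.dev/broker.class: Kafka   # Kafka-backed broker
spec:
  config:
    apiVersion: v1
    kind: ConfigMap
    name: kafka-broker-config            # assumed broker config
    namespace: knative-eventing
---
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: order-events                     # assumed trigger name
  namespace: department-a                # Triggers live next to their Broker
spec:
  broker: default
  filter:
    attributes:
      type: com.example.order.created    # assumed CloudEvent type
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: order-processor              # the department's consumer service
```

A department only ever touches its Trigger, its subscriber Service, and the CloudEvents it sends; where the events physically travel (cluster brokers, follower clusters, the high side) stays behind the broker interface.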
And we came to the conclusion that Knative is the best fit here. It uses open standards, like CloudEvents, and the interfaces and the architecture are very stable and well designed. So the end users we deploy Knative to only have to interact with very simple, straightforward interfaces, and they can rely on those interfaces always working the same way, regardless of whether they're having a developer experience in the cloud, just for development, or going to the high side, where everything is air-gapped and secured. The goal, in essence, is creating one big event mesh, as software as a service, that we can deploy on premises and that users can interact with through the broker/trigger model.

On the right side of the picture you can see a high-level overview of what we are doing. We have multiple Kubernetes clusters, and we put a load balancer in front of these clusters. Through the load balancer we address one of our Knative brokers, which we call the cluster Knative broker, to ingest the events coming into the system. From that cluster broker we then load balance, or forward, these events to the other clusters, which we call the follower clusters. From there, the projects or departments get their namespaces: they can apply for a namespace, publish their workload applications into it, and then also consume the Knative services, for example the broker/trigger model, which in our case is backed by Apache Kafka.

We have some requirements in that regard which are specific to the public sector: we have some environments on the low side and some on the high side, meaning air-gapped environments with specific compliance standards. We have to deal with firewalls, which are a huge issue, custom certificate authorities, multiple clusters, many stages, and different data centers that are secured in ways that are unusual in the open market. These are all things we had to consider when deploying Knative at the customer's site.

The challenges we saw during the build-up and implementation were mainly around persistence. At least back then, the documentation and the details about how persistence, e.g. Kafka or RabbitMQ, is wired into Knative weren't as straightforward as they are now. The documentation is much better now in that regard, but we had to figure out a lot of things as we went. And that's the second point, about sparse docs: there were some things we had to work out by reading the GitHub code, though they're covered in the documentation now as well. One example in our case is the custom certificate authority, which we had to inject into all the pods that act as consumers, because our services, our deployments, all have custom certificates included in their image.
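A hedged sketch of what that CA injection can look like on a consumer: Knative Services accept ConfigMap and Secret volumes, so a custom CA bundle can be mounted into the container's trust path. The names and mount path here are assumptions:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: order-processor                  # hypothetical consumer service
  namespace: department-a
spec:
  template:
    spec:
      containers:
        - image: registry.example.com/order-processor:v1  # assumed image
          volumeMounts:
            - name: custom-ca
              mountPath: /etc/ssl/certs/org-ca.pem        # assumed trust path
              subPath: org-ca.pem
              readOnly: true
      volumes:
        - name: custom-ca
          configMap:
            name: org-ca-bundle          # assumed ConfigMap with the CA cert
```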
On the user side, one challenge is that we are working in the public sector. As you can imagine, the pay is not that great, so the engineers there are not typically the engineers who would be confronted with cloud native computing and its solutions. They usually have many years of experience at that client and are used to their stacks, maybe .NET, Java, and not much beyond that, because getting new tools into that environment is kind of tricky. So the onboarding part was a big effort on our side. We had to re-document everything and go through a lot of workshops with the users, including on the architectural concepts like idempotency and at-least-once delivery. We had to explain those concepts very often and in detail, so that the users really understood what they were doing. We had to create a default dashboard, which is currently not included in the upstream project, because it allows the consumers to really see on one page what is happening in their system end to end, identify issues in their system, and then go to Jaeger to trace them down.

On the GitOps side, we had some issues, especially on the administration part: we deploy a lot of brokers and triggers, and when we update the cluster while all the departments are still deploying their stuff, the reconciler sometimes gets hiccups, and it leads to weird issues, let's say. We are still working on finding out exactly what kind of issues we are getting, but that's something where we are still not sure of the reason. Error messages are also sometimes not that descriptive, especially at the controller level, in the control plane. There are error messages you might see when using it in production where you really have to drill down, and sometimes also go to the Go code on GitHub to figure out what is happening, because you get the code line and the exception, but the error itself is not descriptive; something like "context deadline exceeded" doesn't tell you much. Those are the kinds of challenges we had. We faced them all and also found fixes, but it's something to keep in mind.

On the usage side, Knative is very easy to use, and it has simple interfaces. On the administrator side, you have to be very knowledgeable in all the systems you are using, be it Kafka, RabbitMQ, Knative itself, Kubernetes and how it works, and the programming languages being used in the context of the customer. So you have to have a broad skill set if you want to operate it as an admin.

A few recommendations from my side. Load test and chaos test your Knative deployment right from the start: use something like Chaos Mesh, use something like k6, or what we used, Hyperfoil I think, to test it and get some histograms of how it behaves in your system. Do some back-of-the-envelope math to figure out your throughput with Kafka, so that you get a realistic estimate of the boundaries you are going to push (for example, 1,000 events per second at 2 KB each is about 2 MB/s before replication). If you are in a regulated environment like the public sector, think about compliance at the beginning, not at the end, or you will face problems. And consider self-service: enable the users to do things themselves, and just help them on the knowledge side. The value of the experience is key; that is what we found out. So make it even easier than the standard docs and tools do, and create something tailored to the customer's experience and the environment you're working in. Keep an issue log of all the infrastructure things you find; we have a structure for that, and we always document root cause, assumption, solution, and ticket link. Optimize your workflow so that you have shared responsibilities, and create a unified observability plane across all your services, especially on the eventing part, because you are working with distributed systems there. Those are the recommendations I can give.
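As an illustration, a single entry in the kind of issue log Noris describes (root cause, assumption, solution, ticket link) could be structured like this; the format and every value are invented for the example:

```yaml
# Hypothetical issue-log entry; the four fields mirror the structure
# described in the talk, everything else is invented for illustration.
- summary: Kafka triggers slow to reconcile during cluster upgrade
  root_cause: unclear; suspected race in the Kafka controller reconciler
  assumption: concurrent department deployments during the upgrade window
  solution: pause GitOps syncs for eventing resources while upgrading
  ticket: https://tracker.example.com/PLATFORM-123   # placeholder link
```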
And yeah, give it a go; try Knative out. So far, for our use case, we are very happy with it, and we are surprised that we could set something like this up in a very highly regulated environment. So that's it for my part. Thank you.

There is a case study about this Knative Eventing deployment on the CNCF website; this is the QR code for it, and I'll upload the slide deck later, so do check it out.

All right, so I guess I'm next. My name is Ricardo, I'm a computing engineer at CERN, and I'm also on the TOC and the newly formed Technical Advisory Board of the CNCF. I guess everyone knows more or less what CERN is about, but basically we are a large physics laboratory, and we have large requirements in terms of data storage and also analyzing and processing that data. So we keep looking at all sorts of technologies that can help us today, and with what's coming in a few years as well.

We've been using Knative for inference for quite a while, also via Kubeflow. We have strong requirements in terms of machine learning that are growing pretty quickly. I'm sure everyone has been hearing about GenAI and LLMs, but there's actually a lot more around machine learning that was there more than a year ago, even more than ten years ago; those things are just becoming more relevant, but they've been there for a while. The requirements we have for this sort of service include integration with GPUs, which has been a big thing for a while, and also integration with better ways to improve GPU utilization, efficiency, and concurrency: things like NVIDIA time slicing, MPS, MIG, all these sorts of things. For us it's very important to have a platform that gives us easy access to things like the NVIDIA GPU operator, which we already use elsewhere as well. On the right side, you see a picture of multiple models being served. The fact that we can use different backends for the serving is quite relevant. Knative also gives us an easy way to manage those services: versioning, rollouts, rolling back, giving different endpoints to people, and autoscaling, which I'll talk a bit about. This is what we've been using in production for a while.

There are other use cases popping up that are quite interesting, and I'll talk about two of them. The bottom-left one is basically the key use case we have at CERN: when the data comes in, we get what we call raw data and we store it in a backend. In this picture I put S3; the actual storage system for this data is not S3 today, but it could be, and I give it as the example because this is the use case we have. So we would push the raw data, and we'd rely on events on the S3 side to trigger the first-step analysis, which generates something we call ESD, event summary data. That output gets pushed again into our backend storage, which triggers another event for the next step, the AOD, the analysis object data, and that is what then gets pushed to the physics groups. This way of doing workflows is actually quite interesting, because we can manage the processing part on the server side, instead of having to republish the software, run the workflows on large-scale batch systems, and give users the responsibility of maintaining those. So there's a nice use case there to follow. It is not in production; it's something we keep trying, and we still don't have it fully figured out.
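Here is a hedged sketch of how such a chain can be wired with core Knative primitives, assuming the storage system (or a small bridge in front of it) posts CloudEvents to a broker; the event types and service names are invented:

```yaml
# Step 1: raw data stored -> run reconstruction to produce ESD
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: raw-to-esd
spec:
  broker: default
  filter:
    attributes:
      type: ch.example.storage.raw.stored    # assumed CloudEvent type
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: esd-reconstruction               # assumed reconstruction service
---
# Step 2: ESD stored -> derive AOD for the physics groups
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: esd-to-aod
spec:
  broker: default
  filter:
    attributes:
      type: ch.example.storage.esd.stored    # assumed CloudEvent type
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: aod-derivation                   # assumed derivation service
```

Each step writes its output back to storage, and the storage notification becomes the next event, so the DAG lives in trigger filters rather than in a batch-system job description.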
Then the other one is GitLab CI. There are a lot of requests to do more than what you can do with plain continuous integration integrated in the repos: to use those webhooks and triggers to call managed services that can, again, be versioned and managed elsewhere. So those are the two use cases we're looking at.

Regarding challenges: again, we do machine learning, and one of the big issues, which I've given a couple of talks about elsewhere as well, is that we have very large images. These are not necessarily the machine learning models; it's just the software we manage. Image distribution is a big thing, and for this sort of serving use case it's even more important, because you want the starts to be pretty fast. So the images either have to be pre-distributed, but with several versions of software like this you're talking about potentially terabytes of data that have to be on the nodes, or you do something smarter. We've been looking for a few years into this sort of lazy pulling: instead of pulling the full image and then starting the service, you start immediately and pull the files you actually need at runtime. There's some nice integration for this in containerd, and I put an example here. The benefits are dramatic: you can see the improvement in pull times from minutes to just a few seconds, and we also move a lot less data around, so this makes a big difference.

Another challenge we have is that we actually don't have a big use case for service mesh elsewhere, so we had to learn Istio when we started looking at it, just for this specific use case. The knowledge about this tool is not spread across the teams, which makes it a bit harder to manage when we have incidents. The other challenge is that remote serving is often done in air-gapped environments close to the machines. This is not a use case we've explored completely, but it means we have to have some sort of offline replication of the things that need to be served.

Now I'll finish with the needs we see for our use cases in the future. We do have a lot more ML coming, with bigger models. Until now we actually didn't have that big models; we're starting to get much bigger ones. Also, some of the ML use cases rely on the container to have the model, but they also rely on a lot of external data that is not inside the container image, which means all the cold-start benefits I mentioned before become an issue when you're actually pulling gigabytes of additional data at start. This becomes quite a big issue, and it's not something that the traditional container or OCI registry really handles, so it's something the community, not only Knative, will have to figure out. The other one is very large models. We traditionally had pretty small models, I would say, and some of them are starting to be big enough not to fit in memory on a single GPU; that, again, has to be figured out so this kind of use case can be handled properly. The other one is that we want to make the best use of GPUs: making sure they're not idle, and that when they are claimed by a serving component we can actually run more than one workload on the same GPU, so that they don't sit idle while things are just hanging there waiting for new requests, even before they are cleaned up for being idle. For GPUs we've been doing things like MIG partitioning and time slicing, but there are challenges, because unless you do physical partitioning, memory sharing is not obvious.
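For context, GPU time slicing of the kind Ricardo mentions is typically switched on through the NVIDIA device plugin's sharing configuration; a rough sketch, with the replica count chosen arbitrarily, might look like this:

```yaml
# Sketch of an NVIDIA device plugin sharing config (values assumed):
# each physical GPU is advertised as four schedulable nvidia.com/gpu
# resources, time-sliced between the pods that claim them.
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 4
```

Time slicing multiplexes compute but provides no memory isolation between the sharing workloads, which is exactly the caveat about non-physical partitioning raised here.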
So we are looking into things like DRA in Kubernetes, which looks quite promising for this. The other one is multi-cluster. We are by design multi-cluster on premises, because we have this requirement of different types of resources located in different areas, but we also want to burst and scale out to public clouds and still give a good experience to our users, while doing deployments that cross administrative boundaries is not obvious, especially because Kubernetes was not designed from the start to be multi-cluster, so scheduling is still a big issue we have to figure out. Again, that's not a Knative-specific issue, but I think it's something Knative will have to integrate with properly to serve these use cases. And with this, I pass it over.

Thanks, Ricardo. Hi, everyone. My name is Adolfo García-Veytia. I am an open source engineer with Chainguard; we are a supply chain security company. For those familiar with the supply chain security space, what we do is make sure that the software you ingest is safe to run and can be safely used to build your applications on top of it. Obviously that involves lots of open source, and as part of that we use Knative to power all of our services.

I'm going to give you a brief overview of one of our services, Chainguard Enforce. Chainguard Enforce is our supply chain security control plane, if you want to see it like that, and it performs many functions. The way it operates is that it observes the workloads in your cluster and reports back what you're running, and it has a lot of features to act on that information: it can do security scans, it ingests SBOMs, it tells you what's running where, and it also has multi-cluster capabilities; you're going to see how we do that in a little bit. All of those services are powered by Knative Serving. Chainguard Enforce has practically no plain Deployments; all of our services are deployed using Knative Serving, and this covers around 30 to 35 features, including a lot of the things I mentioned before.

Now, we also use Eventing, and Eventing is wired to report back to the users all of the events we detect in their clusters. For example, when we perform a vulnerability scan on your workloads, we use Eventing to send the notifications back to the user and to the console. We also use Eventing for the lifecycle notifications of our admission controller: Enforce has an admission controller that lets you admit or deny workloads based on policies that you define for each of your workloads, and we use Eventing to communicate all of that. Inside your cluster, and this is optional, we can also run an agent; that agent is the one that observes the running workloads on your clusters and reports them back to our SaaS, and it is built using the Knative controller framework.

So, we said we'd discuss some of the challenges. Inside Chainguard we have lots of Knative expertise, and I'm trying to collect what I heard from our engineers on one single slide. It boils down to one thing: the main problem we have had with Knative when running it was making sure that the RabbitMQ broker was doing what we wanted it to do.
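For orientation, a RabbitMQ-backed broker in Knative Eventing is declared along these lines, with the broker pointing at a RabbitmqCluster managed by the RabbitMQ cluster operator; the names here are assumptions:

```yaml
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: notifications                    # hypothetical broker name
  annotations:
    eventing.knative.dev/broker.class: RabbitMQBroker
spec:
  config:
    apiVersion: rabbitmq.com/v1beta1
    kind: RabbitmqCluster
    name: rabbitmq                       # assumed RabbitmqCluster name
```

Most fine-tuning beyond this travels through annotations and the referenced config object, which is the area described as fiddly below.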
The other part of that problem was that interacting with the project can be a little bit slow. We are proud to be contributing back to the project, but the broker is, I don't know, in need of help: when trying to find more documentation, or trying to get advice from the maintainers, it has been a little bit of a problem. Some of it has to do with the fact that Knative abstracts a lot of things for you, but it also has to make sure the RabbitMQ backend is running properly, so it has to take care of a lot of things. But when you have to fine-tune features in the RabbitMQ backend, it sometimes gets difficult: you have to deal with annotations, and it can be hard to pass them through to the backend.

At this point, one thing I forgot to mention is that this year I had the privilege to serve on the Knative Steering Committee, representing my company in the end-user seat, together with Naina. So I wanted to finish on this slide with an open call for participation in the project. If you are a Knative user, you can do lots of things to help the project, starting with helping us track who's using it: we're interested in hearing from you, hearing who's using it and how. But more importantly, try to approach and help with some of the issues that are outstanding. While things can work very well, we need your help to make them easier to run. So, with that: Naina.

Thank you. For me, learning that Knative was serving AI/ML long before they were mainstream was the key takeaway from the CERN use case, but I found end users from every part of the world: eventing use cases, serving use cases, everything. We do have some time if you have a couple of questions for our end users. If you have Knative-related questions, please find us on the CNCF Slack, but you might not have access to the end users there, so if you have questions for them, we can take them right now. I would ask you to come to the mic over there so that we can actually hear the questions and record them. So, yes.

Is this thing on? Yes. My question is for Noris. You mentioned that the reconciler has issues with changes to Knative resources. What were those, and are they things that anybody who uses Knative needs to worry about?

It depends on your deployment, I guess. I think we have a unique combination: we, and all the projects, deploy everything with Argo CD, so you can imagine we are talking about maybe hundreds, maybe thousands of custom resources that might be changed during the reconciliation loop. We specifically have that problem usually around the Kafka controller, so we think it's pinpointed somewhere in the Kafka reconciliation when we change things. But it's very hard to debug: as I said, the error messages are not very descriptive, and you have all this layering, Knative, the eventing core mechanism, then the Kafka controller, then Kafka itself, so pinpointing exactly what happened where is kind of tricky. We are still figuring out what the real reason is; it could also be our deployment or our Kubernetes configuration in the end, but currently we assume it's more of a Knative issue. Yeah.

And since there are no other questions, I'll ask you another one; this is for anybody. What's often touted as a huge advantage of Knative is the ability to scale to zero. What I'm wondering is, when you get into a production environment: I understand that in development and in early stages this is great, because I can scale completely to zero.
But in a production environment, I can see that with CoreWeave's use case this might be super useful, because a customer only wants to run an ML pipeline once a day or something. But in larger production environments, where I'm constantly running or doing work, is scale to zero really that valuable?

Yeah. So for some of our customers, scale to zero doesn't do anything for them: they're setting their min scale to 50, five, 100, however many pods they need to run their backend. Other customers, like I said, do fine-tunes, and fine-tunes are very specific, so they have a tendency to allow scale to zero, because they're okay with that startup time given that they have hundreds of these fine-tunes sitting around. They don't want to be paying for all those resources when no one's using them, so they made the trade-off decision that they're okay with the startup time. These customers have generally done well at optimizing their startup times, so we're not talking five or ten minute startup times; we're talking sub-30-second startup times, and they're okay with that trade-off. So there are use cases where some customers in production use it a decent amount, and other customers not at all; it's very use-case centered. I will also say that the functionality in Knative that allows scale to zero, the queuing in the activator, is very useful for bursts: while your containers are spinning up, especially with long container start times, you can get hit by a lot of requests, and the same machinery that enables scale to zero handles burst and queuing. So that core functionality helps you out even if you're not using scale to zero, I guess is what I'm saying; it does a really good job of handling long container start times where you need to buffer thousands of requests before you have containers ready. Thank you.

I'll just add: if you don't want scale to zero in production cases, you can always keep min scale at one. But that was a good question. Do we have one more?

Yeah. Hi, my question is for Ricardo. You showed that you have multi-step physics workloads that you run. There are a number of different ways you could do that; why Knative? And are there specific use cases you'd use Knative for?

Right. So maybe I'll go back to that slide quickly. You're talking about the bottom-left one? Yeah. So that's still experimentation; we are evaluating it for this sort of use case. This is not for physics models; this is for workflows to do what we call event reconstruction, which is when the data comes from the detectors: we have the raw data and we need to do the reconstruction to see what actually happened in the collisions. These are workflows we have had in place for 20 years. The way we usually do this is to deploy some sort of batch workflow as a DAG, but the software being run has to be managed by the physics groups, by the people submitting the jobs: they need to describe the jobs and which versions of the software should run for those workflows; all the steps have to be defined. The main motivation for this is that the components doing the data collection will just store the data, and that will trigger the events that trigger the proper reconstruction steps. Also, the actual reconstruction software is now an endpoint, an HTTP endpoint, where what is running there is managed by a system administrator, can be described in Git, and can be handled with GitOps.
We don't have to teach the users how to build their workflows and which versions to use and all those things; as we do for services, we can actually offer them a workflow, an easy way to implement workflows. Now, whether this will actually be used at scale, we don't know, because people are already used to using large batch systems, but it's a very interesting use case. And building complex workflows with this sort of eventing-plus-serving model is actually quite easy, and you can describe it in a declarative way, so I think there's some value there. Thank you.

And I love the energy. We are out of time, so this is our QR code; you know where to find us, the CNCF Slack and the Google Groups, and we have a lot of tutorials there. And if you're still with us, if you could leave feedback on our session, that would be great. Thank you so much. Enjoy the rest of KubeCon, and safe travels home.