Okay, thank you, thank you for having us here. It's great to see so many people here. A short introduction: we are both software engineers working at IBM Germany. This is a picture of our lab; it has its roots in the mainframe, like everything in IBM. Linux on the mainframe started as a skunkworks project there, and now its mission spans from Quantum to Cloud. We are working on IBM Cloud Code Engine, and that is why we are here: it is based on Knative and provides a simple user experience on top of Knative and several other open source projects that we are using. So we enable you to run your containers in the cloud easily. We also have a batch experience, and we are using Tekton to deploy source code into the cloud. For example, if you know it from Cloud Foundry, we provide a similar push experience for your source code and your projects. This is a glimpse of the user interface: the easy path is that you type the name of your application, you point to your container image, press create, and your service is running and you can access it securely via HTTPS in the cloud. For this audience, the second thing is probably more interesting: you can get a kubeconfig using the command shown above, so you select your project, you get a kubeconfig, and then you can use the usual tools you are used to. For example, you can run kn service create and your application runs in the cloud. And for all of that you don't have to care about clusters at all; you don't even see them. You just get a namespace where you can run things.

Code Engine started a while ago; it went beta in 2020 and GA'd early 2021. We have been using Knative for quite a while now, since 0.11, and are now on 1.6, so we have made a long journey with it. The same goes for Istio, which we use as our service mesh: we started with 1.5 and are now on 1.15. At the moment we are also working with some people here from the community on getting a Knative conformance certification running.

Code Engine runs globally, I think now in nine data centers around the world, with lots of users and lots of services. We have seen that this use case is kind of special: there are not many blog posts or much documentation on how to run Knative with this high a number of services, and the documentation on running things multi-tenant is sparse. So we want to share what we learned on both aspects, the multi-tenant aspect and scaling to lots of services, and what you might be able to take away for your own setups.

I will cover the first part, which is using Knative in a multi-tenant way: sharing one cluster among lots of users while avoiding that they interfere with each other too much, and better yet, that they see the other tenants' services and data at all. I will talk about three aspects: encryption, network isolation, and the resources we share. I will start with encryption. We decided to use mTLS encryption, provided by our service mesh Istio, for all connections, for example from the ingress to the activator and from the activator to the individual service, so that the traffic is secured inside the cluster and separated per tenant. Knative supports a number of service meshes you can choose from.
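As an illustration (a minimal sketch, not our exact manifests), strict mTLS for a tenant project can be enforced with one Istio PeerAuthentication resource per namespace; the namespace name here is hypothetical:

```yaml
# Enforce mTLS for every workload in one tenant's project namespace.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: tenant-project-a   # hypothetical customer project namespace
spec:
  mtls:
    mode: STRICT                # reject any plain-text traffic to these pods
```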
When we started, we had to make a decision. We had decided that every customer project gets its own certificate, and at that point in time that narrowed down our choice of service meshes; we went with the one that supported that configuration, and that was Istio. So we have our gateway configuration; you see it here, as an example from one of our clusters. Each project, which is a namespace in Kubernetes terms, gets its own certificate; we are using Let's Encrypt there, referenced in the gateway file.

So we have this slide, and these are the lessons learned we picked up along the way and wanted to share. There are a lot of service meshes around, but your requirements can limit the choice to a very low number; in our case, at least at that point in time, to one.

And interestingly, look at this definition. In this gateway file we have one of these sections for every namespace we support, and this leads to one of the limitations we hit in the cluster: because each of these entries repeats the cipher suites and the TLS information, the file gets bigger and bigger with each service, and at some point it will exceed limits such as the etcd entry size you are allowed to store. Luckily for us, with Istio 1.15 Envoy gets sane defaults, so we have to repeat much less. But this is something you have to watch out for: that your YAML files don't grow so much that you can no longer store them.

For the network, of course, we put network policies in place to shield namespaces from each other, so one customer cannot talk, for example, to pods and services in another namespace. This looked good at first glance, but then we looked at cluster-local Knative services and found that this whole concept no longer worked, because cluster-local means cluster-local, not namespace-local. That was still okay as long as you only offered publicly accessible services, because there was a public path anyhow through which they could be reached. But we also provide private access to services from the customer's VPC, and then it becomes a real problem that you can connect to something you are not allowed to. We worked around that, as I'll show you, out of band of Knative and Istio, by putting filters into Envoy which, simplified, add a secret to each outgoing cluster-local call. When we receive that call, we check that the added secret matches the namespace it originated from. And of course we make sure that this secret is never handed out to the customer, and we overwrite it so it cannot be spoofed. So the lesson learned: using network policies is good, but not enough.
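To make the gateway discussion concrete, here is a rough sketch of how such a shared Istio Gateway can look when every project namespace brings its own Let's Encrypt certificate; hosts, secret names, and namespaces are made up. Because a server block like this is appended for every project, the object grows linearly with the number of tenants, which is how it eventually runs into the etcd entry-size limit mentioned above:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: shared-ingress-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  # One server entry like this per customer project; repeated TLS settings add up.
  - hosts:
    - "*.project-a.example.com"        # hypothetical tenant domain
    port:
      name: https-project-a
      number: 443
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: project-a-tls    # Let's Encrypt cert stored as a Kubernetes secret
  - hosts:
    - "*.project-b.example.com"
    port:
      name: https-project-b
      number: 443
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: project-b-tls
```

And a hedged sketch of the Envoy filter idea just described: a Lua filter on the outbound sidecar stamps a per-namespace secret header onto cluster-local calls, and the receiving side validates and overwrites it. The resource name, header name, and placeholder value are illustrative, not our production code:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: add-tenant-secret
  namespace: project-a                 # hypothetical tenant namespace
spec:
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
    patch:
      operation: INSERT_BEFORE
      value:
        name: tenant-secret-injector
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
          inlineCode: |
            -- Stamp every outgoing request with this namespace's secret;
            -- the receiving side checks the value and overwrites anything
            -- a caller may have set itself.
            function envoy_on_request(request_handle)
              request_handle:headers():replace("x-tenant-secret", "<secret-for-project-a>")
            end
```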
The next aspect, and I'll move quickly here because of time: of course, if we hand out resources, we have to make sure that one customer cannot take over the whole cluster. So we limit the number of apps and the number of revisions you can create, because each revision consumes resources, IP addresses in the end, which are limited. So we have to limit things and make sure the customers play well with each other, while keeping some flexibility to react to customer demands.

The last thing we had to do is massage some of the things that get created. A good example is the image pull policy. Of course, the best thing for us and for speed would be to take the image as it already is on the cluster, without any outside call to a registry. But that is actually a security problem, because another customer on the cluster could guess the name of someone else's image, and there would be no authentication; they could just use that image. So we massage every image pull policy to Always, which is not as bad as it sounds, because it does not actually pull the image again if it is already there; it only does the authentication part with the registry. There is also work going on in the community to enhance that part with additional security checks on the image pull; this is driven by the IBM OSS folks. Let me skip over the rest; taints and tolerations are something we also have to control somehow.

So, lessons learned: an image pull policy of IfNotPresent is fast, but a security problem in a multi-tenant cluster. A good thing that we really use is that Knative provides fine-grained control through its capability settings, which helps us a lot in controlling some aspects of Knative. And with that I'm through with the multi-tenant part.

Now I will take over the scaling section. Thank you, Martin. Maybe questions on that part first?

First of all, thank you for the insights you gave us. One more high-level question: would this also be possible for any Knative installation? A recipe for how to set up the network policies and also this extra Envoy filter sounds like something that would be helpful for the community. I'm asking whether it might make sense to add this to the documentation, or to write a blog post around it.

Yes, definitely, it would make sense. I think we have some blog posts out there about the setup; I'm not sure whether we cover these network policies already, but it might be a good thing to do, because this is something we get asked often, and it would be super awesome to have a documented solution for that. You're welcome.

Okay, thank you, Martin. So after we have secured our tenants and they are isolated from each other, the next thing we want to do is actually scale up. I want to give you some insights into what we change in the Knative configuration, or how we configure Knative, to scale to the level we are currently at, which is multiple thousands of services per cluster. The things I want to talk about are basically three: the cold start time, because as you have seen we are using Istio as a service mesh, which adds overhead on top that we want to get down as much as possible; some particular configuration for Knative itself; and a special section for Knative plus Istio, what you may need to be mindful of when you use Istio with Knative and how to get the cluster to a sufficient size.

For the cold start time there are basically two things. With Istio, we can obviously pre-pull all sidecar images out of band to pre-warm all the nodes we have, so that the sidecar images for istio-proxy and queue-proxy, which we need, do not have to be pulled on each node while scaling up, for example along the lines sketched below.
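One common way to do such out-of-band pre-pulling is a small DaemonSet whose init container pulls the image on every node and exits; this is a sketch under the assumption that the image ships a shell (the istio-proxy image does), and the image tag is illustrative:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sidecar-image-prepuller
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: sidecar-image-prepuller
  template:
    metadata:
      labels:
        app: sidecar-image-prepuller
    spec:
      initContainers:
      # The pull happens as a side effect of scheduling; the command is a no-op.
      - name: pull-istio-proxy
        image: docker.io/istio/proxyv2:1.15.0   # tag is illustrative
        command: ["sh", "-c", "exit 0"]
      containers:
      # Keep a tiny container around so the DaemonSet pod stays Running.
      - name: pause
        image: registry.k8s.io/pause:3.9
```

The same pattern can be extended to the Knative queue-proxy image, keeping in mind that distroless images need a no-op entrypoint they actually ship.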
But that is actually not the bigger overhead. The second part is the mesh tuning: most of the overhead when scaling up with Istio is getting the service information to all parties, and that is most of the overhead you have when scaling a service from zero to one instance. This is where you can tune Istio, which I come to in a separate section at the end, on what you need to look out for. We also experimented with running Istio in the traditional way, where the istio-proxy sidecar setup configures the iptables as the networking layer, and performance-tested it against the Istio CNI plugin, with no noticeable difference in startup and cold start time. The only thing that mattered in the end was the mesh tuning: getting the discovery information for the services pushed as quickly as possible to the activator, which is in the path, and to the service endpoint. That is the lesson there: the overhead of the actual sidecar image pulling is minor, both images get pulled either way, and even if you use the CNI plugin and the iptables are not configured in the init container of a spawned application pod, it does not matter in the end for the cold start time. What matters is the Istio state synchronization that needs to happen: the Istio control plane needs to distribute the information, the activator needs to know it, and the receiving end needs to get it in time and set it up; otherwise you will see a lot of errors when scaling up or scaling down.
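For reference, switching from the classic init-container iptables setup to the Istio CNI plugin is, in an istioctl / IstioOperator based install, roughly this one switch (a sketch, not our exact profile); as said above, it made no measurable difference for our cold starts:

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    cni:
      enabled: true   # let the CNI plugin program iptables instead of an init container
```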
So, from a pure Knative perspective, we deactivate the HPA for the activator. Why? We saw that with the activator HPA enabled, activator instances dynamically spin up or down depending on need, and what then happens is that the activator service changes and the endpoints for all services in the cluster change, which means Istio needs to pick up all that information and send it out again. That is a huge pain we need to avoid at all costs. That is why our clusters have a basically static configuration for the activator, at least to limit the scaling, or rather the changing of the activator service, as much as possible, so that we do not overwhelm the cluster with Istio service synchronization.

The second thing you see there is that we are using proxy mode, so the activator is always in the path for all services in our cluster. It is never taken out, because that again would trigger Istio to push more information. And one thing about Istio: it currently only pushes state-of-the-world information, not deltas. Even if only one service in your cluster changes, it will not send just that one service; if that service is visible to other services, it will send everything to every receiver.

Then we added an HA setup for all the Knative components, so that we run multiple replicas across different availability zones; that is not in the current Knative defaults.

Another thing to be mindful of: the queue-proxy sidecar size is defaulted in Knative, and when you have, as Martin said, multiple tenants with different resource quotas who can adjust them or request adjustments, this can have an impact on the cluster if you don't pin it to something specific or tie it to the size of the service itself.

Then the Kubernetes API calls that the controller does; there are two things. One is that, if you want to scale, you need to increase QPS and burst; they are set pretty low, and you should increase them to a sufficient amount. The other is currently not possible out of the box: with our own patching we give the controller more worker threads, and we have an issue open for discussion on whether that should be included in Knative, because the controller builds up a pretty big queue when resyncing, and that can cause hiccups in service provisioning times and revision creation.

In addition to that, we don't only want to autoscale pods or Knative services for our customers; we also use the standard cluster autoscaler with an over-provisioning feature on top of it, to order nodes ahead of time based on the workload demand we anticipate, and to pre-warm those nodes so that we can get Knative services up on them as quickly as possible.

These are, summarized again, the lessons learned: we want to avoid the Knative state syncing that is triggered when the activator instances change; we increased the API limits; and one future item: we noticed that Knative is doing a lot of API calls, and at least some of them seem to be unnecessary. For example, there is an issue, you will find it in the references section if you want to take part in the discussion, where Knative tries to update all the deployments in the cluster when resyncing, even though nothing has changed. All of that takes time, and if you have constant monitoring on your cluster, you will see these three things basically as increases in cold start time and provisioning times.
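The proxy mode mentioned above maps to Knative's target burst capacity: setting it to -1 in the config-autoscaler ConfigMap keeps the activator in the request path for every revision. This is a minimal sketch; the rest of our activator pinning is done by patching the activator deployment and HPA and is not shown here:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  # -1 keeps the activator in the data path permanently, so revisions scaling
  # in and out do not flip endpoints that Istio would have to re-push.
  target-burst-capacity: "-1"
```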
Coming to the Istio section: I think it was Knative 1.4 that added mesh mode support. We don't want, and we don't have, pod addressability, so we always run with mesh mode enabled, to stop the defaulting behavior where Knative first tries to reach the pod directly and then falls back to the cluster IP. So we run with mesh mode and, as already mentioned, in proxy mode, with the activator always in the path, to avoid resyncs on the Knative and Istio side. If that is not the case, what we saw when scaling up further and further is basically 503s, a lot of them: if you take the activator back into the path after it was out, the activator needs to have the service information for the service it is going to front, and if that is not present at that time, Knative switches the endpoint but Istio has not yet sent the correct information, and the user sees 503s. We want to avoid that.

Another thing, and this is just the standard set of Knative artifacts: when we have a service, a configuration and a route are created, and the route routes traffic to different revisions. For every revision, Knative creates two Kubernetes services, a public one and a private one. They always look like that, it is not configurable, and for an active revision at least part of that is always necessary. What also happens is that users create multiple revisions, create a new revision, use traffic management, so you will have multiple revisions, and more importantly, at some point you get non-active revisions. There are defaults for how long they are retained, or you can pin it to a maximum, but for a non-active revision Knative keeps all the artifacts in place: there is a deployment, a replica set, a pod autoscaler, a serverless service, and those two Kubernetes services. Istio does not know that these are not active and not routable; it will send that information with every push, even though it is not necessary.

So, to come to the last part: we want to reduce the Istio mesh synchronization, because it takes up a huge amount of CPU and causes all the delays in cold start time. For that, as you just saw, we also limited the Knative garbage collection to a maximum of one non-active revision per customer app, so a Knative service can only have one non-active revision. This gives each user a safety net to go back to if anything goes wrong, but it keeps us from having all those services lying around that are not actively routed, because we don't want to send that information out.

Istio supports a lot of tuning knobs; we want to concentrate on two, and these are the mesh debounce settings, where you can specify how long Istio waits between pushes and how pushes are aggregated, so that you don't overwhelm all the receivers in the cluster. The Istio control plane does nothing else but push, while the receivers also have to handle actual network traffic, so the control plane will likely win and overwhelm the receivers with a lot of pushes, and we want to avoid that at all costs.

And that is the summary of this part: limit the Knative garbage collection, or at least, if you are scaling the cluster while running with Istio, be mindful of the impact of having non-active revisions, and keep an eye on the Istio settings, especially the debounce. Debounce-after means every push is delayed by that amount of time, which has a direct impact on provisioning time and revision creation, and debounce-max is how long pushes keep being aggregated before they are finally sent out.
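Concretely, the revision retention lives in Knative's config-gc ConfigMap, and the debounce knobs are environment variables on istiod, set here via IstioOperator; the values are illustrative, not a recommendation:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-gc
  namespace: knative-serving
data:
  # Keep at most one non-active revision per Knative service as a safety net.
  min-non-active-revisions: "0"
  max-non-active-revisions: "1"
---
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    pilot:
      k8s:
        env:
        # Delay each xDS push by this much so changes can be aggregated...
        - name: PILOT_DEBOUNCE_AFTER
          value: "500ms"
        # ...but never hold pushes back longer than this in total.
        - name: PILOT_DEBOUNCE_MAX
          value: "10s"
```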
And as a future item, Istio has plans to support the delta pushes that Envoy already supports, to get rid of all this unnecessary synchronization effort.

A short outlook from our side for Knative: we want to support custom domains, which we currently do, but not via Knative, and we want to switch to the Knative support for custom domains. We also want to support the proxy protocol together with Istio for enhanced auditability, and we want to adopt the delta pushes once they are available. With that, I want to thank the rest of our team, and especially our open technology and developer advocacy team that does most of our contributions to Knative; you all know Max, Paul and Angelo. We are mostly on the operations side. And we want to thank the rest of the community for all their contributions. To highlight one thing: of all the components you saw in the beginning that Martin showed you, Knative is the center piece, it brings everything together with Istio, and the upgrade experience and the stability of Knative are good, better than most of the other components we use. I want to thank the community for that. Thank you.

It warms my heart to hear that. Are there questions? Do we have time for...

I have a question. Have you experienced any size limitations besides those you mentioned, for example a maximum number of services that you can deploy on a single cluster? I'm asking because, for example, iptables have a size limit, and if a lot of services that are not used anymore still populate this kind of data, I wonder whether you have seen a ceiling with respect to the number of objects you can deploy on a single cluster.

Actually, I think we haven't hit a hard limit anywhere yet. The nearest thing was this gateway configuration; we knew that at some point it would force us to do something on the database side to be able to store it. Our experience is that we have an ongoing fight with the service mesh to get the data around, and connected to that, the marshalling of this information is astonishingly CPU and memory intensive, so you have to cater for that as well; the default sizes will not be enough once you hit it. But we don't have any hard numbers for where the end is.

I was just going to say, great presentation. I had two questions. With the limits with Istio there, how do you handle rollouts and upgrades of Istio while potentially still running thousands of workloads that could be in flight during the rollout? And the second thing, with the chatter of Istio over xDS that you obviously had to fine-tune: would that chatter be reduced significantly with ambient mesh when that comes out, rather than running sidecars everywhere?

On the upgrade experience: we do upgrade the whole cluster in flight. To make that work we have to explicitly massage the Istio installation so it does not delete some important things, so you have to be careful what you are doing there, but we are able to update the whole cluster in flight, so no downtime for that. Ambient mesh is something we will look into, because, I think not for this sync problem, but for the resources used in the cluster, there would be far fewer sidecars to care for, so in the end it would be less expensive to run the whole system. But even with ambient mesh, the control plane still stays the same if it is not using delta pushes, and it still needs to send all that information, so it will maybe help, but probably not solve all of the problems we see so far.
A quick question: you mentioned that a lot of your work is focused on reducing what Istio is pushing out and reducing the number of activator syncs. What sort of tradeoff are you making by doing that? Is there some part of the experience of using Knative that you lose by trying to prevent Istio from sending out all that data?

It can hit the cold start time. You need to find a balance between cold start time and creating a new revision or updating your service, because those times will take a hit if you try to reduce what Istio needs to sync. So depending on how many services you have in your cluster, what rate of change you have, how many services are provisioned per minute or per second, and how many cold starts you have, you need to find a balance. But yes, there is a tradeoff being made.

If I can add one more bit to that: they turned off a bunch of the autoscaling and the things that are efficiency gains, to avoid Istio stealing all of that back by having to do an xDS sync. So they turned off autoscaling on the activator, and they turned off taking the activator out of the path, so when you get a service that is getting a thousand requests per second or something like that, the activator will still be in the path. So I'd say it is resource efficiency that in some cases they have had to sacrifice.

Exactly, that is the trade we are making. It of course costs us: we have to keep these resources available to handle the traffic we have, the information we have, and the marshalling we do. That is the price we pay.

One more question, in terms of advice for running at this scale: are you running one single large cluster with all the customers in one region, or are you sharding across multiple clusters? Maybe some advice for people that want to run at this scale.

We actually have a sharding solution behind it, but of course, to reduce the overhead of having a control-plane management component on top, we try to use the clusters we have as much as possible. So it is still an ongoing effort to run effectively with the smallest possible number of clusters, but we have a sharding concept to adapt to the growth and the customer numbers that we have.

Thank you, thank you so much.