All right, hey, everyone. Welcome to our talk. We'll get into it here. So yeah, we're from Spotify, and we're going to be talking about how we recreated our entire backend without skipping a beat. My name is Dan. I'm a PM; I work on our compute platform in our compute and networking product area. And I'm Nick, an SRE working on our compute platform and our networking infrastructure.

So if you don't know Spotify, yeah, it's that one. We do music things. And we work on the platform there. One of our core values is playfulness, and sometimes what that means is you get team names that don't really mean much. We work on a team called ALF, which is, yes, from the show from the late 80s. And you might be thinking, what's the best place to get support from a team that's called ALF? That would be, of course, at Melmac. So that's super confusing for our users, but that's kind of how we do things.

In terms of scale: these numbers we actually got approved so that we could share them with you. We haven't talked about these publicly before. The stuff on the left is all public, but the things on the right are brand new. So, about 602 million monthly active users in about 180 markets. From the platform side of things, we're about 2,800 engineers and over 500 squads. That's another thing: we call them squads, as opposed to teams. We do about 2,900 production deployments a day, across over 3,200 microservices. And in terms of compute itself, we're at over 40,000 VMs (these are all peak numbers), across about a million cores, four petabytes of memory, and about half a million Kubernetes pods. And then the big one: 1.5 terabytes per second of egress, which is actually larger than the Amsterdam Internet Exchange. So yeah, we have quite a bit of scale there. And of course, the fun things happen at the boundaries, so across clusters and things like that.

All this to say: when we have downtime, somebody notices. January 14th of last year was not a super fun day. That one wasn't compute related, but it was something the ALF squad had to deal with as well. And you can see there, when we have downtime, people notice and they are very vocal about it, except I guess if you have a Zoom. One of my favorite things to do is pop open Twitter whenever we have an outage, which luckily isn't that often. But yeah, people notice, and they definitely are very vocal about it.

So what does our backend look like? Actually, it looks like this. This is a graph of all of the microservices that we have on our clusters; the sizes are the amount of traffic each one is doing. So this is a picture of our entire backend. Not super helpful for anything, but it's a cool image.

So what we're going to talk about in the next 27-ish minutes: what Kubernetes looks like at Spotify today, why you would want to rebuild all of your clusters, and then how we actually went about doing this. So let's talk a little bit about what Kubernetes at Spotify looks like today. Golden paths, or paved paths, or paved roads, or however you want to think about them, are something that is very, very big at Spotify. A golden path is really a way to deploy something to production. It's like a tutorial, essentially.
And we have quite a large number of our backend services using these golden paths to get to production. What that ends up looking like from a Kubernetes perspective is these massive multi-tenant clusters. We probably have fewer clusters than you might think we do, but they tend to be long-living, and we upgrade them. We've had clusters that have been around since we first moved to Kubernetes, which was years ago. So a lot of our first iteration of Kubernetes, and the way that we thought about it, was for folks using these golden paths, and supporting them getting onto the clusters.

The big problem with that, of course, is that when you look at Kubernetes in aggregate at Spotify, you have a pretty big portion of the workloads on the golden path, as I was saying, but then you have a whole bunch of other ones that are not. And as a platform, especially as a PM, we can't just address a portion of our users. These numbers are not accurate, by the way; they're more for representation than anything else. But we have a ton of other stuff on our multi-tenant clusters. Things like non-golden-path services, especially the highly optimized services: those big, big circles you saw in that graph earlier. Those can't be on the golden path, because they have optimizations or fine-tuning they need to be more performant, because they're doing so much traffic. We have things like cron jobs. Yes, those are also on our production clusters. And we have other things. We don't actually abstract away the Kubernetes API at all, for the most part. Our users have access to clusters mostly at the namespace level, and can deploy, not whatever they want, but they can deploy things to those clusters as they see fit.

So what we end up with, from a platform perspective, are these massive multi-tenant clusters that see quite a bit of traffic, which we have to support for the golden-path things and also for things that could really be anything. And so recreating clusters, for example, is definitely a big task, which Nick will talk about in a minute.

So the first step, when we're thinking about recreating our clusters from a product perspective, is that we need to think about a couple of things. One is the users of our platform: who are the people actually deploying things, and what are they trying to accomplish? What are the things they're trying to get done? And then, what is our impact to the business? As a platform, we provide a layer on top of Kubernetes itself, and we always want to be thinking about our impact there. And yes, there will be several gratuitous ALF GIFs as we go through this presentation. So we always have to keep these two things in mind whenever we're building anything, and that starts with who our users are.

This is the mental model that we use, simplified, I guess, for thinking about the people that use our platform, the developers themselves. And it's really a spectrum, everything from "I don't need to care about Kubernetes and I don't want to" all the way to "I want my own clusters." We do have folks that require their own clusters for whatever it is they're doing. Maybe they've created an operator; maybe they're using a third-party service and need their own cluster. So we get folks that come to us and ask us about a whole myriad of things, and we want to be able to support them the best that we can.
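As a rough illustration of that namespace-level access, here is a minimal sketch using plain Kubernetes RBAC. The names are made up, and this is generic Kubernetes rather than Spotify's actual setup: a squad gets edit rights scoped to its own namespace on a shared cluster, rather than anything cluster-wide.

```yaml
# A squad's namespace on a multi-tenant cluster.
apiVersion: v1
kind: Namespace
metadata:
  name: playlist-service        # hypothetical squad namespace
---
# Grant the squad's group the built-in "edit" role, but only
# within that namespace, by using a RoleBinding (not a
# ClusterRoleBinding).
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: squad-edit
  namespace: playlist-service
subjects:
  - kind: Group
    name: squad-playlist        # hypothetical squad group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                    # built-in role, namespace-scoped via the RoleBinding
  apiGroup: rbac.authorization.k8s.io
```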
And having just one kind of Kubernetes offering obviously doesn't meet all of these different users. For the "I don't care about Kubernetes at all" folks, we do have the golden paths. That's a way to be in production, a way that you don't necessarily need to care about Kubernetes. Then there are the people who want to do YAML-y things: the people who want to fine-tune their services, or want to dig into things, or maybe have other needs. We support them too, but not as much as we do folks on the golden path. The problem, of course, is the people who want to do things like write operators or own their own clusters. We don't really support them very well, right? So, SpongeBob here: yeah, we don't have a great answer for those kinds of people. And that really is the product piece that I'm focused on as a PM: how do we best support all of the people who use our platform, as opposed to just the people who care about one thing?

As a platform team, this is the thing that's always in the back of our minds: cost is proportional to reliability, right? I talked before about how, when we have downtime, people notice. We could make something extremely reliable, and it would be extremely expensive. We could make something extremely inexpensive, but it would fall over all the time. This seems super obvious, but it really is the core thing that drives a lot of our decision making. Is this thing we're doing decreasing our reliability in favor of cost? Or is it going to increase our reliability, and how does that impact our cost? Somewhere in there is a certain level of nines that we're happy to accept (which we don't measure by the number of tweets that we get), and somewhere in there is the right level of cost for reliability. That's something we're always mindful of whenever we're designing something or working on our platform.

So of course, when we decided to look at this, someone said something that sounded like this: how do we create these differentiated clusters and ensure we have operational control and stability over them, while keeping in mind things like cost and reliability? A product manager obviously said this. Unnamed. But this is the idea here: we're trying to create these differentiated offerings in order to meet our users where they're at, while hitting those key things that we care about as a platform.

So, going back to our Kubernetes willingness-and-knowledge spectrum: our first pass at separating out these use cases is really along the line between the folks that don't necessarily want or need to care about Kubernetes and the folks that want to do YAML-y things. Those folks are really the ones looking at compute as the product, right? They're really looking for just raw compute to run their services or do whatever else they need to do. On the flip side, these other folks are really looking for the clusters themselves. Maybe they need more fine-grained access to the clusters, or they need other things that the folks in the compute-as-a-product bucket don't. And as I said before, these are more advanced folks: they're writing their own operators, they can manage their own clusters, although it might not be their day job to do so.
And so this is kind of where we've drawn the line today. There may be more gradations here, right? I think the devil's in the details of where you actually draw the line between the folks that need one thing versus the other. But for us, in terms of our size and our scale and our ability to manage and support our users, this is our pass at it. So our current cluster offering, a single multi-tenant flavor, definitely isn't this. And this is one of the key reasons that we were looking at rebuilding our clusters.

What this actually looks like in practice is something like this. For the folks on the compute-as-a-product side, the people who don't necessarily need a whole cluster, what they have are things like namespaces on a multi-tenant cluster. They could have higher levels of abstraction; it depends on them and on what they need. Like I said before, we don't really abstract away the Kubernetes API at all, but we could at some point if we really wanted to lean into the golden path or paved path. Things like SLAs: as a platform team, we guarantee things like cluster stability and reliability to a certain level of nines. Governance is actually a key piece here, right? Whenever you have clusters where you're not abstracting pieces away, having governance on those clusters is extremely important, to ensure you know what's on them. Cost: also a huge concern for us, as I was mentioning before. Our squad at Spotify doesn't exist in a vacuum. We have other folks whose job is to take care of cost reporting, controls, and things like that. So we're controlling the clusters from a cost perspective and feeding that back into different parts of our organization. This is super, super key for us: ensuring these clusters exist within the ecosystem of Spotify.

Specialized nodes: this is becoming more and more important. I think I've heard AI mentioned about 4,000 times in the past hour, so we care about this as well. Things like access to GPUs are very important for our users, so having access to those node pools is super important. Things like model serving, for example, will have different needs than training, and those folks are sometimes all on the same cluster. Upgrades: so, we still do upgrades. Cluster lifecycle management is key when you're managing clusters at scale, and for the compute-as-a-product folks, it's fully opaque. They don't necessarily even need to know when the clusters are being upgraded. And they shouldn't need to know if we're rebuilding clusters, which is something that we've done, and Nick will talk about in a sec.

For the clusters-as-a-product people, these are folks that need that additional fine layer of control. They don't necessarily need higher levels of abstraction; they need direct access to the clusters. They still need things like cluster stability and reliability, which is what we offer them. But they get a cluster out of the box that fits into the ecosystem of Spotify, while allowing them the control that they need. So things like managed upgrades: if they're doing upgrades, we help them with that. And cluster lifecycle management, which is super important. And then, of course, the flavors of clusters, as I mentioned before: access to GPUs.
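To give a rough idea of what those specialized node pools mean for a workload, here's a sketch of a model-serving pod targeting a GPU pool. The names are hypothetical, and the GKE-style accelerator label is an assumption based on us being a GKE shop; the general pattern (node selector plus a toleration for the pool's taint plus a GPU resource request) is standard Kubernetes.

```yaml
# Illustrative pod spec pinned to a GPU node pool.
apiVersion: v1
kind: Pod
metadata:
  name: model-serving
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4   # example accelerator type
  tolerations:
    - key: nvidia.com/gpu         # GPU nodes are typically tainted so that
      operator: Exists            # only GPU workloads land on them
      effect: NoSchedule
  containers:
    - name: server
      image: example.com/model-server:latest            # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1       # request a single GPU from the pool
```

A training job would use the same mechanics but typically point at a different accelerator type or pool, which is what lets serving and training coexist on one cluster.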
So this is kind of one of the reasons that we decided to rebuild all of our clusters: to support this model. And I think the key question that hopefully you're all thinking is, how would you actually do that? And this is where I'm going to hand it over to Nick to take over from here.

Cool, thank you, Dan. So, once you've resigned yourself to needing to do this, there are discrete stages of work before you can actually do it, and most of the work is done before you touch anything live. And importantly, everything is going to be done live. As you saw, our users can't handle any downtime, and we're striving to maintain that. So we're going to go through each stage: first our architecture, in terms of how we can actually support doing this live; then our clusters themselves, their shape, their configuration, how they're managed; then the workloads themselves and how they interact with our clusters and our ecosystem; and then the migration tooling that actually does this live.

So, to start with our architecture: take a very simple global service. You have it running in many regions, with a global load balancer in front routing each client to the closest region for low latency. And when you think, "I need to recreate my clusters, I need to move workloads from old clusters to new clusters," obviously the first thing that comes to mind is your networking setup. We could have chosen to do this with very segmented, partitioned networks, but we didn't think that was a good enough experience for our users. We thought something better could be done, something within a region. So we decided to do this by running service discovery beyond the cluster boundary, with headless services inside the clusters. What this means is that workloads can attach to service objects much like they would within a single cluster, but be routable from within our broader network. And this has given us a lot of flexibility. As we bring up new clusters, we can deploy workloads that are running on existing clusters onto the new ones and still route traffic. And this has been a game changer for us in terms of actually being able to support something like this.
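To make the headless-service piece concrete, here's a minimal sketch of what one looks like. This is plain Kubernetes with a made-up name; the external service-discovery layer that registers these endpoints beyond the cluster boundary is our own and isn't shown.

```yaml
# A headless Service: no cluster IP is allocated, so discovery
# resolves to the pod endpoints themselves rather than a virtual IP.
# An external service-discovery system can register those endpoints
# and route to them from outside the cluster boundary.
apiVersion: v1
kind: Service
metadata:
  name: playlist-service        # hypothetical service name
spec:
  clusterIP: None               # this is what makes the Service "headless"
  selector:
    app: playlist-service
  ports:
    - port: 8080
      targetPort: 8080
```

Because clients resolve to pod endpoints directly, the same service name can front pods in several clusters at once, which is what makes live cluster-to-cluster moves possible.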
So when it comes to preparing your clusters: Dan mentioned a lot about the golden path, but it's not just the golden path you have to support; it's everything else. There are over 3,000 microservices, and frankly it's impossible to know them all, so there are certain things you have to do as a platform engineering organization to protect yourself. That starts with what your clusters actually support, and it starts with the APIs and the kinds available. If something is on one cluster, assume it's used on all of them. If an accelerator or machine type is available on one cluster, assume it's used on all of them. And more importantly, your platform org's whole ecosystem of tooling has to work across the cluster boundary. When you deploy your workloads, it needs to be cluster agnostic; it needs to work on any of them. And your monitoring and alerting must not be tied to the cluster itself; it needs a strong level of continuity. On top of that, a lot of the best practices we talk about in the industry basically become table stakes. Things like infrastructure as code or automated cluster lifecycles: that's just an expectation here, not a nice-to-have.

So we have homogeneous clusters now, right? They're nice, equal boxes. Shout out to ChatGPT for that one. And our services can live in any number of those boxes. But now we have the problem of fitting them in. We have to put them in places, and those places need to be somewhat consistent, reliable, and in a healthy state. So the first thing you have to ask yourself, when you start thinking about your workloads and the cluster ecosystem they live in, is: do you know where your workloads are? Because if you don't, you're going to need something like a software catalog. You're going to need a source of truth, and you're going to need to get a lot of information from these services. It's impossible for us as a platform engineering team to know enough about all of our services and all of their unique, bespoke requirements, because they're all going to find different ways to do different things that you're going to have to support. And Backstage is the key piece here, the source of truth we use to dive into these workloads so that you can eventually support their migration.

But past getting enough information about your workloads, you have to actually optimize them for the clusters, and we're solving that in three ways. First, as Dan mentioned, we want to give our developers the best possible chance at component creation. That's the golden path templates: the best configuration for their coding language and framework, and the best Kubernetes configuration possible. But unfortunately, services are long-living. Many of ours date back almost as far as Spotify itself, and automated improvements need to happen constantly. Spotify has made really significant investments here in a program we call Fleet Management, where we're shipping code and configuration changes automatically to all of our components, constantly. I think to date we've shipped over 8.8 million lines of code and configuration changes to our repos. So think about every Kubernetes version upgrade you've done that changed the YAML spec just slightly. We don't want all of our engineers to have to understand that and carry that cognitive load; we want to handle it for them. And that's where Fleet Management comes in. Think changing HPA behavior, or some API being deprecated: we can handle it for them, because we can fleet-manage it.

And for everything else, there's Kyverno. Dan mentioned governance; Kyverno has been a key part of that, acting as our governing source of consistency across the clusters, but also acting as an easy way to handle dynamic environment injection. It has really strong abilities with mutating admission webhooks, and this lets us add dynamic content, dynamic hydration, to all of our users' workloads without them having to know which cluster they're running on or which region they're in.
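As a rough sketch of that kind of injection (not our actual policy, and the variable names are invented), a Kyverno mutate rule can add region-specific values to every container at admission time:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: inject-region             # hypothetical policy
spec:
  rules:
    - name: add-region-env
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              # (name): "*" is a Kyverno anchor matching every container
              - (name): "*"
                env:
                  - name: REGION            # hypothetical variable
                    value: europe-west1     # in practice derived per cluster/region
```

The point is that the workload's own manifest never mentions the cluster or region; the environment is hydrated on the way in, so the same deployment works anywhere.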
So far, we've optimized our workloads, and we've built an architecture that actually supports running workloads across the cluster boundary, with an ecosystem of tooling to match. But unfortunately, the final step is actually moving workloads in a live setting. People are streaming music and podcasts and audiobooks right now, and we can't stop that. We have to do this live. The important thing when planning and preparing the migrations is understanding that migrations are going to break and tooling is going to break, but the workloads cannot. And you have to cover all the bases: humans are going to make mistakes, and your automation is going to have bugs. It's just inevitable. So the key thing is that your migration has to be done safely, and we planned this with idempotency as the main principle when you're migrating in a live setting: any stage can fail and be retried.

So the first stage of a migration for one workload, then applied at scale, is validation. Is it actually safe to start for this workload? Can you move it across the cluster boundary? Is it in a healthy state? Is the input valid? And what does this workload need?

The next step is pinning the workload. This sounds simple, but you have developers working across several time zones, shipping features and code, and we're also automating lots of code changes. We need to ensure a consistent environment: we don't want two versions of code running in two places giving an inconsistent experience, while we're constantly launching new features and constantly shipping more code.

The next step is double deploys. We deploy to an old and a new cluster, or potentially multiple clusters. This is where the cost-versus-reliability trade-off Dan mentioned becomes explicit. There are other ways we could have gone about this so that we wouldn't have to double-deploy workloads, which can be expensive at scale. But this is something we believe is super valuable for ensuring quality of service.

And the next step builds on that: traffic shifting. Graceful traffic shifting gives you a lot of things, but most importantly, your users have no idea that their connection might be moving across a cluster or across a region. Finally, we can reap the rewards of the excess cost we paid for reliability, and we can clean up. And then we can unpin, letting the Spotify machine of automated changes and developers get back to constantly shipping new features to our users, completely seamlessly.
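To sketch how the double-deploy and traffic shift fit together: the same pinned workload runs in both the old and the new cluster behind the same externally discovered service name, and traffic shifts by scaling the two sides in opposite directions. Names are hypothetical, and the hand-set replica counts stand in for what orchestration tooling would actually drive.

```yaml
# old-cluster/playlist-service.yaml: the pinned workload, being
# scaled down step by step (e.g. 6 -> 4 -> 2 -> 0) over the migration.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: playlist-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: playlist-service
  template:
    metadata:
      labels:
        app: playlist-service
    spec:
      containers:
        - name: server
          image: example.com/playlist-service:pinned   # version pinned for the migration
# The new cluster gets an identical Deployment, scaled up in lockstep
# (0 -> 2 -> 4 -> 6) as the old one scales down. Because endpoints from
# both clusters register behind the same service name, round-robin
# across the registered endpoints shifts traffic gracefully.
```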
So we've done a lot to get here. By now we have a network architecture that supports multi-cluster services. We have a completely homogeneous set of clusters and a cluster-agnostic ecosystem of tooling. Our workloads are as healthy as possible, to support this in a live setting without human interaction. And we've created a strategy for migrating in a live production setting that can fail at any stage. Putting all that together, you can kind of piece together what our principles are, but we can sum them up as these two.

The first one: take the pain. We don't believe our users should feel any pain for a decision that we made as a platform engineering team. We chose to recreate our clusters because we believe it gives us a lot of value, but they just want to build features and bring value to the business. So we're going to take that pain for them. Which leads to the second: zero developer interaction. With 3,000 microservices and 3,000 engineers, you can't expect all of them to interact with a migration; that's not how it's going to work for us at scale. So we believe the user almost shouldn't even know where they're being deployed, and if we've done our migration as well as possible, they wouldn't even notice they're being migrated. But we'll usually let them know on Backstage. And with that: not a beat was skipped.

Yeah, so we're happy to take questions here. Also, we'd love your feedback or anything else, so just hit the QR code. There are also some links in there to learn more about things like Fleetshift, or that fun incident that we had last January. So yeah, I think we have a little bit of time here. If you have some questions, feel free to jump on one of the two mics here.

Hello. That's a lot of people. Hello. In terms of managing that many clusters at that amount of scale, with users who don't really want to give a shit about Kubernetes and all of that stuff: how do you make sure that the developers who own those applications feel like they actually own them? And if they break, are they responsible for fixing them, or are you responsible for fixing stuff? What's that setup and culture like?

So, we're really big believers in ops in squads. The squads themselves still own all the operations of their workloads. But it depends on the outage, right? If we're doing a big change to the underlying fleet and we break their workloads, we're going to have an automated way to revert it or to get them back into a healthy state. And a large part of that, too, is the integrations with Backstage that we're building to support our users. So in the best case, users don't even have to talk with us directly to fix their issues; they can likely fix it through Backstage. And if not, then we have all the normal on-call, goalie-style support.

So we'll just go to one. I think there's another mic over there. Yeah, go for it now. Thanks. Could you elaborate a little bit on your use of Kyverno and what you're actually doing with it?

Sure. The two main things for us: first, as a sort of generic mutating admission webhook, injecting environment- or cluster-specific variables that workloads may need to operate within their region. Obviously, we operate in 180 markets, so that means a lot of different things around legality, and workloads often need to be aware of the content they're serving and where they're serving it from. So injecting that type of configuration is one thing we use Kyverno for. The other is ensuring best practices on the workloads themselves. Our workload owners get a lot of control around editing their workloads, but say we want to enforce a best practice like your memory request equals your memory limit. We can set that as a Kyverno policy to enforce it. Or ndots equals one, for example.
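For reference, that memory request/limit rule could be written as a Kyverno validate policy roughly like this. It's a sketch with made-up names, not Spotify's actual policy:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: memory-request-equals-limit   # hypothetical policy name
spec:
  validationFailureAction: Enforce    # reject non-compliant pods at admission
  rules:
    - name: check-memory
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Memory request must equal memory limit."
        foreach:
          - list: "request.object.spec.containers"
            deny:
              conditions:
                any:
                  # Also denies containers that omit one of the two fields.
                  - key: "{{ element.resources.requests.memory || '' }}"
                    operator: NotEquals
                    value: "{{ element.resources.limits.memory || '' }}"
```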
How did you achieve the zero developer interaction in the migration? That seems like a big thing to do, especially given the demands on the engineering teams and how much you'd need them to work on that.

Zero interaction was done a few ways. But the big thing is that we have a lot of platform sources of truth for where workloads are running and where they need to run, like which regions they need to be in. We have access to the repos, and we have tooling around triggering API-driven deployments. So for us, with the whole ecosystem put together, you can start to build automation to actually codify that. And then Backstage is the central place to tell users what's happening to their workloads. Thank you.

All right, we'll head back over here. Thank you very much. I have a question about your networking setup. You talked about the headless services. How did you implement the multi-cluster networking?

So, I can tell you our services exist beyond the cluster boundary. Unfortunately, the rest of our network architecture I'm not allowed to talk about. Sorry about that. OK. I have the same issue, so too bad. All right, back over here.

Hey, thank you for the talk. Really interesting. I had a question about the cross-cluster communications. You said you were using some kind of service registry. Where did you put those registries: inside the clusters, or somewhere in your region?

So the service discovery happens outside of the cluster. OK. Yeah. So multiple clusters within a region can register their services to something external to the clusters. OK, thank you. And there's another Spotify talk today, Alex and Yannick, who work on the squad that deals with that most closely. So you can ask them too, and they might give you a better answer than I can. OK, thank you. I was kidding.

Thanks for your talk. I had a quick question regarding data stores: were any of your data stores deployed in Kubernetes, or are they externally based? Did you recreate them as well and migrate them to a more golden path?

Yeah, I think one use case in particular was our use of Vespa. That's something that historically was not on Kubernetes, but it was a use case we were looking to support with this clusters-as-a-product versus compute-as-a-product model. And so that is something that we now support. Search, for example, in its own clusters-as-a-product offering, I'll say. I'm not sure if that answers your question; you may be asking about other kinds of data stores, but that's one in particular where we're using this model.

Got it. Are the remaining data stores not deployed on Kubernetes? Do they stand alone within your infrastructure?

We do use quite a few managed data stores. So in that case, it's managed, right? We don't need to deal with that. But maybe we can talk offline more about specifics there.

Thanks for the talk. I had a question about that first step, the validation of the workload. Does that ever require knowledge about the workload that's running, and therefore require interacting with the devs?

Yeah. So, as I was saying with ops in squads, we have to make certain assumptions about what we can do safely. If we run validation on a service and we don't believe it's healthy enough to migrate, that's immediately a signal for us that, OK, this is an exception to the rule, and we need to reach out to the team to get their service into a healthy state before we can do a migration. The thing is, before we make any changes to their production workload and actually start migrating it, we need to make sure it's good. So the validation step really is our first indicator. Thanks, thanks.

Hello. I'm just interested in whether you could give us some insight into the infrastructure being used for these Kubernetes clusters. Do you have bare metal, or do you run virtualized environments? And the other question: which strategy did you decide on for lifecycle management of these long-running clusters? Do you do, for example, node replacement? How does it work?

Yeah, so, first question: we're a pretty big GCP customer, and we don't have anything on-prem. So I think that answers the first question. The second question: we're obviously tied to some upgrade cycles via GKE, and that's kind of the process that we follow for upgrading things. As I mentioned in the talk, we have clusters that I think we've had for, I don't know, five years, for example, and we've constantly upgraded them.
And so it's a little bit of a different model than the kind of ephemeral, cattle approach. Our clusters are super, super long-living. This model is changing that a little bit, of course, as we get demands for things like ephemeral clusters for testing or whatever else. But for the most part, our clusters today are super long-living. For some of the lifecycle things as well, a lot of it is dictated by which upgrade you're doing. Say, for example, the Docker-to-containerd deprecation: that's a full node pool replacement. Or maybe avoiding the cgroups v2 bug: that's another one where you have to replace the whole node pool again. And then, obviously, a lot of that's going to be dictated by how node pool upgrades are done on the platform you're using. Yeah, so maybe we'll go back over here.

So, I had a question about managing network policies. When it comes to your microservices, what is your approach for configuring the network policies per workload? Is that a responsibility of the workload teams, or is that your responsibility?

I can't talk about too much of this, because I think our security teams would get mad at me. But it's within the networking plane, I can say. OK, great. And Alex is here; these two have a talk later today, and they'll be able to answer your questions better than I can on the networking side. OK, great. Thanks.

Yeah, back over here. Thanks. Have you encountered a scenario where one of these squads moves across the spectrum of willingness or care about Kubernetes, and how complicated would it be to support such a scenario?

Yeah, I think from our perspective, if they're staying within those use-case product boundaries, then it doesn't really matter. And if, for example, they've decided to entirely shift and have moved from one to the other, they would just be another user of that other type. We want to productize these things as much as possible and not have snowflake configuration for any one supported use case. And I think that's where the level of abstraction comes in. The compute-as-a-product folks do have access to a namespace, so if they wanted to do more YAML-y things, like I have in the slide, they could do that. If they started off as a golden path service but needed to configure or optimize their service, they can go ahead and do that. And that's what I was alluding to in the presentation: where you draw the lines between the different offerings is very important, because it's going to dictate how you can support and how you interface with your customers. Thanks.

Hello, thank you for the great talk. I have a question regarding the continuous delivery tooling you use for multi-cluster deployment and for migration between clusters.

Yeah, we use a couple of tools. A bunch of them are in-house, unfortunately. And then we use a bunch of open source things like Config Sync and Argo CD, with some orchestration-type APIs on top to handle the different cluster-to-cluster assignments and things like that. Thank you.

Yeah, two more questions. I think we have a couple more minutes, but yeah, go for it. Excellent. I've just got a question regarding the double deployment and traffic shifting. Are you using some specific load balancer during the migration? How do you manage the move from fully on the first version to the second one?
And with continuous traffic, do you have some balancing or percentage that you can set, or something like that?

Yeah, so our workloads are all configured with graceful connection termination to start. And because workloads can connect to the same service object across multiple clusters, we're not doing any special load balancing per se; we're just putting more things behind the service object and letting that round-robin. And then, by enforcing scaling in the two directions, you can shift traffic. OK, thank you.

All right, there's one, two more questions. Thank you. Well, my question may be partially off topic, but I'm curious. I saw your slide where you said you have 1.5 terabytes (not even terabits, but terabytes) per second of egress traffic. But normally, as far as I know, the encapsulation protocols for pods in Kubernetes, like VXLAN, Geneve, and so on, have a bandwidth limitation. Sometimes it's 1 gigabit per second. So how do you deal with this bottleneck? Are you using a special encapsulation?

I think with half a million pods at peak load, it's just spread out across them. It's such a large number of containers across so many thousands of VMs that we haven't run into individual node limits. All right, thank you.

Yeah, last question. Hi. My question goes towards the clusters-as-a-product offering. You mentioned your team is responsible for the stability and the reliability of the offering, but you still probably hand out cluster admin, or more or less cluster admin, to your users. Where do you draw the line there, or how does this work out for you?

Yeah, so, I mean, so far so good, I guess. It's something that we're definitely keeping an eye on. I think the folks that fall in that boundary are very knowledgeable about Kubernetes anyway, which is why they want their own cluster, so there is a fair amount of trust there. And when it comes to doing things like upgrades, it is a little bit more hands-on for us than the fully opaque upgrades on the compute-as-a-product side. So a lot of the time we view that as kind of a partnership. We'll see, as theoretically more people move from one to the other, how scalable that is. But so far, it's something that we've been OK with. It's a great question, because it's something that we're not worried about, per se, but it's definitely on our minds: how much is too much support there? Of course, we're also a small team, and yeah, that's our concern. But it's a great question, because it's definitely something we're thinking about. Thank you. Thank you.

Yeah, I don't think there are any other questions here. So we're happy to chat offline or outside of the room as well. Again, if you have any questions, feedback, comments, or you just want to chat via email, feel free to hit that QR code, and we'll get back to you. But thank you, everyone. Thanks, everyone.