My name is Arnaud Meukam and I work for VMware. Today we're going to talk about three main points: the mission of the SIG, updates on what we've done since the presentation in Detroit, and how to get involved. First, what exactly is the SIG? SIG K8s Infra is the SIG focused on managing and operating the underlying infrastructure used by the test infrastructure. Any IT service used by the community is managed by the SIG: we go from the different GCP resources to the AWS ones, and at some point we also operate things like GitHub. Any DNS entry, we have to manage it and make sure we provide those services to the community. One of the first missions of the SIG — and I'm going to do a little history here — comes from the fact that the Kubernetes project started as a Google project. In the early days, Kubernetes was mainly developed by Google employees, so they leveraged Google infrastructure to build the entire test infrastructure. Now we have the CNCF, and we migrated the Kubernetes project — by that I mean the code base — to GitHub, which is independent and controlled by the community. But that's not yet the case for the underlying infrastructure, because for a long time we were using Google resources. So one of the first missions of the SIG, and still the main one, is to migrate all the resources still inside Google. We have a bunch of workloads still running inside Google that we have to migrate. We currently work closely with Google — with GCP, to be more specific — to ensure we transition workloads from Google infrastructure to the community infrastructure. That's the first mission of the SIG.
The other mission is to define policies, and enforce them, around anything related to infrastructure: how we consume the different GCP resources, whether we allow people to request a DNS entry and delegate a domain to another third party, those kinds of things. So it's about governance and policies around infrastructure — that's the main responsibility. Also how we drive cost reduction, in the sense of how we make sure we stay on budget; I will come back to the budget later. And a third, more implicit, mission is to provide transparency to the community: what is used in terms of cost, how we present it, and how we give access to any contributor interested in requesting that information. Because we manage the infrastructure, we have to collaborate with other SIGs. Take SIG Release, for example. SIG Release is responsible for doing releases, either major milestones or patch releases, and a release mainly generates artifacts: system packages, container images, binaries. We have to work with them to define where they publish all those artifacts and make them accessible to the world — we are an open source project, so we have to make everything public in terms of artifacts. So we work closely with SIG Release, and one of our missions, like I said, is to have full ownership of anything related to infrastructure. Take the system packages as an example: they are still built and published by Google. Any time we cut a release, we build all those artifacts, and the system packages are released by Google. For the moment, we have to interact with a specific team to make sure those packages are signed and published in Google infrastructure.
There was a talk by SIG Release about the roadmap to make sure that ownership will be transferred to community infrastructure. Another example is SIG Scalability. We work on scalability tests to make sure they run inside community infrastructure. For a long time we had, for example, one test running a 5,000-node cluster inside Google. We had to transfer that, and because it's a massive scale and that test needs to run at least once every day, transferring where this test runs is not an easy task. We've been working with the Google team for a long time, and now we plan to extend the infrastructure to AWS so we can also run 1,000-, 2,000-, or 5,000-node tests on AWS. One of the benefits for cloud providers like AWS or Azure is being able to test the scalability of Kubernetes on their infrastructure directly with the open source project. We also work with SIG Testing to have full ownership of the CI supply chain because, like I said, we still have workloads inside Google we need to transfer, and we have to maintain those relationships with all the SIGs. We are not a SIG that can operate by itself; we have those relationships because the other SIGs rely on us to do their work — everything they do every day relies on the infrastructure we provide. So the first update is that registry.k8s.io is GA. We made an announcement eight months ago saying anyone can use it; now we want to freeze the old registry. A little bit of history: since, I think, the first release of Kubernetes, we were using a Google-owned container registry for container images. For a long time we were using that. Then we decided to migrate to a community endpoint, and we released registry.k8s.io — I think it was around May.
This endpoint has now been shipped to all versions of Kubernetes back to 1.22 — starting with 1.22.17. Every time you create a new cluster, you will use the new endpoint. Let me explain why we're doing this; you can also check the link to the announcement blog post. Historically, like I said, we were using k8s.gcr.io. In this graph you can see the annual expense of distributing container images for the last year: almost two million dollars a year, on top of the three million we receive every year from Google. That's a lot, which means if we try to do something else, we cannot even start, because there's no budget left for the rest. For a long time we were struggling: for almost two years with this endpoint, it was difficult to manage cost or manage the infrastructure, because when we spend two million on top of three million, it's difficult to migrate the rest of the workloads still in Google infrastructure. It became a struggle. We had to push back on requests from the community because we didn't have enough budget whenever a SIG wanted something specific for its own needs. I remember, for example, SIG Node came and asked for specific nodes to run conformance tests related to CPU features. We could not provide that; we had to say, we don't have enough budget for you, you'll have to wait until we solve this cost situation. That's why we have registry.k8s.io. The other thing we did, to make sure our users and the companies using us actually move, is to freeze k8s.gcr.io. Starting this month, we don't allow the community to publish new tags for container images to the old registry. So we enforce and guarantee that any new deployment, or any upgrade of a workload using any sub-project of the community, will use the new endpoint.
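As an illustration — this is not from the talk — a cluster bootstrapped with kubeadm can pin the community registry explicitly via the `imageRepository` field of the ClusterConfiguration (a real kubeadm field; the version shown is illustrative):

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.27.0
# Pull all control-plane images from the community endpoint
imageRepository: registry.k8s.io
```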
Because unfortunately, we realized over time that not everyone is interested in upgrading. We still have people using the old registry; for whatever reason it's difficult — not all companies, platform teams, or developer teams are interested in upgrading. So we want to push them to do it, and this is directly related to governance and policy enforcement: we established a policy saying we don't allow anyone consuming the community infrastructure to use those specific resources anymore. We did another thing with Google. Like I said, we rely on the Google teams — on the cloud providers — to make sure we don't break infrastructure in production, because as I said, it's almost two million a year of traffic, which means there is production infrastructure using the community registry. So in order not to break users, we worked with Google to redirect the traffic from k8s.gcr.io to the new registry, and we make sure you download the same image layers. Any time you try to pull an image that is not cached, you will be redirected to the new registry. Your cluster will keep working: you will still see the old endpoint, but you'll be redirected to the new one — to the backend storage used by the new registry. That's to make sure that people who cannot upgrade are not broken by the freeze of the old registry. And how did we do that? In October, AWS announced a donation of three million. We saw that as an opportunity to extend the infrastructure, because one thing we realized during our work with Google is that most of the traffic is coming from AWS infrastructure. You have teams, companies, startups, deploying Kubernetes clusters without using the managed services.
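To make the freeze concrete, here is a minimal sketch — not tooling from the SIG — of how a platform team might audit and rewrite image references ahead of the freeze. The two registry names come from the talk; the helper functions are hypothetical:

```python
# The registry names are real; everything else is an illustrative sketch.
OLD_REGISTRY = "k8s.gcr.io"
NEW_REGISTRY = "registry.k8s.io"

def rewrite_image(image: str) -> str:
    """Rewrite an image reference from the frozen registry to the new one."""
    if image.startswith(OLD_REGISTRY + "/"):
        return NEW_REGISTRY + image[len(OLD_REGISTRY):]
    return image

def audit_images(images: list[str]) -> list[str]:
    """Return references that will no longer get new tags once frozen."""
    return [img for img in images if rewrite_image(img) != img]

images = [
    "k8s.gcr.io/pause:3.9",
    "registry.k8s.io/kube-apiserver:v1.27.0",
    "docker.io/library/nginx:1.25",
]
print(audit_images(images))                    # only the k8s.gcr.io reference
print(rewrite_image("k8s.gcr.io/pause:3.9"))   # registry.k8s.io/pause:3.9
```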
And in some situations, I think they don't really have the choice — mostly because of regulation: in some countries you cannot use a given cloud provider. So if you want to bet your infrastructure strategy on Kubernetes, you have to use open source, and a lot of Kubernetes installers rely on the community infrastructure. So when AWS announced the donation, we saw that as an opportunity to extend the infrastructure, because the bulk of the traffic is coming from AWS. One idea was to serve the blob layers of the different container images directly from there. That's what we did. This graph represents the traffic on k8s.gcr.io since the beginning of the year. You will see, around March 30, a drop in traffic, and that matches the redirect I was describing: k8s.gcr.io is redirected to registry.k8s.io. We no longer serve blob layers directly from Google; we now serve them from AWS, more specifically from S3, hence the drop in traffic. We built some logic into registry.k8s.io so that anyone coming from AWS will pull from S3. You then see, since that date, a traffic increase that mostly matches the redirect. Most of the traffic is also coming from the US regions, because most of the users, most of the infrastructure consuming our images, is in the US. So this is the current infrastructure of registry.k8s.io. Any time a Docker client or a container runtime hits registry.k8s.io, it goes through a GCLB — a Google load balancer — which is a global load balancer, so we cover requests coming from the entire planet. We leverage Google infrastructure because the load balancer is global: from any point on the internet, when you hit registry.k8s.io, you will be directed to a specific GCP region.
There, a serverless Cloud Run service takes your request and identifies where to redirect you. If you come from Google, you will be redirected to Artifact Registry — that's to keep traffic inside the cloud provider: anyone coming from Google stays in Google. From a bandwidth perspective that's interesting, because it minimizes egress costs. And any time you come from AWS, or from a non-cloud provider, you will be redirected to S3. So if your infrastructure is on AWS, you will stay in AWS. We want to maintain the same latency we had with the old registry and also minimize our costs. The entire approach of this architecture is based on cost optimization — there's no real technical point beyond that. We wanted to address the cost problem, because spending two million just on egress is a lot for an open source project. Maybe one other project — I think they use Fastly, so they don't have a cost problem — but I don't think you will see something like that in other open source projects. So the entire infrastructure here is cost-driven, and at some point we became FinOps experts. Another update is artifacts.k8s.io. This is an endpoint focused on serving kOps binaries and CRI binaries — kOps being one of the sub-projects inside the community, acting as one of the first Kubernetes installers on AWS and also GCP, one of the first Kubernetes installers for a cloud provider. For a long time we have had a lot of people using it; I used to use this project to deploy Kubernetes clusters on AWS.
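The source-aware routing described above can be sketched as a small decision function: the service checks whether the client IP falls inside a cloud provider's published ranges and picks the storage backend accordingly. This is a minimal illustration, not the real registry.k8s.io code, and the CIDR ranges below are made up (the real service consults AWS's published IP ranges):

```python
import ipaddress

# Hypothetical AWS ranges for illustration only.
AWS_RANGES = [ipaddress.ip_network(c) for c in ("52.0.0.0/8", "54.0.0.0/8")]

def pick_backend(client_ip: str) -> str:
    """Decide which storage backend a blob request is redirected to."""
    ip = ipaddress.ip_address(client_ip)
    if any(ip in net for net in AWS_RANGES):
        return "s3"              # keep AWS clients inside AWS: cheaper egress
    return "artifact-registry"   # default: serve from GCP

print(pick_backend("52.10.1.1"))   # s3
print(pick_backend("8.8.8.8"))     # artifact-registry
```

The design choice is exactly the one from the talk: the routing exists to keep traffic inside the cloud it came from, because cross-cloud egress is what drove the two-million-dollar bill.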
And we realized we also have a cost problem here: the same egress issue, this time related to kOps, and we had to migrate. So we have a global problem with egress, because somehow it's a metric of the success of the Kubernetes project. People adopt it, people use it, but that success has a cost, and we have to pay for it. We were at risk of not being able to run the infrastructure for 2023, because if we run out of budget, the project might shut down infrastructure accidentally — it's a big risk. So we leveraged the donations coming from the different cloud providers, and we said, okay, we're going to move artifacts.k8s.io to S3. You see in the graph here that since February 28 we did the transition, and we serve almost 12 terabytes per day — if I pick just one region, that region alone is nine terabytes, and the others are around three terabytes per region. We passed 10 terabytes per day. That's a lot, and someone has to pay for it. So the idea is: if we want to be able to serve anyone making a bet on us, we need to leverage any feature possible, any billing optimization from cloud providers, to make that happen. That's why we're transferring specific artifacts from one cloud provider to another. Those are the main updates since Detroit; we've been focusing on addressing the cost problem. The next thing is, how are we going to continue as a SIG? We're working with specific companies, mostly CNCF members like Kubermatic, to extend the CI infrastructure to AWS and leverage that donation, so we have a multi-cloud approach for CI.
It's kind of dogfooding, because we also want to make the point that the Kubernetes project is capable of leveraging different cloud providers to provide CI for the community. And "community" in the sense that it's not only about the Kubernetes project but also any open source project in the CNCF landscape. Yes, Kubernetes is a project, but we also work with other CNCF projects — containerd, for example. We have conformance tests we run on our CI to make sure there's compatibility between Kubernetes and containerd, and I think we also had the same conversation with CRI-O at some point. So it's not only about us; it's about the CNCF landscape, because a lot of projects are based on Kubernetes, and we need to support those projects if they come to us with a need to ensure compatibility or conformance. Beyond working with the CNCF, we also work with third parties like Fastly to get access to different services. I think this year — we've talked to them — Fastly will provide CDN services so we can distribute the different artifacts generated by the community. If anyone wants to be involved, we have a bi-weekly meeting around 10 p.m. Central European Time, 4 p.m. Eastern Time. There's a charter you can check, a Slack channel, a GitHub repo where we gather all our issues, and mailing lists. Is there any question?

Hey, quick question. One aspect of the Kubernetes infra is the Kubernetes Slack, and there's a moderator bot for it which has been logging every message in full back to Google Storage — really kind of wasteful. I fixed that, but it still needs to be redeployed. Is that something I can help with, or how does that get done?

Okay, in that case we should reach out to SIG Contributor Experience, because they have full ownership of moderation on the different communication platforms.
What we do as a SIG is provide infrastructure where workloads for a specific SIG can be deployed. The responsibility is shared at some point, but if you want to improve that specific tooling, you should reach out to the technical leads of SIG Contributor Experience. Any other question?

I've got one question. Currently you're leveraging AWS, and that's just one party. Why not leverage all cloud providers?

Okay, so your question is: we currently use GCP and AWS, why not use, for example, Azure or another cloud? I will answer in two parts. The first part is that I'm a SIG lead, so by definition I'm a community member, and the project is not a legal entity. We rely on the CNCF to interact with all those cloud providers. One of the — I will not say issues, but — constraints is that we cannot just go talk to a cloud provider and say we want to use them. As I mentioned during my talk, we rely on donations. That's what's happening: GCP donated nine million three years ago to boost the infrastructure, and AWS donated three million for this year. So the intent is to talk to cloud providers and express — I will not say the need, but — the expectation that we can run on specific cloud providers. We can reach out to any cloud provider and say we want to run on them, but the thing is, we rely on donations, and we need to stay independent. If you know a cloud provider willing to donate resources, the CNCF has a credit program through which they can donate; we will then interact with the CNCF and leverage that donation. That's the overall process. Any question? I think that's it. Thank you for coming.