Well, welcome. Thank you so much for being here. This is our talk about how SIG Release makes Kubernetes releases even more stable and secure, hopefully. My name is Veronica Lopez. I'm a tech lead for SIG Release, and I also work at PlanetScale, the company behind Vitess. Okay, everyone. Thank you for coming today. I am Marko Mudrinić. I am a senior software engineer at Kubermatic, a release manager for the Kubernetes project, and a long-time SIG Release contributor, and we are definitely happy to see you all today. Let's get started. Yay.

Okay, so this is our agenda for today. A little bit of a SIG Release introduction for those of you who are not familiar with it, then a bit about the release team, then the supply chain security and registry changes that I'm sure a lot of you are at least a bit familiar with, and last but not least, the packages, the Debian and RPM packages.

Yeah, so let's get started. Welcome to SIG Release. If you've been to our previous talks, you might already be familiar with what we do, but if you're not, the quick recap is that SIG Release is the group responsible for ensuring quality Kubernetes releases. This means a lot of things. It includes managing the releases from end to end, following the progress of the release cycle, guiding contributors along the way, but also maintaining the tooling needed to release Kubernetes. This is very important, because if you're a release manager or a release engineer for any other project, either open source or internal, you might use other tools that are already available, free or paid or whatever, but we build our own tools. This has many implications that we will mention along the talk, but we're a self-serviced team. We don't just produce the releases, we run the show behind them. And SIG Release also ensures that each release is stable, reliable, on time, and secure, especially with so many collaborators from different companies, individual contributors, et cetera. And of course, this wouldn't be possible if we didn't work in collaboration with other SIGs.

So the latest Kubernetes release is 1.27, called Chill Vibes. If you know Xander, you understand this vibe. Xander was the lead of Kubernetes 1.27, and it is the first release of 2023, with 60 enhancements: 18 in alpha, 29 in beta, and 13 in stable.

And the release team is a sub-team under SIG Release. I know that a lot of people are confused about this, like, who's the release team, who's SIG Release, et cetera, so I hope that this will shed some light on it. We usually avoid mentioning people's names, because we're always afraid that we're going to miss someone, and it's heavily rotating. So yeah, if you have specific questions, you can ask us. But anyway, people who are on the release team take care of roles such as enhancements, release notes, communications, bug triage, CI signal, and docs. Again, for a project as large as Kubernetes, we need to have those sub-teams, and it is critical to have a lead for each of them. The main responsibilities include ensuring all changes land on time, which has been mostly our responsibility lately, and making sure that the project stays stable through the release cycle, but especially before the release.

And another popular question is, like, how do I join the release team, Veronica? So we have a shadow application survey that just went out, and we will have that, okay, yeah, that's already there. Sorry, it's very recent. So we recommend taking the following steps to prepare.
So first, join the dev and SIG Release mailing lists. Second, which is the most important one for me, check out the 1.28 release cycle calendar. This is especially important because it's a long-term commitment, like a multi-month commitment. There are many people who will be joining you in the release, but if for some reason there is an entire month of the release where you cannot join us, probably don't apply this time; there will always be more cycles. And last but not least, check out the release team role handbooks that my team here has been putting a lot of effort and love into keeping updated, especially Jeremy. And yeah, they're public for everyone. All of these resources you can find through this QR code. I'll give you a few seconds to scan it.

And now, this is not part of the slides, but it is very important for me, Veronica, to let you know that joining the shadow program is not the only way you can contribute to our releases. That is one way. However, for better and for worse, it has become really, really popular, so chances are that you might not get in. However, you can always join us in Slack or on different GitHub issues where your talent might be needed. And the only thing you have to do is literally show interest, ping us, attend meetings, and it's all in these resources. There are some nuances that require the shadow application, but it's not the only way. If you are curious about this, you can talk to us in the hallway track.

Okay. So these are the dates. The shadow application closes on the 2nd of May, people will be notified around the 9th of May, and the cycle starts right away on the 15th of May. The release team lead is Grace, who is here somewhere, and the Emeritus Adviser is Leo, who is also here. And they're sitting together.

Okay. So with this done, let's go to how we keep things secure. I have to say that part of our security process is not just the offensive and cool things that other people are talking about in other talks. It's things as simple as having everything on time, all the cherry-picks in at the right time, and little things like that that sound like organizational obsessions. That helps us be more secure. If people in the community respect deadlines, we are able to perform the test cycles and let things sink in to ensure that what we are shipping to you all is reliable. The more exceptions we have, the less secure we are. So we start simple.

But then let's go to supply chain security, which is a very hot topic, so to say. We're now SLSA Level 1 and Level 2 compliant. Level 3 is in progress. We have to be very honest about this: SLSA Level 3, we thought we would achieve a bit earlier, especially because for Levels 1 and 2, if you have attended other talks by us or our teammates, the truth is that in order to achieve those levels we already had a lot of the tooling done. It's not like we worked towards Level 1 on purpose and then got it, or got certified, however you want to say it. We already had a lot of things in place. But for Level 3, things get more complicated. I don't know if you're familiar with it, but if you're curious about all the things that have to be in place, and that will have to be in place, feel free to look at the link in our slides.

Now, this is where it's relevant to remember that we build our own tooling. So things like, for example, the promoter, our image promoter, were built by our team at some point many moons ago, sorry, when SLSA didn't exist.
So now we have to make it work together with what SLSA requires, signatures and many other things. So yeah, it's an interesting challenge. You can follow all the work through the KEP for more information about the progress.

What we have done is that the signing efforts have already graduated to beta in Kubernetes 1.26. We started signing the binary artifacts in December. However, we encountered some issues with signing the container images along the way. And unfortunately, I hope that you can empathize with this: we didn't know the failures or the weak spots until we tested it in production. So we learned a lot from this, and we're working very hard on fixing all those issues, particularly Adolfo and Carlos right here. What we do have is that we have been refactoring the promo tools, the signatures, the registries, the rate limits, a huge amount of work that you cannot even imagine. The only thing you see from the outside is someone cutting a release with next, next, next.

So we did encounter many problems implementing the SLSA concepts themselves. But again, it's all about the tooling behind the implementation. Just to give you a bit of context: we have around 30 images for each release that we publish to over 40 different registries. That means a lot of pushes, signatures, and other related operations. And you know, in a distributed system, anything can fail randomly, for reasons that are up to us, or not up to us, or whatever. If you usually have to be mindful of that for a couple of computers, five computers, ten, whatever, imagine our scale. So it's super easy to fall into rate limits, for example, and other issues that are not so obvious when working on the implementation. Some examples of this are the Google Artifact Registry rate limits and the Sigstore cosign rate limits. Even though we have the privilege of having people on our Kubernetes team who work directly with Sigstore or Chainguard, people who are able to bend some rules for us temporarily, it has been challenging. So we hit those limits, and that got temporarily fixed. But then, if the promoter fails in the process, we can't always recover. It can happen that some images remain completely or partially unsigned, and there is no way to just say, oh, let's go back and not sign it. It was very messy. All of this, by the way, is documented on Slack, very unglamorously. So if you are curious about this, literally just go into our release management Slack channel and you can find all the details. And yeah, this actually happened for the February and March 2023 patch releases. The way you experienced this was that we had to delay the releases by a day or two or something like that after we ran into it. So when you get a delay of this nature, it's not because someone forgot to cut a release or was lazy. It's because we were working very hard, trying to put things together.

Okay, so enough of the negativity. Now, on the positive side, what we do have are things that are really working well. We have many tools working right now that you can start using, both on Kubernetes, but also, if you're brave, you can start adopting them for your other projects. So we have bom and a whole lot more. Now, this is not in the slides, but SLSA 1.0 is right around the corner.
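To make the scale problem above a bit more concrete, here is a minimal Go sketch of the kind of retry-with-backoff logic a promotion and signing pipeline needs when it fans out to many registries. This is purely illustrative and is not the actual promo-tools implementation; the error type and the numbers are assumptions taken from the talk.

```go
// A minimal sketch of backoff/retry around per-registry push and sign calls.
// Illustrative only; NOT the actual Kubernetes promo-tools code.
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// errRateLimited stands in for a 429/quota error from a registry or Sigstore.
var errRateLimited = errors.New("rate limited")

// withBackoff retries op with exponential backoff plus jitter, the usual way
// to survive per-registry rate limits without hammering the service.
func withBackoff(attempts int, base time.Duration, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if !errors.Is(err, errRateLimited) {
			return err // don't retry permanent failures
		}
		sleep := base*time.Duration(1<<i) + time.Duration(rand.Int63n(int64(base)))
		time.Sleep(sleep)
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	// Roughly 30 images x 40+ registries means well over a thousand push+sign
	// operations per release, each of which can hit a limit independently.
	err := withBackoff(5, 500*time.Millisecond, func() error {
		// pushAndSign(image, registry) would go here.
		return errRateLimited // simulate a throttled call
	})
	fmt.Println(err)
}
```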
So I believe there will need to be a couple of tweaks to update bom and the other related tools to make them compliant with SLSA 1.0, but as far as I know, that will be on our side and not on the user side. We'll see. The goal, as I said, is to make these tools available and usable for everyone. That matters because many projects, like Vitess, are now looking to become SLSA compliant, Level 1, 2, 3, et cetera. Our dream, or at least Adolfo's dream, is that our tooling can be adopted literally and transparently into your other projects. Right now, in my experience, it's not quite as easy as that, but I have to say that we have picked up a lot of knowledge along the way. If you want to learn more about this and the challenges behind that specific effort, check out the "Secure Your Project with the SIG Release Supply Chain Kit" session that Carlos and Adolfo are going to have tomorrow at 4:30. So yeah, with that, I'll hand it over to Marko.

Yeah, thank you. I'm going to start by talking about a change that you definitely had a chance to hear a lot about. This is a joint effort between SIG K8s Infra, SIG Release, and many other SIGs in the community, as well as individual contributors, and it is about the registry changes. As you may know, we introduced registry.k8s.io at KubeCon NA last year as a new front end for all Kubernetes images and as a replacement for k8s.gcr.io. What is the actual idea behind this, and why are we doing it? We want to serve images from both GCP and AWS, and potentially also from other providers. We got $3 million in cloud credits from AWS last year, also at KubeCon. We wanted to take that opportunity to use those credits, and serving images from AWS seemed like a great idea and a great way to do it.

However, just introducing the new image registry was not really enough, because we had to make sure that users migrate from k8s.gcr.io to registry.k8s.io. We had a lot of communication in various different outlets, and we tried to reach as many users as possible, because this required some manual interaction from the user. We had to enforce this change in some way, because the real issue we had, and why we had to do this in the first place, is that we were at very high risk of not having enough GCP cloud credits for this year, for 2023. To put it in numbers, we spent $600k more in GCP cloud credits in 2022 than we initially got from Google for that year. Thanks to Google for saving us that year, but we couldn't afford that to happen again this year. We had to introduce some changes. We had to enforce the migration to registry.k8s.io, and we are going to see now what this means from the SIG Release side and how exactly we decided to do it.

To make that migration go faster, we had to bend our policies, and if you're not aware of them, check out the Kubernetes deprecation policy document; there's also a link in the slides. If you haven't read it, the TL;DR version is that for a stable feature, at least 12 months must be allowed for users to migrate away. But we didn't really have 12 months. That's way too long a time for us to wait, and we wouldn't have made it with the GCP cloud credits. While we didn't want to remove k8s.gcr.io itself, it would still stay in place and be accessible, we had to introduce some backwards-incompatible changes, and much earlier than those 12 months we should have waited.
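The manual interaction mentioned above boils down to pointing everything that pulls Kubernetes images at the new registry. The small Go sketch below illustrates the kind of image-reference rewrite users and tools had to apply to their manifests; the image names are just examples, and in practice people typically did this with sed, kustomize, or community-provided tooling rather than code like this.

```go
// A minimal sketch of the user-side migration: rewriting image references
// from the deprecated k8s.gcr.io to registry.k8s.io. Illustrative only.
package main

import (
	"fmt"
	"strings"
)

// rewriteRegistry swaps the old registry prefix for the new one, leaving
// references that already point elsewhere untouched.
func rewriteRegistry(image string) string {
	const oldReg, newReg = "k8s.gcr.io/", "registry.k8s.io/"
	if strings.HasPrefix(image, oldReg) {
		return newReg + strings.TrimPrefix(image, oldReg)
	}
	return image
}

func main() {
	for _, img := range []string{
		"k8s.gcr.io/kube-apiserver:v1.26.3",
		"k8s.gcr.io/pause:3.9",
		"registry.k8s.io/coredns/coredns:v1.10.1", // already migrated
	} {
		fmt.Printf("%s -> %s\n", img, rewriteRegistry(img))
	}
}
```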
So what did this actually look like? The first part happened a little bit before December, and it was changing the default registry to registry.k8s.io in Kubernetes itself, concretely speaking in kubeadm and kubelet. That initially happened in 1.25. But what happened in December? In December, we cherry-picked, or backported, this change to all supported releases, down to 1.22, which was the oldest supported release at that time. Why is this a problem? Because as per our rules, and how we defined them, you can only backport bug fixes, but this was a feature change. It's something that can affect users, especially those running a secure setup with firewall rules and such that have to be adjusted to be able to pull from registry.k8s.io. But unlike the next change, this one was overridable and very well documented, so you could go, for example, change the kubelet settings or the kubeadm settings and get back to k8s.gcr.io.

However, that didn't give us results. When we were looking at the billing reports, the amount of credits that we spent was not going down at all, and this is because users were probably not upgrading as fast as we needed them to. So we collaborated with GCP and with many other contributors as well, and starting with 20 March this year, we, or rather GCP, implemented a mandatory redirect from k8s.gcr.io to registry.k8s.io, which works as in the diagram below. When you request something from k8s.gcr.io, GCP basically decides whether the image is going to be served from GCP or whether you're going to be redirected to AWS. This is something that users can't control, and that is completely bending our policy, because we did something that is eventually going to break someone, and the user can't really choose whether they want to migrate now or not; basically, we decided for them. This was not an easy solution, but we had to choose: either we break all users if we run out of credits, imagine the GCP project that hosts the images gets closed because there are no credits left, or we eventually break some users who are running a more secure setup but who can adjust much more easily than anyone could once the project is closed because there's no credit left.

And this is basically the result. The first diagram shows the traffic on GCP for us, and you can see that starting with March 20 it went really far down; the peak we now have in traffic is similar to the traffic we had on weekends before the migration. This had a significant impact on billing, because we are now going to stay under the three million dollars that we have from GCP; we are actually going to be, I think, around 2.2 million, so this is pretty great. And you can also see that the bytes downloaded from AWS are going from 10 terabytes to 40 terabytes in some regions, so it is constantly going up, so it is definitely working.

That would be about it for the registry changes. I want to take the opportunity to say thanks to everyone, especially to BenTheElder, Dims, and all the other folks in SIG K8s Infra, in SIG Release, and every contributor who worked on this.
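If you want to see the redirect behaviour for yourself, the rough Go sketch below asks the old registry for a manifest without following redirects and prints where it wants to send you. Treat it as a probe rather than a specification: the pause:3.9 tag is just an example, and the exact response (a redirect, a direct answer, or an error) can vary by region, timing, and registry internals.

```go
// A rough probe of the k8s.gcr.io redirect: request a manifest but stop at
// the first response instead of following redirects, then print the result.
package main

import (
	"fmt"
	"net/http"
)

func main() {
	client := &http.Client{
		// Return the first response as-is instead of following redirects.
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}

	// The pause image has lived in this registry for a long time, so it is a
	// reasonable thing to probe.
	resp, err := client.Get("https://k8s.gcr.io/v2/pause/manifests/3.9")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	fmt.Println("status:  ", resp.Status)
	fmt.Println("location:", resp.Header.Get("Location")) // empty if not redirected
}
```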
Now let's talk a little bit about packages. One of the most demanded improvements is that we improve the state of the Debian and RPM packages. The Debian and RPM packages are basically the most commonly used way to install and provision Kubernetes clusters. They are in the official documentation, many tools are using those packages, and we are constantly asked to improve them a little bit. Now let's see what the actual issue is.

So, we need to improve stability and reliability. To give an example: a few months ago we wanted to update cri-tools to a newer version, to 1.26.0, and because of the way our packages are structured, we were able to do that only for all releases at the same time. That was for everything from 1.25 down to 1.22. We couldn't say we want to do it only for 1.25, or only for 1.26 once it comes out. It's already out now, but that was a few months ago. We had to do it for all releases, and there were users running containerd 1.5 with all those Kubernetes versions, and this cri-tools upgrade basically broke them. They were not able to downgrade, because we were enforcing 1.26, and they basically ended up in a very bad situation. We reverted that change eventually, but we want to be able to structure our packages in a better way.

We also want to create packages for pre-releases, for alpha, beta, and RC releases. We think this is super important for the Kubernetes project and for the release team, because if more users can test those pre-releases, we can be told about issues and things we need to improve. So that would be a great change for sure. And eventually, if time allows, we want to let other projects easily create and publish these kinds of packages as well, like, for example, Minikube. This is something that we are trying to solve as part of KEP-1731. Definitely keep an eye on it if you're interested in this. We will also keep everyone up to date as we progress.

To better understand what we are going to change, let's look at the current situation. Initially, the Kubernetes packages were built, published, and hosted by Google. The situation today is not much different. The only change is that we recently started building the packages in our own pipeline, but the packages are still published and hosted by Google. The way it works is that we, as release managers, SIG Release, or anyone else in the Kubernetes community, don't have access to the Google infra for packages. We currently have two folks from Google assigned who are publishing the packages for us. And even those two folks have very, very limited access to that package infra, because it is the same infra that is used for gcloud and for some other important packages that are published by Google. So it is handled very strictly, and access is not given out easily.

The other problem is in the workflow, in what release managers are doing. For example, the way we cut releases is that the release manager runs stage. Stage creates all the archives, binaries, and images, and puts them in staging buckets and registries. Then comes image promotion, which takes the images we staged and promotes them to registry.k8s.io. And after that, we can't just proceed with the release; we have to go to Slack, ping the Google build admins, and ask them to get ready. After they receive the message from us, they have to internally get the permissions and the access to the infra, which is temporary, I think maybe 24 hours or something like that, maybe even less, and then let us know that they are ready. The problem is that all those people helping us are in the Pacific time zone, and we have many release managers in Asia and in Europe, so we usually have to do some handovers, which means we need multiple release managers, and it is complicated on our side as well. And only after we get a green light from them can we go to the release step and make the release public.
And after that, we have to ping them again, and they need to start the publishing process. We have to wait like 30 minutes per release, and only after we get an okay can we announce the release.

The way we want to solve this is, first of all, that we are being sponsored by openSUSE to use their Open Build Service, or OBS for short, as the platform for building, publishing, and serving packages. I would like to take a quick opportunity to say thanks to openSUSE, and especially to the folks who supported us in creating a proof of concept so that we could see that the Open Build Service can work for the Kubernetes project. And this time we will have access to the platform, so that we can fully manage and publish the packages on our own. So we don't depend, in this case, on the openSUSE folks; it is us who have full access, who can create the packages, who can manage the structure of those packages and how we publish them, basically everything.

There are two more important parts. The first one is that the critical stuff, like GPG keys, and the infra and other things that are hard to manage, are handled by openSUSE. That would otherwise be a burden on us, because we don't really have operational people in the project; no one wants to be on a pager or something like that, answering calls because some infra is down. So yeah, that part is solved by them, and thanks to them for that. And also, it is completely open source, so if we eventually want to do that hosting ourselves, we can. But we need to change our release process significantly, because right now it doesn't know much about packages, and now it needs to integrate with the Open Build Service, and this requires some significant changes.

The way it is going to work now is that the first change is in stage. Besides building binaries and container images, we are also going to create the spec files for the packages, plus one archive that contains all the binaries for each package. Then we have image promotion as usual, and we go to the release. Now we don't have that ping part; as soon as image promotion is done, the release manager starts the release, and the release becomes public. The release step is going to reach out to OBS via its API and push those spec files and the archives I just mentioned. That automatically triggers the build on the OBS side, which again takes 20 to 30 minutes, but now this is much more automated. Then OBS is going to automatically publish those packages, and eventually, this is a little bit simplified, it will let the release step know that the packages are ready, and that gets propagated to the release manager, who can then announce the release.

Where we are right now is that we changed the Deb and RPM spec files so that they better support OBS and so that we have a better structure, and we even changed the tooling so that, based on those templates we created, we can generate the spec files and archives for each release. This is like 95% done. There are some minor changes still to be introduced, but we can say that part is done. Then we have to integrate the new tooling for generating the package spec files and archives into krel, so that the stage step we mentioned actually happens. Then we have to implement the release step part, invoking the OBS APIs from krel. And finally, the last two steps, which are also important: we have to implement tests to ensure the stability and reliability of the packages, because we want all folks to be able to easily upgrade from the Google packages to the OBS packages without changing much. The idea is that eventually you just change the GPG key, maybe we even find a solution for that too, and the repo that you are pulling from so it points to OBS, and then you are good to go. And we also have to secure the supply chain, so we have to sort out permissions and access to the Open Build Service platform, and generally that kind of work that happens a little bit in the background, to make sure the packages are secure.
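To make the release-step-to-OBS integration described above a bit more concrete, here is a rough Go sketch of what uploading a generated spec file to an OBS package could look like, assuming OBS's standard source API (PUT /source/&lt;project&gt;/&lt;package&gt;/&lt;filename&gt; with HTTP Basic auth). The project and package names, the file name, and the environment variables are made up for illustration; the real krel integration may batch uploads, use a dedicated client, and look quite different.

```go
// A rough sketch of pushing a spec file to an OBS package so that OBS
// rebuilds and republishes it. Illustrative only; not the real krel code.
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
)

func uploadSource(apiBase, project, pkg, filename string, content []byte) error {
	url := fmt.Sprintf("%s/source/%s/%s/%s", apiBase, project, pkg, filename)
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(content))
	if err != nil {
		return err
	}
	// OBS uses HTTP Basic auth; credentials come from the environment here.
	req.SetBasicAuth(os.Getenv("OBS_USERNAME"), os.Getenv("OBS_PASSWORD"))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("upload failed: %s", resp.Status)
	}
	return nil
}

func main() {
	spec, err := os.ReadFile("kubectl.spec") // generated by the stage step
	if err != nil {
		panic(err)
	}
	// Hypothetical project/package layout, e.g. one project per minor release.
	if err := uploadSource("https://api.opensuse.org", "isv:kubernetes:v1.28", "kubectl", "kubectl.spec", spec); err != nil {
		panic(err)
	}
	fmt.Println("spec uploaded, OBS will rebuild and publish the package")
}
```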
Once that is done, we need to communicate a lot: communicate, communicate, communicate. This part is blocked right now, because we don't exactly know yet how the migration is going to look from the user side, or when. This is going to take time. I will be honest, this was supposed to be done, like some alpha, in 1.27, but it didn't happen, especially because of the registry work. But now that we are mostly done with that, we can spend some time on this and try to get it working for 1.28. We are also going to get some support from SIG K8s Infra, the CNCF, and other contributors, so I think we will be able to do it. Help is always appreciated, so if you want to join, if you have any feedback, if you want to contribute, or if you had some issues with the current packages and would like to see them solved, please reach out to us. For more information about the progress, follow KEP-1731; also take a look at the KEP itself if you want to see details about the implementation, and make sure to subscribe to the SIG Release mailing list and the dev mailing list. We will also keep the mailing lists up to date and provide updates on the blog and so on, so that you are aware of it.

Now we are coming to an end: how to get involved with SIG Release. You can always reach out to us on the #sig-release channel on the Kubernetes Slack. We also have our own mailing list, and we will be very happy to see you all at our weekly meetings. They're on Tuesdays, at alternating times, so we have a meeting for the EU and Asia time zones and one for the US; pick what works for you. The presentation is already on Sched, so all the links we mentioned you can access via Sched and the attached presentation. We would like to take the opportunity to thank you all for coming today, and please leave feedback on our session; here is a QR code for that. So yeah, thank you for coming today. I think we have some time for questions, so, any questions? There's one question.

Thanks for the presentation. Regarding the registry redirect, I was not impacted, but do you have stories from customers that maybe did not have firewall rules to go to AWS, or that were not expecting the traffic to go not to Google but to another cloud provider?

So, when this change was happening, the folks responsible for it tried to reproduce as many cases as possible, so we had documentation in place covering what type of error you can expect, how you can solve it, and how you can detect whether you are affected or not. So we didn't hear much from users.
The GCP users were handled by GCP internally, so we, as SIG K8s Infra and SIG Release, didn't hear much, so we think that it went relatively well. But yeah, there are probably affected users, and I hope that the documentation we worked on helped them mitigate the issues they encountered. Any other questions? No? Okay, thank you. Thank you.