Today we'll be talking about the Cluster API release team: the history of why we started it, how we assembled it, and some of the learnings that we had from running a release team. I am Yuvaraj. I work at VMware, mainly on Cluster API, and I served as the release lead for the 1.4 release cycle.

Yep, and I'm Joe Kratzat. I'm one of the release leads for 1.5, I'm currently one of the maintainers of the Cluster API provider for OCI, and I work at Oracle.

So today we'll roughly go through the problems that we had, the goals that we set for ourselves to address these problems, the release team, and some of the learnings that we had from running a release team.

So last year at KubeCon EU we announced Cluster API v1, and that was a huge milestone for the project because it signified a maturity level for the project. That means there will be more and more other projects in the community that now start depending on Cluster API. Examples are the Cluster API provider for OCI, the Cluster API providers for AWS and Azure, and so on. This means that Cluster API had to get onto a predictable release cycle, so that all of the other projects and software depending on it can base their own cycles and release planning on the release schedule that was announced.

The other problem that we were trying to address was that, until a few months ago, Cluster API was basically doing releases ad hoc, as soon as we thought that a release was ready.
We just used to cut it, with no dates announced ahead of time in most cases. Some of the other problems were that the releases were mostly done by just a couple of folks within the community, generally the maintainers, and this was generally chore work. The knowledge about how to cut releases for the project was in their minds but not really documented, so they were the only ones who were able to do it. They had the know-how of how to cut these things and how to properly get releases out of the project.

So we wanted to address these problems, and we set some goals for ourselves to address these things. The first one, as I mentioned, is to move Cluster API to a predictable and deterministic release cycle and to communicate this release cycle ahead of time to the broader community, so that they can plan their own work and their own cycles based on the timeline that we announced. The next is to spread the knowledge of how these things are done to more members of the community and not rely on just one or two people, because that would not be great, both for the health of the community and for the sustainability of the community. This meant improving our tooling, improving our documentation, and taking advantage of automation to do some of our releases.

To start all of this, we started looking at the Kubernetes SIG Release team, because they were built for solving similar problems. We were looking at them to identify best practices and some of the learnings that they had, and we tried to come up with what we could do along the same lines.

Yeah, so like Yuvaraj said, we looked at the Kubernetes SIG Release cycle, and we kind of took the best practices and mimicked what they did there. So we went with a 17-week release cycle, which is about four months, which is what Kubernetes does as well. So in this time we're going to cut patch release
cycles, and these are going to be at predictable times. So you'll have a patch for one minor, and we'll do four of those within the release cycle, and then the patches for the other minor will come within the four months as well, so it'll be eight patches total. In this example, we would have the 1.3.x release and the 1.4.x release multiple times throughout this release cycle. At the same time, towards the end, we're going to cut two beta releases as well as two RC releases for the community to test and work through, and at the very end we'll have the minor release.

And so what that kind of looks like is we're going to have this offset from the Kubernetes release cycle, as Kubernetes is a dependency, right? And there are other dependencies that we have; the example here would be controller-runtime. So we want to make sure we're offsetting, so that we're pulling in that next version of controller-runtime that supports the new version of Kubernetes that we're trying to support as well.
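The cycle described above (17 weeks, two betas and two RCs towards the end, then the minor release) can be sketched as a small date calculation. This is only an illustration: the talk does not say in which weeks the betas and RCs are cut, so the offsets in `PRE_RELEASE_OFFSETS`, the function name `release_schedule`, and the start date in the usage example are all assumptions made up for this sketch, not the project's actual schedule.

```python
from datetime import date, timedelta

CYCLE_WEEKS = 17  # cycle length stated in the talk

# Hypothetical offsets, in weeks before the minor release. The talk only says
# two betas and two RCs are cut "towards the end" of the cycle.
PRE_RELEASE_OFFSETS = {
    "beta.0": 5,
    "beta.1": 4,
    "rc.0": 3,
    "rc.1": 2,
}


def release_schedule(cycle_start: date, minor: str) -> list[tuple[date, str]]:
    """Return (date, tag) milestones for one release cycle, sorted by date."""
    minor_release = cycle_start + timedelta(weeks=CYCLE_WEEKS)
    milestones = [
        (minor_release - timedelta(weeks=weeks_before), f"v{minor}.0-{name}")
        for name, weeks_before in PRE_RELEASE_OFFSETS.items()
    ]
    milestones.append((minor_release, f"v{minor}.0"))
    return sorted(milestones)


if __name__ == "__main__":
    # The start date is an arbitrary placeholder, not a real cycle start.
    for day, tag in release_schedule(date(2030, 1, 7), "1.5"):
        print(day.isoformat(), tag)
```

Keeping the offsets in one table means a published timeline can be regenerated for the next cycle by changing only the start date and the minor version.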
And so you can see that in the 1.4 release that just came out, we're going to pick up the new Kubernetes 1.26 as part of the original release, and then over the life cycle, through patches, we'll pick up new releases as well. So this is a rolling window: the 1.4 release will have a rolling window, and then 1.5 will have its rolling window of releases that we'll support as well.

The other thing that we're trying to do, like we said, is make sure all of this is communicated clearly. So in our documentation and in our GitHub repository is the documented release cycle for the new releases. This is the 1.4 release cycle, so you can see the times of when the next patch is going to come out. This will let our providers know when the next patches come out to test, and then we can also inform them of when the betas come out for testing.

Now that we had the goals that we wanted to achieve and the cycle that we wanted to set ourselves to, the next thing was to form the release team that takes these goals and sets them into action. As I previously mentioned, we were looking at the Kubernetes SIG Release team to get inspiration for how to form our own release team. Looking at that particular SIG Release team, we realized that they had a lot more roles to fill than what the Cluster API project needed, so we had to shrink the size of that release team to something more suitable to what the Cluster API project needed. We ended up with three broad categories that are part of the release team. One is the release leads team. The next is communications: the docs and the release notes roles were all combined into just one role called the communications manager. And the last one was a team dedicated to CI, bug triage, and automation. Their role is defined as improving the automation that we have, maintaining CI health, and bringing up any other issues. Joe will go
over it, so we'll talk more about it in a bit. And the last part is shadows. They are a really important part of the release team; we'll get to them in a bit.

So, starting with the release lead. The release lead is overall responsible for making sure that the release is on track, and the primary purpose of the release lead is to make sure that all the work across different teams and the community is coordinated. The release lead is responsible for making sure that if there are any release-blocking issues, they are brought up in the community, or if there are any critical patches that need to go into the project, that they go into some patch release as dictated by the backport policy, and so on. Basically, they take ultimate accountability for the release and for making sure the release sticks to the release cycle. Besides these roles, the release lead was also responsible for assembling the current release team, helping with assembling the next release team, and grooming candidates so that they can become the next release leads and release team candidates.

Yeah, so the next team is the comms team, the communications team. I was a part of this on the 1.4 release. Some of the responsibilities are communicating the important dates to the community, so when the next patch release comes out, we try to get that in front of people ahead of time so it's not just jumping up in front of them. We try to communicate this via Slack and office hours. The other thing that the comms team is responsible for is generating the release notes and making them prettier, more readable, and more communicative, as well as getting relevant info out to the community. One of the things that we spent a little bit of time on was updating notes specifically for our end users as well as the providers, right? Those two different classes of users are
going to want and care about different things, so we try to make sure that we communicate that in a clear fashion. And then the other thing is improving documentation and automating some of that documentation, trying to make the job of the release team easier. So they're in charge of all of that.

Some examples of what we try to do, at least in the Slack channel, is informing the users: hey, these are the releases, so in this case the 1.4.1 release, what's changed, and the important relevant information, with an obvious link back to the actual release itself. And the same with the RC release, so that people in the community can see: hey, this is what's happening. You know, a lot of people are probably on GitHub notifications. Cool, that's great, but we want to make sure it's widely disseminated and shared amongst the group.

The other thing that we tried to do, and I want to say we borrowed this from the Falco project, probably, was a weekly update of the changes. We would try to say: hey, this is how many PRs landed, here's how many bugs were fixed, and maybe there are some updated features that you might care about, so you can see those landing on main and getting ready to get into this release. That way the community knows what's going on. So, just trying to communicate as much as possible.

The other big team is the CI team, and this is kind of a CI and bug triage team, right? They're responsible for gating the release, so if a test is failing or something's going wrong, they'll let the release leads know: hey, something's breaking.
Let's stop. Right, the typical things you would expect. But then they're also going to communicate to the wider group: CI is unhealthy, CI is degraded, we need to look at this. And they work with the community, so they may not actually fix the issue, right, but they're responsible for driving the change. They might work with the maintainers to say: this is flaky, this is failing, let's get a fix in. They are typically the ones the release lead will ask, are we ready to go? And they're kind of the blocker if things are breaking. And then, just like the comms team, they're responsible for automating and getting that stuff cleaned up and ready to go.

So, just a real quick example of TestGrid. They'll scan this and look to see if things are failing. They are also aware of whether a test is flaky, to decide whether or not to pause the release. And they may set up automation to say: hey, the test has failed five times in a row, let me know, instead of having to scan through the test UI. In this case we're looking very green, everything's good, so the CI team would say let's go ahead and cut the release.

The other thing, like I mentioned, was the automation. In this example, they put together a really nice template, super simple, to be able to say there's a flaky test or a failing test, and they would be responsible for cutting the issue in GitHub to let the maintainers know. This is a really simple, easy change to make the process a little bit better for the other teams as they come along for the release. It's just a really simple UI to drop in changes or things that are happening, to better describe why it's failing, and to work with the maintainers to get the failing test fixed.

Thank you. One of the things that we wanted to do with the release team was basically to grow the knowledge within more parts of the community and
not just limit it to a few members within the maintainers group or the people who generally work on the project, and shadows are a great way to do it. We had shadows in every part of the release team. There were three shadows per section in the last release team: three shadows following the release lead, three shadows following the comms manager, and three shadows following the CI manager. They become ideal candidates for being part of the next release team leads, for growing in the community, and for becoming reviewers. Because when they are part of the release team, let's say they are part of the CI team, right, they generally get a sense of what PRs are going in, what bug fixes are going in, what broke tests or what didn't, and they generally get a feel for how the project is laid out, and so on. So it's a great excuse for people to go and explore more parts of the project once they become a shadow. And this is a great way for the community to grow, because then you have a way for people to get into the community and then share and grow knowledge.

This segues nicely into the secondary goals that we had: grow the community and make it more sustainable. Don't just rely on a few members of the maintainers group or a few members of the community who regularly show up, but find an organic way to grow the community and get more people involved. Being part of the shadows team allowed them to do that, because it was a low barrier to entry. For the shadows, it was a great chance to just observe what's happening and then jump in and get their hands dirty as much as they're willing to. Right, so we were able to grow the community a little bit.
So the knowledge about how our release was cut was shared across three times as many people as before, and that was a huge improvement that we had by running this release team. The release team also helped us grow the team to become more highly available: we had more members across time zones working in the community, and more of them were available throughout the day to watch or maintain the CI's health and see if there were any release-blocking items coming up, and so on.

Yeah, and so one thing on the shadows as well. One of the things we're actually trying to do, in somewhat of an iterative approach, on the release leads team is hand off actual work to the shadows to drive those changes, right? Cutting a release shouldn't just be the release lead's job, so some of the shadows are going to help do that as well. And I think, to Yuvaraj's point, that's a really great way to grow the community, hand off some of that work, and give people the empowerment to work through that stuff.

Some of the stuff that we learned along the way over this time is that we've taken the iterative approach: we're going to make small changes, make them fast, and move forward. We're also going to make sure we have clearly set goals. Part of the release lead's tasks is to set the timeline for the release, so that we can communicate it to the wider audience, and that's where the timelines come in as well. The other thing that I think we found was just simply asking for help. People want to help; people are genuinely good and want to help and grow the community as well, right? I think for the 1.5 release we actually have 13 people helping to get the release out, so that's really awesome.

From the 1.4 release, we've learned some of our improvements. We did the retrospective.
Typically, at the end of some sort of sprint, you have the retrospective. So we've got the retrospective, and we have clear goals and clear actions to hand off to the next release lead. We also wanted to do a warm handover between the different leads. In this example, Yuvaraj and I got together and did a warm handoff to say: okay, here are some action items, now you can take those and move them forward. So the warm handoff helps knowledge transfer as well.

And then lastly, intra-team communication: trying to have the comms team or the release leads team communicate internally with each other before going to the wider audience of the community, to work through some issues and round off the corners of some rough work. For example, the weekly comms update: we talked about that internally in the comms team before we went to the wider audience to say, hey, we're going to do this. That way we could work with a little bit of privacy and work through that before we went to the wider community.

So, moving through the releases, right? We want to streamline this process. The 1.3 release was an ad hoc release cycle. The 1.4 release was the actual first release-team release, with some documentation updates and automation. The 1.5 release hopes to improve upon 1.4 to really streamline the process, get automation in place, get documentation updated, and then hand that over to 1.6 and really keep moving this into a more streamlined fashion, right?

The fun bit about what we do is automating it so we can actually go do the fun bits, right?
Like, I don't want to sit there and click a button that says, hey, we have to release this, then update the notes, then do this. The idea would be one button click and then we get the release out, so we can go work on the fun bits of what we enjoy.

Since the 1.3 release, the comms team has set up a bunch of templates to automate some more of the release notes processes, as well as automation tools. The CI team has done quite a bit around automating tools as well. But the big thing, in my opinion, is that we have some really good documentation on how to actually cut the release and how to do the tasks, and that's all out there in our repository. This is huge because it formalized the release work so it can be handed over to just about anybody, right? The term runbook was something new to me when I came to Oracle, but basically it's just a way to document the process. I assume we all do that; I just had not heard the term runbook before. I think the 1.4 release had a really well-documented "here's how to cut the release", and then when 1.5 came along, we had to quickly cut a 1.4 patch release, and that was a very quick process thanks to the runbook and the documentation. So this was a huge improvement: the maintainers getting the process out of their heads and onto paper so that the rest of the community can run with it.

So the big improvement that has been made with the release team is the predictable releases, right? We can communicate to our providers: in four months you're going to have a new release, and in the next four months we're going to have a new minor release as well.
That's critical to the providers, so that they can prepare and understand what's coming. The knowledge transfer is huge as well, to be able to get the community to grow. It's really important so that it's not just two or three people doing all the work, and also so it's not in their heads alone, right? Anytime something's in my head, I try to get it documented as quickly as possible so that I can share it, because it just makes life a whole lot easier when you want people to work with your team. And, I forget who said this, but it's a stabilizing function for the ecosystem, right? That way you can predictably know when the next patch and the next update is going to happen.

So, just real quick, here are the release timelines in months, and you can see it was all over the place before 1.3, and then after 1.3 it's been a solid four-month cadence, right? With typical growth graphs you want to see up and to the right, but in this case you want to see down and flat, so that we're continuously doing the same four months.

So basically, to recap: the problem was unpredictable releases, and knowledge transfer was not happening; it was in people's heads. The maintainers and the community came up with the release cycle to parallel Kubernetes. They developed and documented the release team and the release team tasks, and over time the release team is going to drive and improve the change process to get the releases out in a timely manner.

All that to say, the 1.6 release is coming up in August, and we would really love to continue to grow the community.
So we're looking for people who can write code, document, or run projects; any kind of skill set is what we're looking for. If you are interested, we have the Slack, we have weekly office hours, or you can come speak to Yuvaraj or myself about joining the team and helping us grow.

Are there any questions? Thank you.