Okay, welcome, we're going to get started a couple minutes late. This is the OpenStack stable talk: what it actually means to maintain stable branches in the community. We're going to go quickly over introductions, talk about who cares about stable branch maintenance, what it is, how you do it, and then open it up for questions. If you have questions, please use the mics, since this is being recorded.

So, introductions. I'm Matt Treinish, core reviewer on the QA projects and on stable-maint-core. I was the QA PTL for four cycles, from Juno through Mitaka, and I was just elected a TC member starting this cycle. My name is Matt Riedemann, I'm a core reviewer on Nova and on stable-maint-core, and I'm the Nova PTL for Newton.

So first of all, who cares about stable? Production clouds. This is from the latest user survey in April, if anybody has seen it yet. The dark blue bars are from the October survey, the lighter blue bars are from April, and these are the actual releases that production clouds are running on. The thing to take away is that when this was put out, I think Mitaka was about two months out from release. So Kilo, which is the oldest supported stable branch, is over 50% of production deployments. Last fall Juno went end of life while still being used for production deployments, and we should note that a quarter of production deployments are still running on Juno, which has been end of life for about six months.

Distributions obviously care about stable branches because they ship them and they have long support contracts on them. A lot of them release their products well after we've actually cut the latest release in the community, so it could be anywhere between two and six months before the latest release actually appears in a distribution product, at which point, for upstream development, it's in stable mode. As I said, they support far longer than upstream does, which right now is about 18 months for the oldest branch, and distributions get their security fixes from upstream. And then there are production clouds that don't use distributions; they roll their own packages and patches, and they get fixes from upstream too.

So what is stable branch maintenance? It's a few things; it's hard to pinpoint exactly. It's about enforcing a policy, backporting fixes, obviously, because that's why the branches exist, and keeping the CI system running so we can actually land fixes on the stable branches.

At a high level, the stable branch policy is four things, I guess, for this talk. The first and most obvious is the support phases. Phase one, which right now would be stable/mitaka, is the first six months: any appropriate bug fix. After that, phase two, six to twelve months, is really supposed to be limited to critical and security fixes. And then phase three, which right now is stable/kilo, is security-fix-only mode.

Then there's which fixes are appropriate. That depends on the phase, but in general we don't backport features. Since a lot of production clouds are running on older branches, they might like us to backport features, but it's generally not in their best interest, because features are buggy and we can regress things. The fix should have a clear user impact or benefit, which means don't backport things like spelling typo fixes.
And then self-contained, to me, means no large refactoring patches, because refactoring generally introduces regressions, and no new dependencies, because if you've already done a bunch of packaging for your production cloud and done some releases, we shouldn't be imposing new dependencies on you, or newer minimum versions of dependencies.

Stable is also about the team structure. Each project has a project-specific stable core team, which is generally a subset of the overall core team, but not necessarily, because the rules for stable and the rules for trunk, for master development, are a little bit different. The project-specific teams are really tasked with identifying backports and reviewing them, because they're the subject matter experts on their project, requesting releases, and monitoring the CI status so patches can actually land on the stable branches. The stable-maint-core team is sort of a superset of that. It approves core membership in the project-specific teams, so the project-specific teams can't actually add new members to their own team in Gerrit; that goes through the stable-maint-core team, but it's really a nomination process: the project nominates the person, stable-maint-core does a sanity check, and most of the time they get added. The stable-maint-core team also guides the project-specific teams on policy, and when the gate is completely broken it generally knows how to fix it and gets involved.

The final thing is the review guidelines. This is what I meant about the core team on master being different from the stable branch core team: the review guidelines are different. It's really about making sure that the fixes are appropriate, that the right process is used, which is cherry-picking with the same Change-Id, because that's how we track things, and that the fix falls within the support phase.
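A minimal sketch of that cherry-pick workflow, assuming a hypothetical bug branch and commit; the exact commands vary a bit by project, but the key point is that cherry-pick keeps the original commit message and therefore the original Change-Id:

```shell
# Minimal sketch of proposing a backport (branch name, bug number, and hash are hypothetical).
git fetch origin
git checkout -b bug/1234567-mitaka origin/stable/mitaka

# -x records "(cherry picked from commit ...)"; the original commit message,
# including its Change-Id line, is carried over, which is how the backport is
# tracked against the master change in Gerrit.
git cherry-pick -x <master-commit-sha>

# Push the change for review on the stable branch.
git review stable/mitaka
```

Searching Gerrit for that Change-Id then shows the master fix and all of its backports together, which is what makes the same-Change-Id rule useful for tracking.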
So we want to talk about how these stable branches are maintained. There's the obvious backporting and reviewing of backports, making sure, like Matt was talking about, that a change conforms to the policy, and weighing the chance of regression against the bug fix and the amount of code being changed. But there's also a part that is often overlooked, which is how we keep the CI system running and the impact of doing that on stable branch maintenance, and that's what I'm going to be talking about.

We run two sets of jobs for stable branches. Most of the testing is done with periodic stable jobs, which are tests that run every night, I think at about 2 a.m. Eastern time in the U.S., and they run against each supported branch. They run the unit tests for the project and then one set of Tempest tests; I think there are five or six configurations... no, it's two or three configurations for each supported stable branch. The failures for those are reported to a mailing list, and the results are also viewable on the OpenStack Health dashboard. It's pretty straightforward: nightly, basically canary jobs. This is just an example of some of the jobs we're running. The text really isn't too visible because of the resolution on that little laptop, but you can see we're running a bunch of Tempest jobs there, and you can see failure rates and things like that. This is the OpenStack Health dashboard.

The other way we test things is with branchless Tempest. The Tempest project is the integrated test suite for OpenStack; it does API-driven black box testing. So you have a deployed cloud and you hit it with API requests, check the responses, and make sure the servers you're spinning up, or whatever operations you're doing, work correctly. Since the Icehouse release, Tempest has been branchless. It doesn't branch for stable like other projects, the traditional services, do, and we run the same tests against master and all the supported stable branches. On Tempest, we gate Tempest changes on all of the stable branches as well as master. So when you push a Tempest change to master, we make sure it runs on stable/mitaka, stable/liberty, and stable/kilo; I suck at the alphabet. We do this to ensure API compatibility between releases, because that's a big user experience and interoperability thing: the API always behaves the same.

This is great for OpenStack, but the problem is Tempest's throughput: we have hundreds of patches a week, which is a lot more than two or three runs every night. So we see a lot more of the non-deterministic failures that plague the gate, the things a lot of developers complain about being flaky: bugs in OpenStack, bugs in tests, bugs in infrastructure. We see all of those in branchless Tempest because we're running tests all of the time. This is where a lot of these stable issues come up first, as opposed to in the stable periodic jobs or even on backports, which also run the tests as part of the normal CI workflow.

These categories apply to everything in the gate, but specifically we have bugs in dependencies. We always hit those. OpenStack is a gigantic project and we pull in hundreds of dependencies, and we hit bugs in the upstream dependencies all the time, whether it's incompatible requirements from a new release or just an inherent bug; for example, we've hit bugs in the kernel with two different features we're using, where a race condition in the kernel would crash the VMs we're testing in. What we normally do there on stable is report the bug upstream, and pin the version in requirements if that applies. If it's a requirements mismatch, we pin it in upper-constraints in the requirements repo to make sure we don't use the buggy release, and try to move forward.
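As a rough illustration of what that kind of pin looks like, here is a hypothetical requirements change; the package name and version numbers are made up, and which files are involved depends on the project and branch:

```
# upper-constraints.txt: pin back to the last known-good release (hypothetical example)
oslo.messaging===5.10.0

# requirements.txt: exclude the broken release without raising the minimum version
oslo.messaging!=5.10.1,>=5.2.0
```

Excluding or capping a broken release like this is different from raising the minimum version of a dependency, which, as comes up in the Q&A later, is generally not allowed on stable branches.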
Another category is infrastructure breaks. The machinery that runs the gate is a gigantic set of awesome projects that the OpenStack infrastructure team runs, but it's complicated machinery, and sometimes things don't go as we expect; things happen, like in any production system. When that happens, there's not much we can do. We just have to wait for infra to fix it and work with them to make these systems more resilient in the future, because it's all a community effort and we have to work together on this. On that same note, all of this testing relies on upstream service providers. All of our tests run in public clouds running OpenStack; I think there are five at this point, maybe six, there's a bunch of them. Sometimes those cloud providers have outages, and when that happens there's not much we can do. Infra will turn that provider off, we'll have some delays, but we just have to wait it out, because we're completely dependent on that. And that has been happening more often; in the past month we've had a lot of outages, but that's unrelated.

Then there are also bugs in OpenStack, and this is the tricky one to fix, because there are bugs and race conditions that were inherent in OpenStack when we released it, underlying bugs. Sometimes we can backport a fix and try to make it more stable for running the tests; sometimes that's not possible. Remember the support phases we were looking at: if we're in security-backport-only mode and it's not a security bug, just an underlying bug, it might not be appropriate to backport that fix. So you get kind of stuck, as opposed to on master, where if you find a bug you can just fix it. Then there are also bugs in tests, in Tempest or in unit tests: race conditions, global state corruption, hard-coded URLs, things like that. We can fix those in Tempest, because it's branchless; we just fix it on master. For unit tests, if it's applicable to backport, we backport it; otherwise we just have to suffer.

And then it gets really interesting when you start looking at things over time. The graph on the left is the number of tests we run across all jobs in the gate and periodic queues against stable branches. You can see that stable/kilo is the blue on the bottom there: we run a lot fewer tests on Kilo than we do on Liberty, and Mitaka has even more. That has to do with a combination of the frequency of backports, because Mitaka is obviously newer, more backports are applicable, and more people are looking at it, so we're going to be running more tests on it. As things age, we run fewer tests. That doesn't mean everything else in the ecosystem is static. If you look at the graph on the right, that's a snapshot I took from OpenStack Health back in November or October, a month or so before the Juno end of life, showing the failure percentages for each of the branches. And you can see, if you can read it, the font is kind of pixelated, that stable/juno has a 13% failure rate, the other branches are at 2%, and master is at 1%. There are a lot of factors involved in that, but it's mostly that the tests we're running in Tempest grow. The test suite grows and changes, it covers features we weren't testing before, and we're exposing more race conditions in the older releases, as well as doing less testing on them, so failures are more prevalent when you look at it as a raw percentage. What this means for backporting things, especially around more critical periods in the cycle, like when there's going to be a release for a stable project, is that it makes it much harder to get things backported. And no one really thinks about this, because stable is kind of off in a corner when people are backporting, and it doesn't get the same attention as master, where the day-to-day development is. That's part of stable branch maintenance: there's a lot of time spent fixing these issues and working through them.

OK. Before I talk about what's new in how the Neutron project handles stable backports, I'll note that my colleagues already covered how we maintain the stable branches and how we maintain CI for those branches, so that you are able to merge new patches into them. But the main goal of those branches is to actually deliver bug fixes to users. Stable branches are supposed to be a safe source of bug fixes. Sadly, it often happens that we have identified a bug and we have a fix for it in master, but it never gets to the stable branch.
So those who rely on those branches to run their clouds are actually exposed to some bugs, and the software they run may even be less stable than master in some ways. So in Neutron, we were looking at how we deliver bug fixes to our users. Ideally, your stable branches would be one of the points on the whole chain of delivering bug fixes to your cloud or your distribution. In an ideal situation, you would obviously have all bugs fixed in master first, then you backport everything relevant to the latest stable branch, which is Mitaka right now, and then in some cases, if policy allows it and it's technically achievable, you also backport to the older stable branches. And then through this pipeline, your distributions, or your cloud if you don't use packages but consume directly from those stable branches, get those fixes.

So ideally, that would be the case. Sadly, the reality looks more like this: you have a lot fewer bug fixes in stable branches than you could, in theory, expect. And in most cases you get backports for those bugs that already hit someone in some real production installation, so that they cared enough to come back upstream, maybe report a bug, or maybe handle the backport themselves. And it's not ideal in that, first, you don't get a lot of bugs fixed in stable branches, but also you expect someone to be hit before the issue is fixed for them. So you are not actually proactive. So the problems with the process as we handled it up to Liberty are that we are not proactive, and we also don't have a proper way, no process set up, to identify bug fixes that we can consider for backports.

So what did we change in Liberty? Once Liberty was released, we decided that from then on, all the fixes that get into master, which was Mitaka at that time, would be tracked as backport candidates, so that we don't leave any patch behind. This requires quite a lot of work, so we set up a process where we do this consideration weekly. We remember the last patch we checked the previous week, and we get the list of all the bug fixes merged in master since then. Then we do some filtering, sometimes automated, sometimes not really, to get to the list of patches that we want backported, and we feed this list of candidates to developers so they can handle the backports.

Since the process is very labor-intensive, we tried to automate as much of the tracking as possible, so we implemented some tools around it. They collect information about patches and bugs from Launchpad and from Git, and we obviously contributed those tools upstream; there is the release-tools repo, and we are trying to put them there. So I will show you basically what you would do on a weekly basis if you wanted to apply this kind of process. All of these tools work like traditional UNIX tools, in that they read a list on standard input and produce a list on standard output, so they behave like UNIX filters and you can combine them as you wish. In this example you see a standard pipeline. The first part gets the bugs that were mentioned in all patches since the last time we checked. Then this list of bugs is passed to the next filter, which filters out bugs that have the Wishlist priority in Launchpad, because usually those are blueprints or RFEs; you don't want to backport them.
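A sketch of what one of those weekly runs might look like; the script names here are made-up stand-ins for the tools in the release-tools repo, so the real names and options will differ:

```shell
# Hypothetical weekly candidate collection (script names are illustrative only).
# 1. list the bugs referenced by commits merged to master since the last checked commit
# 2. drop bugs whose Launchpad priority is Wishlist (blueprints/RFEs, not backport material)
list-bugs-since --repo openstack/neutron --since "$LAST_CHECKED_SHA" \
    | filter-out-wishlist \
    > backport-candidates.txt
```

Each stage reads a list of bugs on standard input and writes one on standard output, so the annotation and grep filtering described next simply chain onto the end of the pipe.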
Then you pass the result to an annotation tool, which pulls quite a lot of information from Launchpad and produces richer output: you have a link, a title, some tags pulled from Launchpad, the author, all kinds of things. You can then operate on that output with standard tools, like just applying grep. For example, in the first case we filter out all the bugs that were already marked as fixed by the infra bot in Launchpad, so we get only the candidates for Mitaka. Or maybe we want just the list of bug fixes related to the Linux Bridge backend. Or, the third one, something that is not yet fixed in Liberty but was marked by someone as a proper candidate. There is also a tool to remove the backport-potential tags that are used by people who are interested in backports: it checks whether the fix is already merged in the branch, and if so, the tag is removed, so you can use it to maintain Launchpad.

So the process gets us to the point where we have lots of bug fixes in stable branches. Obviously, the more patches and the more changes, the higher the chance of regression, so there are some concerns around that, but it's still worth it. We did an internal estimate of how many real customer escalations would be avoided if we were proactive about backporting, and we got something like 50-plus percent of escalations. It means that in 50% of cases you could avoid your customers being hit by a bug, which is a lot. Also, when you backport something in Gerrit, the original author of the patch is automatically added to the reviewers list, and if there was a regression in that bug fix, they're probably the one who is aware of it, so they will usually tell you that it's something you may want to skip, or that there is a follow-up. And obviously, the more patches you release at once, the less granular your updates are, so you want to consider releasing more frequently; there are tools for that.

We were running this for Neutron for the whole cycle. It had some success, meaning we delivered a lot of bug fixes to our users, but we were thinking that maybe this process should be adopted by other projects, so if there is someone from another project who may want to adopt this approach, please talk to me. There are still things to do here. The whole process is not really documented; I should probably do that in some project guide for everyone's reference. There are some pieces of automation that are not there yet; specifically, I was thinking about another script that would automatically propose the backports, trying to cherry-pick them and uploading them to Gerrit, so that you could completely automate the whole cycle. And we feel some lack of contribution from the community, but we are trying to pull more people into it. The main point is that you should identify people in the community who are really interested in consuming from those stable branches, and then either reach out to their developers, or maybe try to get to their management line. And that's, I guess, it.
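To make that grep-based filtering concrete, here is a hypothetical continuation of the pipeline sketch above; the annotation script, output format, and Launchpad tag names are all invented for illustration:

```shell
# Hypothetical follow-on filtering; assumes annotate-bugs prints one line per bug
# with its Launchpad status, tags, title, and author.
annotate-bugs < backport-candidates.txt > annotated.txt

grep -v 'Fix Released'              annotated.txt   # drop fixes already released; keep Mitaka candidates
grep 'linuxbridge'                  annotated.txt   # only fixes touching the Linux Bridge backend
grep 'liberty-backport-potential'   annotated.txt   # not yet fixed in Liberty, tagged as a candidate
```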
This is just some places you can get more information about interacting with stable branches in the community. There's obviously an IRC channel, just like every project in OpenStack has to have its own IRC channel. We've got the openstack-dev mailing list. I did mention the stable mailing list before; that is just for periodic test results, so please don't send general stable questions there, or try not to. And these slides will be posted, for all the people who are scrambling to take pictures. There are docs available for stable branches and the whole policy around this, and those are on the OpenStack docs website. We have a weekly IRC meeting, which I personally always forget about, but that's just me; there's a wiki page with all the information and the time for that. And there's also the OpenStack Health dashboard that I mentioned a couple of times, where you can track those test results on stable branches; that's there as well.

With that, are there any questions? Remember there are mics in the center if you have something; just get up and walk to a mic and ask away. Thank you. Oh, there's no mic in that one, but there's a mic in there. I can repeat it if you want to ask. It'll work. I can say it won't work any worse than master. So the question was: OpenStack Ansible is deploying from master? It was deploying from stable. Right, right, right, it's deploying from stable. I think generally the deployment projects, Chef and the several others, are behind, like the distros, because they have to wait for trunk to release and put out the release notes, and then the deployment projects have to catch up. As Matt said, it shouldn't be any worse than trunk, because it's stable and because we shouldn't be landing features that could regress anything, or partially implemented features, so it should be stable. And anything we land has to pass the gate Tempest tests anyway.

Which isn't exactly the same question: would you be confident deploying a production cloud from stable? I have. Right, in the past I have, and that was when things were much worse, back in Icehouse. So just to clarify your question, you were asking about OpenStack Ansible, whether you'd be comfortable deploying from the stable branches that it tags based off of. I'm sorry, I work on OSA, OpenStack Ansible. No, no, no, I just wanted to help you out if I could. Whether you're comfortable deploying from trunk: stable was once trunk, so yes, you should be. Rackspace Private Cloud consumes OpenStack Ansible and we deploy based off of the hashes and branches and whatnot, so we do that in production for all our customers. Oh, I'm sorry, we deploy from the stable branches and the hashes; the git hash updates every two weeks for us, and whatever that update is, we run our tests and we're good.

Yes. One thing to note here is that if you deliver directly from the branch without waiting for the next minor release, in theory you can hit a case where, for example, there are two backports proposed: one fixes a bug but also introduces another one, and then there is a follow-up fix for that regression. If you are somehow unlucky enough to catch it in between, then you obviously will have that regression. Usually when release managers do a minor release, they look into the queue to see whether there are any regression fixes there, and they will not release a new tag without them. This is also sort of why the stable core team for a project is generally made up mostly of people from the core team on master. So I'm core on Nova.
I'm not going to review every single thing that ever gets merged into Nova, but there are definitely things that stick in your mind, where we merged something, it regressed something else, and you remember it. So when somebody tries to backport it, you can say, definitely not until you get these two things in at the same time. But otherwise, yeah, people should be able to actually release from any git hash on stable.

Okay, I just want to add: shouldn't the recent changes to the release structure of the stable branches fix that? Because in Tokyo it was discussed that every patch should trigger a minor release. I don't think that happens. You could release that way; there are downstream consumers that rebase daily, pick the changes up, and put them in production. Sorry, I actually had different questions, sorry for that. Oh, sorry. Up until Kilo we were doing coordinated releases for all of the projects, and now they release whenever they need to.

So I wanted to ask about the Neutron process. Do you still have to set the backport-potential tags? How do you do that? You can't automate that, I guess. We actually don't tag. We just make sure that every patch is considered, either by our filters or by people. So as long as the commit message mentions a bug, it will be caught by the filters and then considered. So you just treat everything as a potential backport? Yeah, that's the assumption. The tag is still there in case someone wants to mark some bug specifically for backport, but usually you can assume that a patch will be considered as part of this proactive pipeline anyway.

The second question: I know that in the docs it's forbidden to backport schema changes. Yes, for a very good reason. Yeah, I absolutely understand the reason, and I wanted to ask if you maybe handle that somehow downstream. So if there is a bug that's clearly fixed by, I don't know, simply widening a field in the schema, do you just... I wanted to ask your personal... So the question is whether there is a technical way to backport a change that includes a schema change. Yeah, and if the change is really simple; I know there are no really simple changes to a schema, but if you are confident enough that it won't break anyone's day and would just make life easier and remove one bug. So one of the things almost every project does, if they're using SQLAlchemy Migrate, is pad the migrations: they leave empty migrations at the end of a cycle for fixing potential schema issues like that. I don't know of any case where we've actually backported one; they're there for backporting super critical bugs. I don't know of a case where we've done that, but as a downstream consumer you could theoretically use them to try to address an issue like that. I wouldn't recommend it, because the schema on master could shift and then when you migrate you have to manually... Yeah, that would break your day when you try to upgrade later. Yeah, sure. I mean, in general, the schema migrations should be idempotent: if you're adding an index, you would check to see whether the index is already on the table and, if so, ignore it, otherwise add it. That's Nova; we're crazy, yeah, but... In Neutron we don't leave any empty files. We use Alembic there, and we obviously don't backport anything like that, and there was a project done in Juno where we got to the point where we don't have any conditionals in our migration files.
So it's always consistent. But from a downstream point of view, yes, there is a technical way to do it. We actually did it once in the Red Hat distribution, but it was just a backport; we hadn't actually made a change there. For that you basically need to... since Alembic migrations are a chain of changes, you just rearrange them, and as long as they don't conflict with each other, it should work.

Okay, I think we have like 10 seconds. No worries, maybe a quick one, or take it offline. On the things we cannot do: one thing I believe is that we can't make requirements.txt changes. Is that the right assumption for a stable branch? Like I said, you don't raise the minimum version of a dependency on stable. We do backport things like blacklisting broken versions of dependencies. Sure, so it goes back to: if there is a bug fix that needs a certain version of a particular dependency, is that okay to do in a stable branch? It depends on what the change is. If it's making the stable branch require a higher minimum version of a dependency, then generally no. In that case you probably need to backport something else that can handle both versions of that dependency on the stable branch. But I think we're out of time. Thanks. Thank you.