Good morning, everybody. I'm Alvaro Brescioli. This is Dave Sebedi. We work on the release integration team at Pivotal, and we love screen savers. We're going to talk to you today about breaking up the CF release monolith. We'll talk a little bit about deployments versus releases and about fast iterations, because we want to be developing quickly. And we're going to talk about scaling development teams, because as we add more and more developers and contributors to Cloud Foundry, this becomes a problem. It's fundamentally a human problem. At this point we basically have teams all over the world trying to communicate, and they're in different locations, different companies, different time zones.
One second. So, first things first. What's a release and what's a deployment? For the purposes of this talk, we're just going to say a release is a set of BOSH jobs, and a deployment is one or many releases running on an IaaS somewhere. And to give you some context, we're going to talk a little bit about the history of CF release. Why is it a monolith?
Yeah, so I think in order to frame the problem properly, it's probably best to start with a quick history of CF release and how we got to where we are now. So here we have CF release. Inside we have some of the main components that are part of Cloud Foundry. They're implemented with Git submodules, which is not super important. This black box represents the actual BOSH release, and that's where CF release actually comes from: it was the idea that you could take the different Cloud Foundry components and use BOSH to deploy them. And that's just a Git repo out there in the world. But this is still how we ship versions of Cloud Foundry today. We have all the components living inside of CF release. Operators take the CF release repository and use BOSH to deploy it.
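As a rough illustration of the release/deployment distinction described above, here is a minimal Python sketch. The class names, job names, and the version "1" are hypothetical and chosen only for illustration; this is not how BOSH itself models these concepts.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Release:
    """A release, in the talk's sense: a named, versioned set of BOSH jobs."""
    name: str
    version: str
    jobs: Tuple[str, ...]


@dataclass
class Deployment:
    """A deployment: one or many releases running on an IaaS somewhere."""
    name: str
    releases: List[Release] = field(default_factory=list)

    def job_names(self) -> List[str]:
        """List every job available to the deployment, qualified by release."""
        return [f"{r.name}/{job}" for r in self.releases for job in r.jobs]


# Today: a single monolithic release carries every component's jobs.
# (Job names here are illustrative assumptions.)
monolith = Deployment(
    name="cf",
    releases=[Release("cf-release", "235", ("cloud_controller", "gorouter", "doppler"))],
)

# Where the talk is heading: the same jobs spread over several smaller releases.
split = Deployment(
    name="cf",
    releases=[
        Release("capi", "1", ("cloud_controller",)),
        Release("routing", "1", ("gorouter",)),
        Release("loggregator", "1", ("doppler",)),
    ],
)

print(monolith.job_names())
print(split.job_names())
```

Both deployments expose the same three jobs; the only difference is how many releases those jobs are packaged into.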
So this is actually still, to this day, how we ship new versions of Cloud Foundry. Maybe the important part of the story here is that there was a time when we decided we needed to add some new features to the platform, mostly around application logging and getting those application logs to users. It was going to be a relatively large change to the platform. It was going to require some new code bases, and we really had to scale the team that was working on the core part of the platform in order to do that. So there were some new code bases that I've globally called Loggregator. That's the name of the team; there are actually several code bases in there.
We made two decisions at that point in time. The first was that Loggregator was going to continue to be part of CF release. It wasn't going to be a separate code base. It was going to be BOSH deployed with the rest of the components, like everything else. The other interesting decision was that we were going to spin out a new team. Until that point, there had only been two teams working on CF release. One was the identity team. The other, at the time, we were calling the backend team. The identity team at that point was not incredibly active. They weren't contributing too much to the code base, and it was a pretty stable interface. So the main team doing active development on CF release was the backend team: one team working on one release.
But with the introduction of the Loggregator team, we now had two teams regularly pushing code to CF release, and we ended up with some interesting issues. Two teams were contributing to the same release, but each release goes through one testing pipeline. And so, because we had two teams working on it, if one team pushed code that broke the build, it would block the other team from contributing code.
That's the process we go through when a build goes red. We sort of halt any new commits going into the repo while someone takes a look at the broken build and tries to fix it. And this became a huge problem for the two teams. One team would push code, and it might take days for the pipeline to go green again, at which point the other team had stacked up a bunch of commits that they were waiting to push. Then they would flood the pipeline with their new code, and you would see another break. So there was a back and forth where the two teams were stepping on each other's toes because of the way we did testing at the time. That was a huge pain point for a while.
Then there was an opportunity to add yet another code base to the Cloud Foundry ecosystem: we wanted to rewrite the main backend, the runtime for Cloud Foundry. They called it Diego, but we made a different decision with Diego. Instead of putting it inside of CF release, there was a new release that we called Diego release. So we had a new team, a new release, a new set of environments and a new testing pipeline. And what we saw is that a lot of those issues between the backend team and the Loggregator team, we didn't run into as much with the Diego team. The two teams didn't block each other nearly as much, and they could test independently.
But we inherited a different set of problems instead. Given that we have two releases, how do I know which versions of each release are compatible with each other? Or suppose that I want to add a change to the platform that requires commits to both CF release and Diego release. How do I push those in lockstep? So using a new release for a new code base brought a different set of complications, which brings us ultimately to the core problem. How do we scale development of the CF core?
We want to add new subsystems and new teams to build different features or fix bugs or whatever it is in the Cloud Foundry ecosystem. How do we scale that effectively? And if the answer is to break the monolith, how can we still prove that all of the new releases work properly together? How can we test things like integration, and also make sure that we don't lose the quality assurance we previously got from a single pipeline where everything was being tested in one place? So if we do break the monolith, we want to make sure we don't lose a bunch of things that we already had.
Cool. And before we actually answer those two questions, we're going to talk about what we gain from this. What are the advantages of breaking up this code base? Some of these are in part the solution to those two questions. Firstly, we can now extract components. So for things that a lot of teams were doing the same way, we can just pull that out and say, everybody uses one component to do route registration, for example. We can find dependencies, maybe dependencies we didn't know we had, which is really nice. It gives us a little more stability. And we can start to establish contracts for components, which means that when developers are writing their code, all they really have to worry about is: does it still meet the contract that my release has with someone else? So you just worry about keeping your APIs stable. We can test the components individually and together. That's nice. We can iterate quickly, and we're working independently. Very nice thing to have. And you can commit as much as you want without stepping on people's toes.
So, to go back to your question, what might we lose if we broke the monolith? There are actually some nice things about having a monolithic code base, and we want to try not to lose those things. So we might lose the idea of a single integration point.
There's something nice about being able to point to a single pipeline or a single build and say, yes, it works, or no, it doesn't. If we distribute that logic across multiple testing pipelines, we might lose that. The other thing is we might lose the idea of a canonical Cloud Foundry. What does the world look like without CF release? How do we refer to a version of Cloud Foundry? Right now, I can tell you, go download CF release 235, and we're all talking about the same thing. But if we break CF release apart, now we have to refer to a collection of releases and the various versions of all those things. So we might lose the idea that this is a canonical Cloud Foundry. And the last bit is that we lose the single code base that represents all of CF. Right now, I can git clone CF release and know that I have what is essentially all of Cloud Foundry. And the really important part is that if we break it up into multiple code bases, I have to know how to glue them back together to get my Cloud Foundry. There's an important missing piece if we just have multiple code bases: we need to know how to bring them back together.
All right. So now we're actually going to talk about how we're splitting this up. Picking up where Sebedi left off, talking about what CF release is: this is one single deployment. At this point, we've spun out all these different teams that are all contributing to Cloud Foundry. There are actually more teams than you see here, but to keep it simple, we left it at this. And this is essentially where we currently stand: everybody's still pushing their code to one single release. So I'm going to move the present over to the left to make a little room for the future, which is where we're trying to go with this. So now we're starting to take CF release and break it up into other releases. We're already in the process of doing this.
These are now releases of their own, but at the end of the day they're not being used, because the way the manifest is written, it's all still represented in one single release. Yeah, sorry about that, guys. Okay, each subsystem gets its own release. So now every team is working on their release independently. They can push code and not worry about whether somebody else is going to push something and cause merge conflicts, etc. And we can have a lot more people working on Cloud Foundry quite happily and quickly. We're not stepping on each other's toes, as we say. So this is really nice. If we want to add new features to Cloud Foundry, all that means is: let's make a new release, package it up and put it in the BOSH manifest. So the future, where we're going, is this one deployment. Now we're not talking about releases at all. CF release sort of goes away. It becomes a BOSH manifest, and it's no longer a release. It's a deployment that represents all of our components. It's essentially the glue. And I'll let Sebedi talk a little more about BOSH and the BOSH 2.0 manifest generation.
Yeah, so the idea here is that instead of CF release, we're going to start publishing this manifest as not just the canonical configuration, but as your canonical glue for taking these different releases and being able to deploy them. So we'll all be using more or less the same manifest as well. And in order to do some of this, we're going to leverage the new BOSH features that you've probably heard about, things like links, AZ striping, all that sort of stuff. And hopefully this really tightens the configuration, so that we don't have to pass around credentials to all the different jobs or pass around static IPs.
All that stuff can be handled under the hood by BOSH. So the dream here is that we get rid of all these complicated templates that you may have seen before, the ones that use things like Spiff or Spruce to produce a manifest for CF release. In this hypothetical future, which hopefully is not too far away, we don't have any more complicated templates and hopefully no more Spiff. So that's the dream as far as manifest generation goes. There'll be one manifest with maybe a few places where you have to pass in your credentials, hopefully just the one time, and maybe a few fields. But it'll be logicless. It's pretty much copy-paste, a macro or something like that. Nothing too complicated.
Yeah, a nice advantage that BOSH links give us is that this manifest is not going to be a 5,000-line YAML file anymore. So how do we see this working? What are we going to do to make sure that everything still works together correctly? The solution we've come up with is this. You have all these different teams committing and pushing their code. They're each going to have their own pipeline, and every one of those pipelines is going to be spitting out final releases. We're going to feed those into the release integration pipeline, and we're going to introduce this new concept of blessed versions. All that means, really, is that we've decided this version of Diego works well with all of the other components of Cloud Foundry that were blessed, so it becomes another blessed version. And then we're going to take those and feed them right back into the teams' pipelines. So every team is now working with whatever the latest of every other release is and testing against that. So when they push their code, they know that it works before they've even tried to give it to the release integration pipeline, right?
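The blessing flow just described can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual release integration pipeline: the function names, the "capi" release name, the version numbers, and the stand-in integration check are all hypothetical.

```python
from typing import Callable, Dict

# Maps a release name to its currently blessed version.
Versions = Dict[str, str]


def candidate_set(blessed: Versions, release: str, new_version: str) -> Versions:
    """Combine one team's new final release with the blessed versions of everything else."""
    combined = dict(blessed)
    combined[release] = new_version
    return combined


def try_bless(
    blessed: Versions,
    release: str,
    new_version: str,
    integration_passes: Callable[[Versions], bool],
) -> Versions:
    """Return the updated blessed set if integration passes; otherwise keep the old set."""
    candidate = candidate_set(blessed, release, new_version)
    return candidate if integration_passes(candidate) else blessed


def fake_integration(versions: Versions) -> bool:
    """Stand-in for the integration test run; pretends Diego 12 is broken."""
    return versions["diego"] != "12"


# Hypothetical starting point: one blessed version per release.
blessed = {"diego": "10", "routing": "4", "capi": "7"}

blessed = try_bless(blessed, "diego", "11", fake_integration)  # passes, so it is blessed
blessed = try_bless(blessed, "diego", "12", fake_integration)  # fails, blessed set unchanged
print(blessed)
```

The point of the sketch is that a failing candidate never changes what other teams test against: they keep consuming the last combination that went green.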
So they're testing against the latest of everyone else's stuff. And to bring it back to one of the original things we were talking about, this should all enable faster iteration on new releases: shorter feedback loops, and being able to quickly know how your updates to your release are going to integrate with everything else. So who tests what? All every team is testing is their release against blessed versions, and the RelInt team is going to be the one blessing those versions.
So yeah, the main advantage is exactly what Sebedi said. Teams are now responsible for just one thing, so they can iterate really quickly without worrying too much and without thinking they're breaking things.
And what this means for operators and developers: smaller releases mean a faster release cycle. It also means that it's easier to compose releases. The example we often use is that there's a routing release that contains pretty much all the components involved in the routing layer of Cloud Foundry. But it also packages up its route registrar, which is how you publish routes to the router and say, hey, traffic to this URL should be routed to this IP address. Instead of inheriting all of CF release in order to reuse this one job, you now only have to bring in a smaller subset of Cloud Foundry. So by breaking releases down into smaller releases, you get the option to bring in and exclude functionality as you choose. The other thing is we're going to have new manifests, so we'll have a canonical configuration with sane defaults, a simple manifest. And lastly, we talked about this already, but potentially more stable interfaces and explicit dependencies. One of the things we found as we started to break apart CF release is that there were all of these hidden dependencies that people had completely forgotten about. They were lost to the sands of time.
And as we're starting to split things out, we think, oh my God, this has a dependency on that. How did that happen? And so we move more towards a service-oriented or even microservice architecture. I would hesitate to really use that word to describe the architecture of Cloud Foundry, but the idea is that you'll have a defined contract and everyone adheres to that. So hopefully you're all excited about these changes to come. We're happy to stick around and answer any questions. Thank you, Jesse.
Yeah. So breaking down CF release is great, but doesn't that mean... So David, you mentioned earlier that this does make it difficult to test cross-cutting changes. And so I imagine that in the future, if you do have to introduce a cross-cutting change that requires changes in both places (I said both, that's hypothetical, it could be more than one), you theoretically have to go through the pipeline three times, right? You have to introduce dormant changes in both places but leave them inactive, and turn on the feature flags the third time.
So there are a couple of different... It depends largely on the nature of the cross-cutting change, right? In an ideal world, there's a properly layered architecture, and if you treat, for example, Diego as a service, you can add a feature into Diego that doesn't actually have to be consumed by any other part of the Cloud Foundry ecosystem yet. And then the appropriate changes could go into the routing layer or the API layer or whatever it is that consumes those changes. So if you have good layering, you can treat them as different services, the Diego service, the routing service, and then the layer at the very top is the API. That's the ideal situation, I think. I think there are actually still some open questions about how to make good cross-cutting changes.
Like, for example, one of the ones that we keep coming back around to is how we deal with updating tests and features in the same commit. That's actually still an open question that we're figuring out. Who owns, for example, the tests for all of Cloud Foundry? So if anyone has any ideas, feel free to talk to us about it. But those are still, like I said, open questions.
Amit, do you know? Amit's our PM over here. He could probably better answer that question. So if anyone didn't hear that, the question was about release cadence, how often we'll release new versions. And I think the plan is to stick with our current practice of trying to release every week or every other week, something like that.
Yeah. Currently there is this concept of a compatibility matrix. You're probably familiar with it. I think in the future, and Amit, you can correct me if this is wrong, the deployment will be the source of truth. So we're going to be publishing a manifest of, essentially, versions that work together. And this is going to be continuously updated. So staying up to date with your components will basically just mean staying on those versions. Amit? Yeah. Right. The manifest becomes the source of truth for that.
Well, hopefully, with manifests being a lot simpler in the future, this shouldn't actually be too much of a problem. But if it is, obviously we would love feedback on that. Yeah, so as we said, it's the glue, and that's exactly what its purpose is going to be in the future. You can still override values in the manifest yourself. But because it's going to be so much simpler, it may just consist of your overrides living... and I don't know if this is exactly how it's going to look.
Maybe your overrides live in one file, and the manifest is a single thing that is almost static in a way, and all it does is keep up to date in that sense. And you know that the compatibility is there because it's been tested together. Exactly. Yeah. Right here.
Oh, great. So in the world right now, we use submodules, and some of the stuff we're talking about you can do with submodules, right? You have your .gitmodules file, which is kind of like blessed versions, and if you're worried about upgrading one component affecting another, then we don't bump the submodule. So what's the biggest win of this new world?
I think it has as much to do with... I mean, I sort of referred to this at the beginning. It's primarily a human problem, not a technical one. When you have everything in the same release, it becomes really hard to iterate on your corner of the ecosystem. So the idea of breaking it up into releases is to make that a lot easier. So that's the big win. And also, you get to logically encapsulate a subsystem. You can point and say, this is the routing layer, this is the API layer, this is the runtime layer, or these are service layers, like etcd and Consul, things like that, instead of having them all shoved into the same release. So that's the big win. I don't know if that really answers your question, but...
I think a good way to put it is that any core contributor is probably familiar with the "can I bump", and that's essentially how we communicate right now. It's how we solve the problem of how everybody knows whether they can push or not. That sort of goes away. At that point, you're constantly fixing up your code, and when something's ready for you to consume, it's just going to get fed in as a blessed version, right?
But you don't need to worry so much about whether you can bump and whether you're going to break everybody else's things, because you won't bump until you've seen your stuff go green. I think I saw some other questions. And also, in the back, if we're running out of time, just stop me.
So, this is more of a process question. You said that everybody tests everything. Have you figured out if there is a need to have release testing? Because at the end of the day, you are going to take pieces from different releases and, at some rate, certify that. Bless it. So there has to be blessing by somebody. Who's that controlling body in your team?
So that responsibility is pushed to the teams that develop those releases. It's up to them to decide what tests make sense for that release. So each layer gets to build its own test suite and run those tests. We are going to ask every team to make the last step of their pipeline pull in the stable versions of all the other releases and run the normal CF acceptance tests against that combination of bits. So before anything gets pushed into the integration pipeline, we should have already tested an update to your release against stable versions of everything else. Anyway, the idea is that teams will be running acceptance tests before any of this stuff gets promoted into the integration pipeline.
Okay, and this is happening every couple of weeks, or... Every commit. The goal is... Yeah, but what I meant was the release certification, if you will. Yeah, that's up to the team to decide. Okay. Yeah. Well, lots of questions.
So, for now, our team, the release integration team, is owning those two things.
I think that's the plan for the near-term future. Long term, as we automate more and more of this, the team might shrink, or... we haven't figured that out. But for now it's the release integration team.
Yeah, I think they can just change it themselves. Same thing you do now. Yeah, but I think the risk of that is a good deal lower than when pushing to the same release. But that's a good point. I don't know. I don't have a lot of concern about that, but it could definitely become a problem, I could see, hypothetically.
I think the idea is that if we're going to publish these blessed versions, we also have to give deployers the information for how to do this properly. I think that's definitely part of our responsibility. Now, that might be automated by doing the aggregation that you just described. So, just like the CF release notes, it might just amount to an aggregation of the release notes of all the sub-releases. But yeah, I think we will continue to be responsible for that.
Other questions? I think I saw some hands. I want to make sure I don't forget anybody. In the back, right there. Sorry, are you talking specifically about which releases you want to use, or configuration? Yes. So I think the idea is that we'll replace those scripts with a canonical manifest that will be updated as we make changes to the releases that require them. So you'll just do a git pull instead of running the generate deployment manifest script.
Yeah. All right, I think we might be out of time. I'm happy to stick around for a few more minutes. If you have any questions, you can just come bug me. I'll be standing out in the hallway or something like that. But thank you very much.