Hello everyone, welcome to the talk Streamlining the BOSH Experience. My name is Maria Shaldybina. I'm a software engineer at Pivotal. I've been working on Cloud Foundry for four years, primarily on BOSH, and now I'm working on Concourse. Hi, I'm Rupa. I'm also a software engineer at Pivotal. I've been working on BOSH for close to a year now, and before that, for the past couple of years, I've been pushing apps on Cloud Foundry, writing service brokers on Cloud Foundry, and using Cloud Foundry. So today we're going to talk to you about streamlining the BOSH experience: a little bit about some of the features we worked on over the past year, and a little bit about what motivated us to work on them. We'll also talk about our testing, to explain why you should have a lot of confidence in these features. So why were we doing this work? We have an incredible community here at BOSH, and we got a lot of feedback about how people were using BOSH and what they'd like to see. Specifically, one of the things we heard was that there was a lot of YAML, and this YAML was kind of hard to manage. Combined with networking and availability zones, this meant there was a lot of duplicate YAML. The second thing we heard was that you wanted to use BOSH for robust production use cases and wanted to see BOSH be ready to handle them. At Pivotal, as we were using BOSH for deploying dynamic services, we also noticed that there wasn't really a great way to share properties or networks between deployments. These were some of the things we wanted to address over the past year, and this talk is about how we addressed them in BOSH.
So let's start by looking at one of the biggest pain points that used to be in BOSH: infrastructure and network configuration. Managing IP allocations used to be a big headache for deployment operators. If you've ever managed deployments with BOSH, you know that every time you want to scale up the number of instances or introduce a new job, you need to make sure it's not going to use IPs that are already used by existing VMs, and if you introduce a new deployment, you need to make sure it's not going to use IPs that other deployments are using. Sometimes you even had to go back and update your existing deployments with the reserved IP pool of your new deployment. So operators, like surgeons, had to slice out IP pools for different deployments. Also, since all infrastructure properties were spread out across different deployment manifests, whenever you wanted to update an infrastructure property, you had to go over all your deployments and update them to keep them in sync. So, introducing cloud config. Cloud config is infrastructure-specific configuration that your director is configured with, and it is shared by all deployments managed by that director. Once the director is configured with a cloud config, BOSH starts taking care of IP allocations for you, so you don't have to worry about which IPs are currently in use: since all the networks are shared across deployments, BOSH knows which IPs are in use and which IP it can allocate to the next instance. It also allows you to separate responsibilities between deployment operators and infrastructure operators, so deployment operators can now focus on their deployment layout and not worry about infrastructure-specific properties. So let's take a look at what constitutes a cloud config. This is how deployment manifests used to look. They contained some deployment information and a list of releases that the deployment would use.
They listed networks with cloud-specific properties, and they listed resource pools, which are basically the types of VMs a deployment can use; those also had cloud-specific properties and referenced a stemcell. The disk pools section listed the types of disks you can provision in your cloud provider. The compilation section defined how BOSH should compile packages and what type of VM to use for compilation, usually the most CPU-efficient VM type. The update section defined how to perform a rolling update of your deployment, and jobs defined what to install and how your deployment should look. So what is a cloud config? A cloud config is basically that: your infrastructure-specific properties, now in one place. It defines the types of resources your deployments can use, like networks, VM types, disk types, and compilation. That leaves your deployment manifest focused on your deployment layout, and it potentially makes it easier to migrate your deployment manifest to different infrastructures. Also notice that resource pools were renamed to VM types, disk pools were renamed to disk types, and stemcells were moved out to the deployment manifest, because different deployments can use different stemcells. To configure your director with a cloud config, you run the update command. Once the director is configured with the cloud config, BOSH will start allocating IPs for you, and you don't have to worry about IP allocation from that point on. Deployments that are already deployed will still behave as they did before, but they will start using the cloud config the next time you redeploy them. With a cloud config, your deployment manifest will look a little different. As you can see, it is much smaller: it doesn't have any cloud configuration in it.
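Putting those pieces together, a cloud config might look roughly like the sketch below. All names, sizes, and cloud properties here are illustrative, not taken from a real deployment; the exact keys available under cloud_properties depend on your IaaS.

```yaml
# cloud-config.yml -- illustrative sketch, not a real deployment's config
azs:
- name: z1
  cloud_properties: {availability_zone: us-east-1a}

vm_types:
- name: default
  cloud_properties: {instance_type: m3.medium}

disk_types:
- name: default
  disk_size: 10240   # in MB

networks:
- name: private
  type: manual
  subnets:
  - range: 10.0.0.0/24
    gateway: 10.0.0.1
    az: z1
    cloud_properties: {subnet: example-subnet}

compilation:
  workers: 3
  az: z1
  vm_type: default
  network: private
```

The update command mentioned above is then roughly `bosh update cloud-config cloud-config.yml`.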
The stemcells section is new, and now you can reference stemcells by operating system name instead of using the specific name of the stemcell. Also notice that jobs were renamed to instance groups. Jobs used to be an overloaded term in BOSH: every time we talked about jobs, we had to clarify which jobs we meant, release jobs or deployment jobs. Now that confusion is resolved: jobs are release jobs, templates were renamed to jobs, and deployment jobs were renamed to instance groups. So we know very well that BOSH is good at deploying Cloud Foundry. However, if you've tried to deploy any clustered software with BOSH, you've often been in the situation where you have a master node and a couple of slave nodes trying to find the master node. How do you do this? Some people were doing it with static IPs, but that meant you had hard-coded IPs in your manifest YAML, which made it harder to manage. And even if you could do that for a single deployment, what if you had a web server that now depended on it? Would you hard-code static IPs from a different deployment into the manifest of your web server? That sounded a little awkward. Some people were using HA configuration servers for this, something like etcd or ZooKeeper, but that added more points of failure. There was also the spiff solution, but I'm not going to talk about that. So now, introducing links. Links provide a way for release authors to very concisely state what they provide as a release and what they consume as a release. This makes the public API of a release very clear, and you can now depend on other things without having to specify implementation details like exact IPs or the exact properties that you depend on.
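As a sketch of what this declaration might look like in a release job's spec file (the job name, link names, and types here are illustrative):

```yaml
# jobs/web/spec -- illustrative sketch of a release job using links
name: web

provides:
- name: web_endpoint
  type: http

consumes:
- name: primary_db
  type: db

# An ERB template in this job could then read the consumed link,
# roughly along these lines:
#   <% link('primary_db').instances.each do |db| %>
#     db_host=<%= db.address %>
#   <% end %>
```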
It also makes things really easy for the deployment operator, because they can now just see what each job consumes and what each job provides, as opposed to specifying duplicate properties across each section and particular IPs for each job. As a release author, you also have access to the other things a link provides in your release templates. For example, here the release author is consuming the name, ID, and address from primary_db. This also reduces the amount of YAML, because you no longer need to duplicate all these properties across all jobs in your manifest; you can just specify a dependency on a link. A good success story for links is Concourse: it used to use a lot of static IPs in its deployments, but now its deployment manifest is completely free of static IPs and only uses links. Another feature that was introduced is availability zones. If you've ever had to spread your instances across availability zones in BOSH, you know how awkward it used to be. Basically, you had to duplicate some sections of your deployment manifest, like networks, resource pools, and instances, and give them different names like z1 and z2, when the only property that differed was the availability zone cloud property. Now availability zones are first-class citizens in BOSH. What that means is that you can list availability zones in your cloud config and reference them in your instance groups, and BOSH will take care of automatically balancing instances across those availability zones. Whenever you scale the number of instances up or down, BOSH will automatically rebalance those instances. We also plan on adding a feature to manage availability zone state, so you can turn off an availability zone and BOSH will distribute the instances from that zone across the remaining zones. So what does that look like?
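As a sketch of the two pieces involved (zone names, group names, and cloud properties are illustrative):

```yaml
# cloud config: declare the availability zones once
azs:
- name: z1
  cloud_properties: {availability_zone: us-east-1a}
- name: z2
  cloud_properties: {availability_zone: us-east-1b}

# deployment manifest: reference the zones from an instance group
instance_groups:
- name: web
  instances: 4          # BOSH spreads these across z1 and z2
  azs: [z1, z2]
  # when collapsing pre-existing per-zone groups, migrated_from
  # tells BOSH which old groups to fold into this one:
  migrated_from:
  - {name: web_z1, az: z1}
  - {name: web_z2, az: z2}
```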
You specify the list of availability zones with their cloud properties in the cloud config, and then you reference them in your instance group. As you can see, there is no duplication; it's much smaller and looks much nicer. If you already have an existing deployment that used availability zones before this feature was introduced, you can migrate it using the migrated_from property. Basically, it merges your instance groups into one, and BOSH takes care of preserving the existing instances, renaming them, and assigning availability zones to them. Release authors usually want to know which node is being deployed first so that they can run some setup, like database migrations, and they used to rely on the instance index to figure that out. You can imagine how complicated that got once instances were spread across availability zones, because you had multiple instances with index zero. Well, you should not rely on indexes anymore. Instead, use the bootstrap property that was introduced: it tells the release that the node being deployed is the first node, and you can use it in your release job templates to run setup like database migrations. As we introduced all of these new features, we noticed that the BOSH director was doing a lot more than it was doing before, and we wanted to make sure that we were constantly testing it as much as we could. Keeping in mind that it was now responsible for IP allocations, we introduced a set of load tests. The load tests, as we have them currently, deploy a hundred deployments in parallel and try to catch race conditions, specifically read-write race conditions around IP allocation, to ensure that when we are allocating IPs we're extra cautious and no IP conflicts happen.
The other thing we introduced was fuzz tests, and what we wanted to do with these was introduce some exploratory testing into our process. On top of unit testing, integration testing, and test-driven development, we wanted to simulate a user using AZs and rebalancing across AZs in a very random fashion. We were trying to force edge cases and make sure that the BOSH director behaved the way you'd expect even after strange rebalancing, and that your data was always safe and did not get detached from the deployment even after it was rebalanced or different IPs were allocated to it. So we hope that with the load and fuzz tests we've tested the system to the extreme, and that what we've released is really ready for production use. However, accidents do unfortunately happen, and the one thing we really wanted to be clear about is that we never lose any data. We all love our data, and it's very precious to us. So we introduced the idea of orphaned disks. Orphaned disks introduce a form of soft deletion of persistent disks to BOSH. This means that if your persistent disk gets detached from your deployment, either due to an accidental delete or due to rebalancing, it is still kept around for five days. You can run bosh disks --orphaned and, as you see here, it will list all the disks that were orphaned and when they were orphaned. You can then easily run bosh attach disk to reattach the volume to your deployment. This also makes sure that if, in the meantime, the instance group or the deployment had another disk attached to it, that disk is now orphaned instead, so you're never in a state where you're losing any data. Another feature we introduced to make data loss even harder is BOSH backup and restore. With BOSH backup, you can back up the director database and restore it again at your convenience.
We intend for this feature to be used continuously, or for you to run it when you're being extra cautious before a particular upgrade. Release authors can also affect their deployment lifecycle by implementing callbacks for certain deployment events. As you know, we have a drain hook that release authors can include in their release; BOSH calls it before stopping the job, and it basically notifies the job that it's about to shut down, so it should stop accepting new requests and finish processing the requests it's currently executing. We've now introduced new hooks. After the job is installed and before it is started, BOSH will run a pre-start script; this is a good place for setup scripts like database migrations, and you can use the bootstrap property there to figure out if this is the first node. After the job is started, BOSH will run a post-start script, and after the deploy is finished, it will run post-deploy on every node. These are good places to put logic that notifies services about a completed operation. Another piece of feedback we heard, in line with reducing YAML, was that there are some jobs that need to run on every single VM in every single deployment. This could be something like security software or log-forwarding software, but it's tedious to have to put it in every single manifest file you have, and that leads to a lot more YAML. It's also difficult to manage release versions for each of these pieces of software if you have to do it individually per manifest file. To address this, we introduced add-ons and the runtime config. The runtime config is another YAML file, but it's only one file and it doesn't duplicate your jobs. So here you can have releases.
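A runtime config might look roughly like the sketch below; the release names and versions are illustrative, not real releases.

```yaml
# runtime-config.yml -- illustrative sketch
releases:
- name: logrotate
  version: 1.0.0
- name: anti-virus
  version: 2.1.0

addons:
- name: common
  jobs:
  - name: logrotate
    release: logrotate
  - name: anti-virus
    release: anti-virus
```

You would then configure the director with something like `bosh update runtime-config runtime-config.yml`.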
For example, here we use logrotate and anti-virus, and if you add an add-on, which is just like the jobs section, to your runtime config, then the next time you run bosh deploy, that deployment will get all the jobs specified in the runtime config of that director. So it's pretty easy to use. You update the runtime config, and until the next time you deploy, you're not going to end up with these jobs on your deployment, so it won't affect existing deployments. The next time you do deploy, however, you'll end up with the new jobs on every deployment. To reiterate: every feature that we introduced here is backwards compatible, and you can opt in to each of them gradually. You can update your director to start using a cloud config, and from that point the director will start managing IP allocations for you; then you can remove those cloud properties from your manifests. You can migrate your deployments to availability zones. You can start using deployment hooks. You can move the jobs that you used to have to specify in every deployment into add-ons. And that will minimize your deployment manifests. While we were developing all these features, we wanted to make sure that we didn't break any existing functionality, so we set up separate CI pipelines for new features and existing functionality, and it is easy to do so with Concourse. Concourse is a CI system that is used by every team in Cloud Foundry, and it's a revolutionary CI system that simplifies our lives a lot. So, having spoken about some of these great features that we worked on over the past year, which we know you're going to try out when you leave CF Summit, we also wanted to give you a glimpse into what's coming up next year and what this talk is going to be about next year. One of the things we're working on is stemcell hardening, keeping in mind the use of BOSH for more robust production use cases.
We are working on making the BOSH stemcell come closer to complying with the STIG and CIS benchmarks, so that's something to look forward to. We've introduced a bosh events command which provides a fine-grained audit trail of what happened to particular deployments. You'll now be able to see things like who created a VM and when it was created, all with bosh events. Last year we spoke a little about integrating BOSH with UAA, and we've added more enhancements to that. For example, service brokers can now specify UAA scopes and limit visibility so that they only see deployments made by them. Keeping in mind some of the feedback we've been getting around dynamic service provisioning, we want to work on a solution for highly available DNS. And finally, we're going to keep trying to simplify manifests and make less and less YAML for you. One example: we're going to try to provide sane defaults for things. We already do this with links, which by default are matched up based on their type, so you don't have to specify much when you're wiring up links. And we're looking for other ways to introduce more defaulting behavior, so you don't have to write as much YAML. All right. So we looked at some of the features that were introduced in BOSH in the past year. You can learn more about them at bosh.io, which is a great source of BOSH documentation. BOSH is also open source: go to our GitHub repo, and if you feel like contributing, please start contributing. Join our Slack channel and start a conversation about anything BOSH. We have some time for questions, but you can come and talk to us anytime during the summit. The BOSH team is here: Cam, Tyler Schultz, and Dmitriy Kalinin, who was the mastermind behind all these features. So I think we have five or six minutes for questions.
So, about the cloud config that's in the director: your deployments will use the cloud config as it was at the point in time when they were deployed. If you run cloud check and other deployment flows, they will use the cloud config from that point in time, and you can safely update your cloud config in the meantime, until you decide to deploy your deployment again; at that point, it will start using the new version of the cloud config. Yes, all these features are available now. If you've updated your director in the past six months, you probably have all these features already, and you can gradually switch to them at your convenience. Grab the latest BOSH release version from the bosh.io website and you will get all these features. With cloud config, there is one compilation configuration for your infrastructure, usually the most CPU-efficient type of VM for compiling your packages; basically, yes, it uses the one compilation config for all the deployments managed by the director. Yes, some releases have already switched to these new features. The Concourse release is using links and the other features, and the CF release is gradually switching; there was a spike that minimized the deployment manifest a lot. But you can also continue as before. All of these features, including links, don't have to be adopted in one sweep: you can use non-links releases and links releases with the same director, and you can continue to use your old-style manifests and old-style releases, everything. Once you add a cloud config, the next time you run bosh deploy, it will ask you to change your manifest, but all existing deployments will continue to function until you update them. They're just like BOSH releases: you upload the release to the director, and then instead of specifying it in a manifest, you specify it in your runtime config. I don't know.
We typically try to maintain it for the next six months, is what our PM tells us, but don't quote me on that. Okay, I guess that's it. We have one minute, but no questions. So thank you, everyone, for your attention.