I'm going to talk about four principles of release engineering with BOSH. My name is Maria; I work as a software engineer for Pivotal in London, and joining me on stage today is Jatin, also a software engineer at Pivotal. First, we were asked to make a fire exit announcement, which I'll summarise: if there is an alarm, please follow staff outside the building.

Release processes are a part of life, is my opening statement. If you think about any live software system, or any software component that gets shipped to users, you can trace it back to being a bit of source code plus dependencies. I know this is minimising things a lot, but that's essentially what it is. So you always have some sort of process that takes that source code and those dependencies and converts them into a system that can be used by end users. For example, if the software you ship is a library, it might be a gem or a jar file, and "live" in that situation means published somewhere so that users of the library can pick it up and consume it.

Release processes will typically include some sort of testing, verification and integration, but let's look at them in a bit more detail with the example of a simple website, and the flow it follows to go from feature work that a developer does to a production system. At the beginning of time, an operator sets up a bunch of environments: your production environment, some staging and some testing environments. At the same time, developers have their own development stations and environments to do their work on. Once a developer finishes their feature work, they push their code, which goes through an automated testing system and is deployed to quality assurance systems so that acceptance teams can do acceptance testing on it.
Once it has cleared that stage, it might be deployed to a staging system that is almost exactly the same as production, just to polish off any last bugs that were not caught in smaller-scale testing. After clearing that, it is deployed to production. The release process does not stop there: it includes the support the operator has to do on the production system, so any bugs found there need to be fed back to developers and dealt with.

Even for this relatively straightforward release process, for a relatively simple software system, we can start spotting some challenges and traps. One common problem is the "works on my machine" problem, also known as the "it went through QA but still broke in production" problem. That hints at a couple of different issues, but the main underlying one is that it's really, really tricky, if you don't have the right tooling, to consistently create deployment environments that look exactly the same as each other. Another common problem is the "Steve knows how to deploy it, but he's on leave this week, so we can't do anything" problem. That again hints at a number of things, for example tribal knowledge, but mainly points at the underlying issue of needing manual intervention, which is, first of all, error prone, and secondly always a bottleneck. The third problem that comes up is the external dependencies one, also known as "works only if JDK 236 or some other magic dependency has somehow appeared on the build machine". Again, there are multiple things happening here, yes, tribal knowledge, but mainly an implicit dependency on things pre-existing on the machine: stuff that the code either doesn't package itself or doesn't explicitly declare that it needs.
Finally, something that comes up quite often is reluctance from operators to ship often, because past experience has made them scared that new features will break the stability of their existing production system. Lots of good tools are out there to help with exactly these problems. Tools that automate infrastructure provisioning, such as Terraform, help with the first couple: you provide a specification and can consistently build systems that look exactly like each other. Configuration managers such as Chef and Ansible help with automating the configuration of those systems.

So far we've seen that even simple release processes have traps and are quite tricky. Those challenges get magnified as systems grow in complexity and scale and become more distributed. And as I say this, you might be thinking of a system called Cloud Foundry, which is quite complex. Let's see what makes it so special. First of all, Cloud Foundry is made up of a number of different components. Those components are deployed on a lot of VMs, and they interact with each other over a number of different interfaces and protocols. Each component needs to be able to scale up and down as needed without the components that talk to it ever knowing this happened: they should be able to keep interacting with it as they used to. On top of that, each component is developed and supported by a separate team, and those teams live all over the world. I believe there are over 20 Cloud Foundry teams doing some sort of work related to the Cloud Foundry product, and the teams may be distributed themselves, so that's an additional interesting bit here. Each team works and releases at its own cadence, but they also need to verify that the components they interact with, and those interactions, stay stable and keep working.
On top of interactions, there's also some complicated and interesting configuration management that needs to happen, for example sharing credentials across components. Once the teams have released their own components, the Cloud Foundry product needs to come together. There is typically an integration team that does this job: they make sure the communications work and the whole system works well together, and they place a stamp of approval on it. In the end this gets packaged as a Cloud Foundry product and given to operators. But this is not the end of the release process, because operators can then take CF and deploy it on a number of different infrastructures, and it should work on each and every one of them transparently: exactly the same way on vSphere as it does on AWS. This whole cycle needs to happen every two weeks, which is when a new CF release gets shipped, and it needs to happen quite consistently. On top of the two-week cadence, the value-add versions of Cloud Foundry have their own support agreements that they need to honour. For example, PCF has a support agreement to turn around vulnerability patches in 48 hours. That means this process needs to be quick, efficient and repeatable, each and every time. And again, it does not stop there: once running, the system will occasionally catch fire, usually metaphorically. When that happens, it's essential that operators have enough information to make sense of it and feed it back to the correct team, the one with the right context that is best placed to support them.

We saw that things get quite complicated as systems grow in scale. This is the problem area that release engineering comes in to solve. Release engineering adopts sets of practices from software engineering and brings them into software release processes.
It covers anything related to the compilation, assembly and delivery of source code into finished products or components. Each system and each component will have its own special traits and its own unique challenges, but ultimately a release engineering system, or framework rather, will answer four main questions, which we'll look at in a bit more detail.

Question number one: given the same source material, the same code and the same set of dependencies, can I guarantee the exact same behaviour, independent of the specific setup of the machine? Reproducibility is all about the ability to integrate components to build a stable system. And when I say components here, I don't only mean code; I also mean the specific infrastructure this is built on, the specific VMs and networks. Am I able to make that happen in a way that's stable? In practice, a common implementation offered by frameworks that address reproducibility is a specification that can be versioned as code, and, given that specification, the ability to reproduce the exact topology of machines it describes. Reproducibility is important for a couple of reasons. The first is that it enables the same setup across environments: dev, staging, production. That matters because something that worked for you as a developer on your development station will most probably work in production as well. It also goes the other way round: a bug you notice in production is something you can quickly take back and reproduce on a development station, so that you can focus on fixing it.

The second question is: how often, and at what risk, can I ship new functionality? Or in other words, how can I move fast without breaking things? This is where agility comes into release engineering frameworks.
Agility is all about allowing components to grow and move independently of each other. It ties back to speed of delivery, speed of integration and speed of patching. A common implementation is introducing layers of abstraction, so that software can move around freely on either side of a layer as long as it implements the agreed interface. Integration with CI systems is also an implementation of agility, as is anything that removes delays or contributes to reducing risk. Agility is important because, at the end of the day, fast feedback loops mean lower risk: the smaller the things you can ship, and the more frequently you can ship them, the less likely you are to invest heavily in the wrong direction.

The third question that release engineering answers is: where do I base my abstractions? That is a pretty abstract question in itself, so I'll try to break it down a little. Consistency is really about a framework keeping an interesting balance. On one side, it should not change very frequently, so that you can consider it fixed and depend on it to have a specific behaviour. On the other side, it should be flexible and granular enough that you have complete control over the specification of how your software should be built and how it should be run. Consistency is also about offering a paper trail that tracks changes and versions, so that different parts of your framework know about your code and how it is deployed. It's important because stable interfaces are dependable and predictable: you don't need to worry, as a developer or an operator, about them changing. Audit and accountability are really interesting features in a release engineering framework because they allow for more efficient operations.
The last question a release engineering framework answers is: what tools do I have available to identify the versions of the components I'm packaging and running? Identifiability is all about connecting the dots at the various stages of the software lifecycle. In terms of deployment, for example, operators would like to understand what versions of components are running together, down to the git SHA the developer shipped to finish their feature work, the operating system the software is running on, or the version of each dependency that was packaged with the source code, and so on. A different example is at runtime: locating components based on information collected elsewhere in the system. What I mean is, if I know I have a problem on a VM, do I have enough information there to reason about it, or to track it down to a specific version of a dependency? It's important because it speeds up tracking down bugs and identifying whether a problem affects the whole deployment or cluster versus one specific machine.

A quick recap of the content so far: we saw how release processes are everywhere software wants to make it from source code into a production system. Challenges exist even with the most straightforward of release workflows, but they are magnified many times over as systems grow more complex and distributed. And with that, Jatin will take you through the ways that BOSH does release engineering for complex systems.

So this part is about how BOSH solves the problems described before. BOSH is a release engineering toolchain that we use to ship a lot of complicated systems. There are a number of tools on the market that allow you to ship a fully configured server in a completely reproducible way, tools like Chef or Puppet. The main USP of BOSH is that it allows you to ship fully configured clusters in a reproducible and consistent way.
The way it does that is that BOSH works in a director model: it lives in an orchestration VM and can provision clusters and monitor them for you. As an operator, or as a person deploying software with BOSH, you need to provide it four pieces of information: releases, stemcells, a deployment manifest and a cloud config. Let's go through them one by one.

A release is a self-contained representation of the software that is to be deployed, and of its lifecycle. Releases are normally produced and tested by the developers of the software; we call them release authors. Each release contains a collection of jobs. A job is the fundamental unit of deployment in BOSH, something that can be placed on a VM. In this example we have just taken two releases from the CF deployment, routing and diego. In the routing release there are jobs like gorouter and tcp_router; in diego, bbs and rep. These jobs can be deployed independently of each other. Let's take a look at what is inside each of these jobs.

One of the key things to note about BOSH is that it controls the VM lifecycle as well as the way software is placed on the machine. So when you are writing software that will be deployed with BOSH, you cannot make assumptions about what other software is deployed on the machine your software will land on. That forces jobs to be completely self-contained and independent of anything else. At a high level, each job contains three things: a spec file, a monit file and a bunch of packages. The monit file describes how to start, stop and monitor a job. This ties into the consistency aspect that Maria talked about before: one common interface through which the lifecycle can be controlled for anything that is deployed and run by BOSH.
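To make that concrete, here is a sketch of what a job's monit file might look like, following the usual BOSH conventions; the gorouter job name and the control-script paths are illustrative, not taken from the talk's slides:

```
check process gorouter
  with pidfile /var/vcap/sys/run/gorouter/gorouter.pid
  start program "/var/vcap/jobs/gorouter/bin/gorouter_ctl start"
  stop program "/var/vcap/jobs/gorouter/bin/gorouter_ctl stop"
  group vcap
```

Whatever the job is, the director always drives it through this same small surface: a process to check, a way to start it and a way to stop it.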
The monit contract also tells BOSH how to start the job and when it has died. Next is the spec file. The spec file is essentially a declaration of all the properties that can be consumed inside the job: as a release author, if you want to receive any property from the outside world, you have to declare it in the spec file. The final bit is the packages. A job has to package all of its required dependencies. A package can be the source code itself, or any of the dependencies the source code uses. An example in this case is the gorouter: its source code is in Go, so it has a dependency on Golang, and it also has a dependency on the Nginx server. Each package dependency is unique to its release within the larger deployment. For example, in CF we have multiple components using a similar dependency, like Golang, but each of the releases has to package its own Golang. The reason is that as a release author you don't want to worry about whether changing the version of Golang in your release affects any of the other releases in the deployment. One more thing to note here: the release mechanism tracks all the packages that have ever been deployed to production, or ever shipped, which makes them auditable. So, for example, if a CVE or a vulnerability is discovered in one of my dependencies, I know exactly which shipped versions are affected by it.

The next artifact you need to deploy things with BOSH is the stemcell. A stemcell is a bare, minimal operating system image that is normally produced by the BOSH team. It has some IaaS-specific wrapping on it: for vSphere, for example, that will be a VMDK image, and for AWS it will be an AMI.
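As an illustration of the spec file described above, here is a sketch of what one might look like for a gorouter-style job; the template names, property names and default are made up for the example, not copied from the real routing release:

```yaml
name: gorouter

# ERB templates rendered into the job directory on the VM
templates:
  gorouter_ctl.erb: bin/gorouter_ctl
  gorouter.yml.erb: config/gorouter.yml

# every dependency the job needs, vendored into the release itself
packages:
  - gorouter
  - golang
  - nginx

# the only properties the job can receive from the outside world
properties:
  router.port:
    description: "Port on which the router listens for HTTP traffic"
    default: 80
```

A property that is not declared here simply cannot reach the job, which is what keeps the interface between operators and release authors explicit.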
It allows a clear separation of the operating system layer from the actual software you want to deploy on top of it. One interesting contract the stemcell provides is that the files on the image are the same across IaaSes: the BOSH team ensures you get the same files for an operating system, independent of the IaaS or image you are running on. This allows release authors to be sure that their code will be compiled and run the same way across IaaSes. It's one of the pieces of the puzzle of how we can run things across multiple IaaSes.

The next bit of information that needs to be provided is called the cloud config, which is essentially a large mapping file that maps abstract resource types to specific types of resources on the IaaS. This file is normally written by a cloud operator, because as a cloud operator you have all the information about what resources are available to deploy to. As an example, a cloud config for an AWS environment might have an abstract VM type called small, which maps to an instance type on AWS such as t2.medium, and an abstract disk type called large, which is, say, a 100-gigabyte disk. The same applies to networks and all the other IaaS-specific resources.

The last bit of information you need to create a deployment is called the deployment manifest. The deployment manifest is the fairly famous large YAML file you need to deploy anything with BOSH. It is essentially a description of what VMs should be deployed and what jobs should be present on those VMs, and it ties together all the components we've talked about. For each VM, it says which stemcell the VM should be started with, and which abstract resources should be attached to it.
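The AWS cloud config example above could be written roughly like this; the small/t2.medium and large/100 GB mappings follow the talk's example, while the network section and the subnet id are illustrative:

```yaml
vm_types:
  - name: small
    cloud_properties:
      instance_type: t2.medium   # the IaaS-specific detail lives only here

disk_types:
  - name: large
    disk_size: 102400            # roughly 100 GB, specified in MB

networks:
  - name: default
    type: manual
    subnets:
      - range: 10.0.0.0/24
        gateway: 10.0.0.1
        cloud_properties:
          subnet: subnet-abc123  # illustrative AWS subnet id
```

Swapping IaaS means swapping the `cloud_properties` sections, while the abstract names that deployments refer to stay the same.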
It does not talk about specific resources on the IaaS; those are mapped through the cloud config, so the deployment manifest remains independent of the actual cloud you are deploying to. Deployment manifests are normally produced by integration teams: for CF it is produced by the CF integration team, while for smaller components or products it might be produced by the product teams themselves. They are fairly IaaS-independent, and fairly deployment-independent as well: all the secrets can be abstracted out into variables, which are injected into the deployment manifest.

Let's quickly go over how BOSH actually does a deployment. The first thing BOSH does is take the releases and stemcells and compile all the jobs needed in the deployment into compiled jobs. These jobs are then placed on the VMs described in the deployment manifest, and BOSH uses the monit contract to start them. If all the jobs declared in the deployment manifest start, the deployment is successful and the cluster is up.

One thing I want to emphasise here is that the combination of releases, stemcells and deployment manifest ensures the consistency and repeatability that BOSH gives us. What does this mean? If two people attempt to deploy the exact same manifest with the exact same stemcell and the exact same release versions, they will get exactly the same result. This is a very powerful concept that we use all over the place while supporting these components. One important thing to note is that I did not include the cloud config in that list, which means the repeatability aspect is independent of the exact infrastructure you are running on. We leverage the combination of these three a lot.
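Putting the pieces together, a minimal deployment manifest might look like this sketch; the deployment name, versions and property values are placeholders, and a real manifest for something like CF would be much larger:

```yaml
name: my-deployment

releases:
  - name: routing
    version: "0.1.0"        # placeholder version

stemcells:
  - alias: default
    os: ubuntu-xenial
    version: "621.0"        # placeholder version

instance_groups:
  - name: router
    instances: 2
    vm_type: small          # abstract type, resolved via the cloud config
    stemcell: default
    networks:
      - name: default
    jobs:
      - name: gorouter
        release: routing
        properties:
          router:
            port: 80
```

Note that nothing in this file names an IaaS resource directly: `small` and `default` are the abstract names the cloud config maps onto real instance types and subnets.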
For example, when the integration team wants to know whether they can ship a version of the software, they test this combination. If there is a bug in production and we want to reproduce it, these three things are enough to create an equivalent system.

To quickly recap: we talked about release processes in general, the challenges in releasing large distributed systems, how release engineering helps with that, and how the BOSH tooling helps us deliver complicated distributed systems. If you want to get started with BOSH, there is a BOSH bootloader talk at 2pm that you can check out; it will show you how to set up a director yourself. Thanks.