Okay, I think we'll go ahead and get started. Thanks, everybody, for coming. I am Andy Lowe. I'm with Allstate's CompoZed Labs, on the automation engineering team. I manage a suite of tools that are all about helping our developers go faster, and one of those is part of our deployment technology, so that's what I'll be talking about today.

This slide is probably familiar by Friday afternoon of a three-day conference, but just in case, please make sure you know the locations of the emergency exits. If there's an alarm, please calmly exit to the concourse area. There are emergency exits outside, and listen to people giving you directions, especially if they're public safety staff. Questions? Okay, good.

So when we're talking about the process of building software, there's a lot of stuff that goes into it. But for most people, only small parts of that are actually interesting. And there's actually a lot of overlap between the stuff that people want to do and what the business wants to pay people to do. But when we start thinking about those things, the reality is that not only is there the stuff that needs to happen, there's also all this work that people are actually doing. And that leads to there not always being a lot of overlap between what is supposed to be happening, what you want to have happening, and what's actually getting done. What you really want is a world where all the things that need to happen and all the things that are getting done completely overlap with each other, and where the things that we want to spend time doing are also the things that we are spending time doing. That would be an awesome world to work in.

So how do we get there? Let's figure out what we're doing with our time first. All of this is good stuff. People generally value writing code and having tests. Change control's good. Yak shaving is maybe overrated, but all the rest of this is actually useful as part of the software development process.
And so we don't want to get rid of any of it.

Think about what it takes to actually get code from the initial idea out to production. In the base case, you write some code, you do a commit, you have some automated tests, someone does a deployment. If you're into true CI/CD, then your acceptance is completely automated, but maybe you have someone like me standing there doing acceptance, and then you push out to production. That's the simple workflow. But the reality is there's all that other stuff that people are spending time doing as well. So let's start looking at what we can peel out of here and make easier for people to do.

As you can probably guess from the topic, I'm big into automated deployments. In this day and age, if you're not doing automation, please do; it makes life much easier. Things will go badly if you're not doing automated deployments. Yes, everyone knows how easy a cf push is, it's a single command, but usually there's a lot more to a deployment than a single cf push, and it's a repeated process. If you're trying to push on a very regular basis, people are doing it a lot. If you don't have automation, something's going to go wrong. So let's automate it.

The CF standard is blue-green deploys. Quick show of hands: who doesn't know what a blue-green deploy is at this point? Who has never written a blue-green deploy? Okay, you guys are going to love this next set of slides. This is the standard approach in CF, and it is pretty straightforward. You have your app out there running. We'll call it your blue app. It's running, life's happy. You have your routes all mapped to it, but I want to push a new version of the app. So we'll do a cf push. We now have a green version running, and we have both of them out there, but traffic's only going to your original app because you haven't actually moved the routes over yet. So let's go ahead and move our main route over.
You now have prod traffic going to both versions of the app. Unmap the route from the old app, and all your traffic is going to the new app. Delete your old app. You're done. That's a blue-green deploy. In terms of the CF command line, it's really pretty straightforward. You push your blue app, then you have some commands that you run to do the actual blue-green swap when you're ready to do the new version.

One of the things that a lot of people are doing is having an environment that looks like this, where you don't have just a single availability zone, but multiple availability zones. Four is a nice number and makes for a pretty picture, so we'll look at it with four availability zones. The process is exactly the same. We have our blue app running. We push our green app, map the routes, unmap the old routes, delete the blue app. You've done a multi-availability-zone deployment.

Do you want to write commands for that every single time? If you said yes, you're wrong, please don't, because one of the things you start running into is that all those commands I had earlier didn't have any error handling. If you're doing automation, please have error handling, because something will go wrong. If you're doing a deployment across four availability zones and three of them go well but one of them doesn't, that gets to be a real mess, because you have to start worrying about synchronization across the different availability zones, making sure everything stays in sync, and if something goes wrong on one, doing the rollback on the other three gets to be a real problem.

Allstate ran into this basic problem as soon as we started saying we really need multiple AZs for availability reasons. We built out a tool called Deployadactyl. It's an open source tool. It does blue-green deploys across multiple availability zones.
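The rollback problem described above can be sketched in Go. This is a minimal illustration, not Deployadactyl's actual code: the `Foundation` type, its methods, and the `-green` naming are assumptions made for the example, and the cf CLI commands are only recorded as strings rather than executed. The idea is to push green everywhere first and cut routes over only if every push succeeded.

```go
package main

import (
	"errors"
	"fmt"
)

// Foundation is a hypothetical stand-in for one Cloud Foundry foundation
// (availability zone). A real tool would run the cf CLI or call the CF API;
// here each command is only recorded so the flow can be inspected.
type Foundation struct {
	Name     string
	FailPush bool     // simulate a failed push on this foundation
	Log      []string // commands that would have been run
}

func (f *Foundation) run(cmd string) { f.Log = append(f.Log, cmd) }

// PushGreen stages the new version next to the running blue app.
func (f *Foundation) PushGreen(app string) error {
	if f.FailPush {
		return errors.New("push failed on " + f.Name)
	}
	f.run("cf push " + app + "-green")
	return nil
}

// DeleteGreen rolls back by removing the new version; blue is untouched.
func (f *Foundation) DeleteGreen(app string) {
	f.run("cf delete " + app + "-green -f")
}

// Cutover maps the route to green, unmaps it from blue, and deletes blue.
func (f *Foundation) Cutover(app, domain, host string) {
	f.run(fmt.Sprintf("cf map-route %s-green %s --hostname %s", app, domain, host))
	f.run(fmt.Sprintf("cf unmap-route %s %s --hostname %s", app, domain, host))
	f.run("cf delete " + app + " -f")
}

// DeployAll pushes green to every foundation. If any push fails, the green
// apps already pushed are deleted and no routes are touched anywhere.
func DeployAll(fs []*Foundation, app, domain, host string) error {
	var pushed []*Foundation
	for _, f := range fs {
		if err := f.PushGreen(app); err != nil {
			for _, p := range pushed {
				p.DeleteGreen(app)
			}
			return fmt.Errorf("deploy rolled back: %w", err)
		}
		pushed = append(pushed, f)
	}
	for _, f := range fs {
		f.Cutover(app, domain, host)
	}
	return nil
}

func main() {
	fs := []*Foundation{{Name: "az1"}, {Name: "az2"}, {Name: "az3", FailPush: true}, {Name: "az4"}}
	err := DeployAll(fs, "myapp", "example.com", "myapp")
	fmt.Println("error:", err)
	for _, f := range fs {
		fmt.Println(f.Name, f.Log)
	}
}
```

In this sketch the third foundation fails, so the first two delete their green apps and the fourth is never touched. The blue apps keep serving traffic throughout.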
We wrote it in Go, and one of the things we did was say there is this core deployment piece that everyone is going to have to worry about, but we also recognize that every enterprise has its own custom deployment scenarios and deployment logic, so we put in an event model so you can hook into it and write your own wrappers to do whatever is appropriate for your enterprise. We've actually done that, and I'll talk more later about what we've done that's Allstate-specific, but in terms of Deployadactyl itself, it's out there and ready to use.

So who wants to see a deployment? Deploys are fun, right? Let me come out of presenter mode. So to call Deployadactyl, it's... yeah, yep. There we go. It's as simple as doing a curl. That's all you have to do. While that's running, I'll come back over and talk about the configuration for getting this going.

What you saw was a single curl, but there's actually a lot that goes into that. Your app teams don't necessarily have to keep track of what all the different foundations are. So Deployadactyl is set up so you can just say, let's have a pre-prod environment, and it's going to be pre-prod.example.com, and these are all the foundations that go into it, and we'll just use the authentication that we configured within Deployadactyl. We're not going to make you authenticate each time. Skip SSL, because everybody does that. And by the way, we're going to go ahead and enforce that you have at least two instances running. Then we have another environment that's production. We have four foundations there, and by the way, we want everybody to have four instances running there for availability reasons, so we can enforce that in config.

What that ends up looking like for your app devs is they just do that curl: they specify one of these environments, they specify the org they want to deploy to, the space, and they give it an app name.
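As a rough illustration of the environment configuration described above, the Go sketch below models two environments with a list of foundations and an enforced instance minimum. The struct fields, foundation URLs, and the `Resolve` function are hypothetical names chosen for this example; they are not Deployadactyl's actual configuration keys or API.

```go
package main

import "fmt"

// Environment is a hypothetical shape for the per-environment settings
// described in the talk. Field names are illustrative only.
type Environment struct {
	Name         string
	Domain       string
	Foundations  []string // API endpoints of each foundation (made up here)
	SkipSSL      bool
	MinInstances int // floor enforced on every deploy to this environment
}

// DeployRequest is what an app team supplies: environment, org, space, app.
type DeployRequest struct {
	Environment     string
	Org, Space, App string
	Instances       int // instance count requested in the manifest
}

// Resolve looks up the requested environment and raises the instance count
// to the environment's minimum when the request asks for fewer.
func Resolve(envs map[string]Environment, req DeployRequest) (Environment, int, error) {
	env, ok := envs[req.Environment]
	if !ok {
		return Environment{}, 0, fmt.Errorf("unknown environment %q", req.Environment)
	}
	instances := req.Instances
	if instances < env.MinInstances {
		instances = env.MinInstances
	}
	return env, instances, nil
}

func main() {
	envs := map[string]Environment{
		"preprod": {
			Name: "preprod", Domain: "pre-prod.example.com",
			Foundations:  []string{"https://api.cf1.example.com", "https://api.cf2.example.com"},
			SkipSSL:      true,
			MinInstances: 2,
		},
		"prod": {
			Name: "prod", Domain: "prod.example.com",
			Foundations: []string{
				"https://api.cf1.example.com", "https://api.cf2.example.com",
				"https://api.cf3.example.com", "https://api.cf4.example.com",
			},
			MinInstances: 4,
		},
	}
	req := DeployRequest{Environment: "prod", Org: "myorg", Space: "myspace", App: "myapp", Instances: 1}
	env, n, err := Resolve(envs, req)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s: %d foundations, %d instances each\n", env.Name, len(env.Foundations), n)
}
```

Keeping the foundation list and instance floor on the server side means the app team's request only needs the environment name, org, space, and app name.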
Credentials can either be passed in through basic auth, or you can configure them through Deployadactyl environment variables, and the manifest is just passed with the app. So you end up with this really simple curl command that you can build into whatever pipeline you're using, whether it's your desktop, Jenkins, whatever it is.

And if we come back over, it hasn't quite finished, which is terrible, but in just a second we'll get logs showing that the deploy succeeded across all the different availability zones. One of the things that I'll talk about in a minute, and that you probably saw in that curl, is the health check endpoint, which is another thing that we pass in. Rather than wait for this, we're just going to keep going. You guys can come look at my screen later and trust me that it happened.

Within the event model it's pretty simple to set up these event handlers. They're just Go functions: you write a Go function, you register a health check or any other event handler with a handler manager, and you're good to go. We have a lot of different event types. We do things like events on push complete. That's a push on a single AZ, not necessarily all of them. We also have events for deploy start and deploy finish. We treat success and failure differently because you might want to take different actions based on those. So there gets to be this really robust set of things that happen within the deployment process that you can hook into, taking different actions at different points.

So let's talk about health checks some. Cloud Foundry has the idea of a health check built in. It will do a quick health check that really is just saying: is my app running? Is it sending a response if I ping it, and am I getting traffic back from it? That's all well and good. It makes sure that your app is live, and that's a pretty cool thing.
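The handler registration pattern described above might look something like the following Go sketch. It is a generic event manager written for illustration: the event names, the `Manager` type, and the `Register` and `Emit` methods are assumptions, not Deployadactyl's real interfaces.

```go
package main

import "fmt"

// EventType names a point in the deployment process. These names are
// hypothetical and only mirror the kinds of events mentioned in the talk.
type EventType string

const (
	PushFinished  EventType = "push.finished" // one foundation finished its push
	DeployStart   EventType = "deploy.start"
	DeploySuccess EventType = "deploy.success"
	DeployFailure EventType = "deploy.failure"
)

// Event carries the details a handler might need.
type Event struct {
	Type       EventType
	App        string
	Foundation string
}

// Handler is a plain Go function invoked when an event fires.
type Handler func(Event) error

// Manager keeps the registered handlers for each event type.
type Manager struct {
	handlers map[EventType][]Handler
}

func NewManager() *Manager {
	return &Manager{handlers: make(map[EventType][]Handler)}
}

// Register adds a handler for one event type.
func (m *Manager) Register(t EventType, h Handler) {
	m.handlers[t] = append(m.handlers[t], h)
}

// Emit calls the handlers for the event in registration order and stops
// at the first error, so a failing handler can abort the deploy.
func (m *Manager) Emit(e Event) error {
	for _, h := range m.handlers[e.Type] {
		if err := h(e); err != nil {
			return fmt.Errorf("handler for %s failed: %w", e.Type, err)
		}
	}
	return nil
}

func main() {
	m := NewManager()
	m.Register(DeploySuccess, func(e Event) error {
		fmt.Println("would start a security scan for", e.App)
		return nil
	})
	m.Register(DeployFailure, func(e Event) error {
		fmt.Println("would notify the team about", e.App)
		return nil
	})
	if err := m.Emit(Event{Type: DeploySuccess, App: "myapp"}); err != nil {
		fmt.Println(err)
	}
}
```

Separate success and failure event types let each handler stay small, since it never has to inspect the outcome itself.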
But there are a lot of health checks that might go beyond that, because if all your app is doing is sending back a 500, well, that's not an app in a functional state. Yes, it's live, so you probably don't want CF to get in and restart it every 30 seconds, but it's not useful. There are a lot of things that might cause that. Your app might be just flat out down. You might deploy to prod and forget to open up a firewall rule, because no one would ever do that, but just theoretically you might. Or you forget to update a configuration setting: you added a new feature that requires you to connect to a different database, you put your non-prod credentials in your prod environment, and suddenly you can't connect.

So one of the things that we've built in is this idea of saying, as part of that blue-green deploy, let's actually run a health check on your app before we complete the deploy, and check whether this app is in a functional state. That's what that endpoint you register is, and Deployadactyl will call it. What that ends up looking like is, on the push finished event, we call the health checker, and it goes out and calls that health route on every single one of the green apps. Again, you'll notice this is before the routes have been mapped over from the blue app, so before this app is taking any prod traffic, we're doing the health check to see whether it is in what you define as a functional state: not just did the push succeed and did the app start, but is it working the way you want it to work?

If it fails, meaning we get back something other than a 200, it's trivially easy: we just delete the green app, and you go about your merry way fixing it. Your users never notice the difference. It just looks like a failed deploy, but because we're doing this as part of the blue-green process, we never took down your original app, and so your users never have an outage.
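The health check gate described above can be sketched in Go as follows. This is an illustrative version under stated assumptions: the `StatusFunc` type, the `GreenHealthy` function, and the example URLs are invented for the sketch, and the only rule taken from the talk is that anything other than a 200 counts as failure.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// StatusFunc returns the HTTP status code for a URL. It is injectable so
// the gate can be exercised without a network.
type StatusFunc func(url string) (int, error)

// HTTPStatus is a StatusFunc backed by net/http with a short timeout.
func HTTPStatus(url string) (int, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// GreenHealthy calls the health endpoint on every green app before any
// route is mapped. Any error or non-200 response fails the whole deploy.
func GreenHealthy(greenURLs []string, endpoint string, status StatusFunc) error {
	for _, base := range greenURLs {
		code, err := status(base + endpoint)
		if err != nil {
			return fmt.Errorf("health check on %s: %w", base, err)
		}
		if code != http.StatusOK {
			return fmt.Errorf("health check on %s returned %d", base, code)
		}
	}
	return nil
}

func main() {
	// A fake status function standing in for real green apps; the third
	// foundation simulates an app that is live but not functional.
	fake := func(url string) (int, error) {
		if url == "https://myapp-green.az3.example.com/health" {
			return http.StatusInternalServerError, nil
		}
		return http.StatusOK, nil
	}
	urls := []string{
		"https://myapp-green.az1.example.com",
		"https://myapp-green.az2.example.com",
		"https://myapp-green.az3.example.com",
	}
	if err := GreenHealthy(urls, "/health", fake); err != nil {
		fmt.Println("delete green apps, keep blue running:", err)
		return
	}
	fmt.Println("all green apps healthy, continue with route mapping")
}
```

In real use, `HTTPStatus` would be passed in place of the fake. Because the check runs before any route moves, a failure only costs the deploy and not production traffic.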
If it does succeed, that's great: we just continue with the deployment process, and life goes on.

So once we've started building out this automation, we get to this cool place where a lot of the work people were doing before has now been automated, but we still have a lot of things that people are doing manually. That's where the event system gets to be really powerful. One of the things that we've started doing is saying, on every successful deploy, let's run a security scan. That can be whatever kind of scan you want. We've done static scans, but through that event model you can run whatever you want and ask, does this app actually meet whatever security criteria you have in place in your organization? You can do things like run that on each non-prod deploy, and then know by the time someone pushes to prod whether that artifact has been through your security scanning process, and so whether it's actually something you want running in your environment.

One of the other cool things that you can do: most large enterprises actually care about open source licensing, but if, say, you're writing a Node app and you have 900 dependencies, your dev team probably didn't take the time to check the license for every single one of those dependent projects. If you're doing a Java app, maybe it's not quite 900, but even so, there's this large burden of making sure that you are compliant with license policies, so you can start building those scans into your deployment process. There are a lot of tools out there that will do that.
That also gives you the benefit of actually knowing what libraries are in use, so if someone identifies a zero-day in one of the libraries that one of your teams is using, you have that central registry to say, well, I know what's in production, I know this library is in use, it has an identified vulnerability, so I need to go out and tell this team to update their stuff.

One of the other great things: if anybody caught the talk the other day about the pain of config management, I loved that talk, it was great, go watch it on YouTube later. But by building this kind of logic into the deployment process, you can say for every single deploy that happened, good or bad, I know exactly what happened, I know who did that deploy, I know what they deployed, I know what steps were followed to do that deploy, and I know whether or not it succeeded. That's basically the exact set of information you want for change management purposes, and by creating those records automatically, you suddenly have much better data for change purposes, and by the way your devs didn't have to do anything, which they really like.

So what we've done is say, okay, let's take all this work that everyone thinks really should happen, and just automate it and build it into the process, so that our devs have more time to focus on writing code and actually getting done what we want to pay them to do, and what they probably actually want to be spending their time doing as well.

One of the really fun things that comes out of this kind of mindset is that organizational change ends up being a lot easier.
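The automatic change record mentioned above (who deployed, what was deployed, which steps ran, and whether it succeeded) could be captured along these lines. This Go sketch is purely illustrative: the `ChangeRecord` fields and the `Recorder` type are assumptions and do not describe Allstate's actual change management integration.

```go
package main

import (
	"fmt"
	"time"
)

// ChangeRecord holds the facts the talk lists for change management.
// Field names are hypothetical.
type ChangeRecord struct {
	User, App, Environment, Artifact string
	Steps                            []string
	Succeeded                        bool
	Started, Finished                time.Time
}

// Recorder builds records from deploy events and closes them automatically,
// so nobody has to remember to open or close a change record by hand.
type Recorder struct {
	Now     func() time.Time
	open    map[string]*ChangeRecord
	Records []ChangeRecord // closed records, successful or not
}

func NewRecorder() *Recorder {
	return &Recorder{Now: time.Now, open: make(map[string]*ChangeRecord)}
}

// Start opens a record when a deploy begins.
func (r *Recorder) Start(id, user, app, env, artifact string) {
	r.open[id] = &ChangeRecord{
		User: user, App: app, Environment: env, Artifact: artifact,
		Started: r.Now(),
	}
}

// Step appends one completed step to an open record.
func (r *Recorder) Step(id, step string) {
	if rec, ok := r.open[id]; ok {
		rec.Steps = append(rec.Steps, step)
	}
}

// Finish closes the record with the final outcome, good or bad.
func (r *Recorder) Finish(id string, succeeded bool) {
	rec, ok := r.open[id]
	if !ok {
		return
	}
	rec.Succeeded = succeeded
	rec.Finished = r.Now()
	r.Records = append(r.Records, *rec)
	delete(r.open, id)
}

func main() {
	rec := NewRecorder()
	rec.Start("deploy-1", "alice", "myapp", "prod", "myapp-1.2.3.zip")
	rec.Step("deploy-1", "push green")
	rec.Step("deploy-1", "health check")
	rec.Step("deploy-1", "map routes")
	rec.Finish("deploy-1", true)
	for _, c := range rec.Records {
		fmt.Printf("%s deployed %s to %s, steps=%v, ok=%v\n",
			c.User, c.Artifact, c.Environment, c.Steps, c.Succeeded)
	}
}
```

Wiring `Start`, `Step`, and `Finish` to deploy events would mean failed deploys are recorded just as reliably as successful ones.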
You end up in this virtuous cycle where you suddenly have a really nice, easy deployment tool that your devs like to use, and a lot of pain is taken away from them. Security likes it because all of a sudden they know security scans are happening on everything. Change management likes it because they no longer have the case where, oh, I forgot to do a change record, or I forgot to close my change record, or I actually completely lied about what I changed and my data isn't right. You suddenly have all this great data, and so all these different teams start saying, hey, why aren't you using this tool? It's a lot easier. Your devs want to use the tool because it takes away a lot of pain for them. So you end up in this model where everyone is getting what they want, and they want to use the tooling and not circumvent the process, because the tool takes care of the process for them and they don't have to worry about it. The people who care about the process know that everyone is following it the way they want it to be followed, and not only that, we have the data to prove it. So it really helps with some of these organizational change efforts and ends up being a win for everybody.

So what we've done at Allstate: we had the need for this centralized automated deployment tool that would handle blue-green deploys. We built out Deployadactyl, but on top of that we built a wrapper around it that we ended up calling Convair, and it handles all of the Allstate-specific stuff. That's our change management and a lot of the metrics that we track around deployments, and it handles things like our security scans and compliance. That's all built using the eventing system within Deployadactyl, as event handlers that either do things directly within Deployadactyl or call out to other systems to take action there.
So what that's ended up looking like for us is we have about 87 Cloud Foundry orgs in prod, which is roughly a one-to-one mapping to product teams, and across all our environments we have over 4,000 deploys each month, with about 1,000 of those going to prod. If we were to ask people to go through all of those steps that I laid out manually for every single one of those deploys, it would be painful and would basically suck for everybody.

About a year ago our security team came to us and said, we want everything that goes to prod to have a security scan, and we said, great, everybody's doing that, no problem, we've got it covered. They said, sure, prove it. So we went back and looked at the scale at which we're deploying and the tools people were using, and found that we could easily prove that only about 32% of people actually had security scans on every prod deploy. Now, let me be careful and say that's what we had proof of. We then realized we didn't want to spend the time to go out and prove it for the remaining, what, 68%, because that sounded painful.

So we said, let's go ahead and introduce security scans into the deployment process. We built that in, and basically overnight the number that we could very easily prove had security scans jumped to 92%. There are some caveats on that, where either something wasn't using the tooling and really should have been, or a scan failed for whatever reason, and that's why we're not quite at 100%, but we're working all the time to get that number up.

One of the other really cool things that came out of that: this is a graph of deployments to production over a month, where every bar represents a day, and you'll notice there's this pattern of five days where deploys happen and two days where they don't. You can guess: people aren't doing deploys on weekends anymore.
Teams love that. Because we have the automation in place, and we know that these deploys are going to go smoothly, everyone is really comfortable with this mindset of let's deploy to prod during the week when teams are there. Sure, it's prod. Users are hitting these systems actively, but we're not going to bring them down, because we have the systems in place to say this is a secure deployment model and it will work. It's been great for our dev teams because they now have the comfort to do that kind of thing, and it's great for everyone else too because now they have the data to say this is actually working.

So if you take anything away from this: just automate your stuff, and let your systems take care of things so your developers don't have to. They will be happier. You will get better data out of it. You'll have better processes, and everyone will be able to prove that things are working the way they want them to. Deployadactyl is open source. We would love contributions from the community. Send us pull requests. Send us questions. So that's it. Any questions for me?

Yeah. So the question was, were there any third-party tools we could have used for this? There are a ton of third-party deployment tools. The reason we ended up building our own is we couldn't find anything that would do what we wanted for multiple foundations when we got started. I know the technology is progressing, and if you look at things like the demo from the keynote yesterday on the CF Better Push, it's going to be really cool to see how this would fit in there. But when we built this out, there was just nothing else out there that would do multiple foundations the way we wanted.

So the question was, do we handle data schema changes with deploys? No, we don't. That'd be awesome, wouldn't it? We will take a pull request to do that. I think the community would love it. No, that is absolutely a weakness in the product right now.
It's something we've looked at how to do. We don't have it solved yet. Anything else?

The question was, do we have any tips for configuration management? Check it in to Git, or your repository of choice, do 12-factor, and make that part of your deploy, where people are building it into their manifest. That's actually one of the beauties of the system: they can take that config, and we guarantee that we deploy the same code and the same config across all the different systems. Was the question about the foundations themselves, or did I answer the question?

So in terms of using different parameters across foundations, we've basically told our teams that's not 12-factor and you shouldn't do it. We've had, I think, one identified use case where we were kind of on the fence about saying, yeah, maybe that's actually a good idea. It was integrating with a third-party tool and had to do with how that tool worked. But in general, we've almost always come down on, no, just don't do that. You might come up with some exceptions, and you can always build logic into your code that says, if I'm on this foundation, do it this one way. But one of the principles of 12-factor is you want the same deploy everywhere. So we've said in general, if you want different configurations on different foundations, build that logic into your application. We've gotten some pushback from some teams on that, but that's the approach we've taken so far.

Anything else? Cool. Thank you all for the time. I appreciate you coming.