Thank you for coming. My name is Ken Mugrage. I have a slide for this in a second. I'll skip it. Well, what the heck, I'll do it. I'm a technology evangelist for ThoughtWorks. I've been with ThoughtWorks for about nine years, and most of my job has been talking to people about continuous delivery. I'm very lucky, blessed, whatever word you want to use. I started at ThoughtWorks exactly when we were starting a lot of the work on continuous delivery, and I actually worked for the product side of the company. Don't worry, no product pitches, I promise. Working on the product side has let me see hundreds and hundreds of implementations, work with lots of different companies on creating continuous delivery pipelines, and get a lot of the kind of experience that I'd like to share with you here today. I'm going to start with some definitions. It's interesting, because for everybody who was in the keynote this morning, I would say basically everything Gregor said was right, and you could probably just take that for the day. He did use a definition of DevOps, and I have my own too. When I start a talk with definitions, the purpose is to let you know what I mean when I use a word or a phrase. It's not to try to get you to adopt it. It's not to say yours is wrong, or theirs is wrong, and mine is right. It's so that while I'm here on stage, or if you're coming to the workshop this weekend, when I say a word you know what I mean. So this is my definition of DevOps. The key phrase there is that I think DevOps is a culture. I don't think you can do DevOps. You certainly can't buy DevOps. As I mentioned, I work for the tool vendor side of our company, and buying GoCD will not give you DevOps. It is a culture. It's about working together, and that's why I love Gregor's talk so much.
It's about a team that works together to own the thing, to operate the system, and so on. If you're so inclined, there's a blog post that breaks down each phrase and what I mean by it. But when I say DevOps, this is what I mean. A little test I often do, in fact I find myself doing it subconsciously, is a word substitution: when someone says DevOps, I substitute "culture" in my head, even if I don't mean to. That's why a lot of people will say, look, I don't think "DevOps engineer" is a thing, or "DevOps practice lead", which Gregor mentioned, because "culture engineer" is not a thing. A DevOps coach? Awesome. But DevOps itself is the culture. Continuous delivery, on the other hand, is what a lot of people think of when they say, I'm going to do DevOps. Continuous delivery is the set of technical practices that move software from your inventory, your source code management system, through a pipeline into production. And as Gregor said, that's where the work really starts, because that's where I get to measure things. Continuous delivery does a lot of things for you, and a DevOps culture does a lot of things for you. I'll go into some of those in this talk, but primarily it's that you can get software to people faster, so you can measure its effectiveness. If I'm selling hotel rooms, do I sell more hotel rooms with this change, or fewer? That's why we do these things. So why this talk specifically? As I mentioned, I've had the privilege of working hands-on with lots and lots of customers, and I was a director of engineering myself for many years. People would say, yeah, I'm practicing continuous delivery. And I'd say, okay, that's awesome. Can I see your pipeline? Can I hear about what you're doing? And they would say, yeah, I've got continuous delivery, and here's my pipeline.
And when it's done, I give my installer to the security team, or the compliance team, or what have you. So in fact it wasn't continuous delivery at all, because I say that continuous delivery means you can click a button literally right now. You get a call, there's an emergency, whatever. If you don't feel safe doing that, then you don't have continuous delivery yet. You might have really good automated build and test, and I certainly don't mean to insult anyone's processes. But it's not continuous delivery if you can't deploy to production right now. This talk is going to be high level, in that it covers a lot of topics. My goal is to show you a bunch of things you might not be thinking about in a continuous delivery mindset: types of testing, ways of managing code, types of deployments, and so on. Again, it's very high level and it covers a lot. So, as Gresh said, if you're looking for something more hands-on and practical, use the law of two feet; I will not be insulted. If you need to know which security test to run, I'm not going to cover that. I'm going to try to get you to run a security test at all. So why continuous delivery at all? It's interesting, especially coming to an Agile conference: everyone's familiar with the Agile Manifesto, but not a lot of people have looked at page two, the principles behind the Agile Manifesto. In fact, when Jez Humble and Dave Farley were trying to name the book they wrote eight years ago, and they named it Continuous Delivery, this is where the name came from. The first principle behind the Agile Manifesto is: "Our highest priority is to satisfy the customer through early and continuous delivery of valuable software." So I see DevOps and continuous delivery as fulfilling the promise of Agile from many, many years ago.
So I love that we're seeing DevOps and CD tracks at Agile conferences now, because this is really where it came from. To that end, this might be review for a lot of people, but I want to go over a series of graphics that anyone who has taken a Scrum Master course, or really any Agile 101 course, has probably seen, and Gregor hinted at it too. When we do incremental work, you don't do a painting like this. You don't complete the head, then add one shoulder, then the other shoulder, because there's no value in that: partially done is not useful. I love Gregor's analogy of inventory, and I'm going to steal it from here on, Gregor, if you're listening. When you work this way, nothing is usable until it's done. So is it incremental? Yes, but it's not deliverable. It's not a minimum viable product. What we really want to do is this. I went back into our training materials and grabbed this slide from a 15-year-old Agile deck on why we do iterative development. This lets you do the iterative development you've come to love, as you should, but also deliver at any time and get at least some value: find out whether people like it, and so on. That's one reason we think continuous delivery is important. Another one, and there are many, is the ability to respond to security issues. These days it's not if you're going to get hacked, or if a library you use is going to get compromised; it's when. And what you want is to be able to react quickly. This example is a little old now: it was Heartbleed, a vulnerability in the OpenSSL library. It affected a massive amount of software, especially web software. Lots of people were affected, and in many cases it took them weeks to get fixed, so they were vulnerable for that whole time. And frankly, it wasn't very hard to exploit.
People who had mature continuous delivery pipelines were able to update their infrastructure, because infrastructure is also part of the pipeline. That's code, and we'll get into that a little. They could say, oops, I need the new OpenSSL library, run all their tests, because every pipeline runs all the tests all the time, and get it to production very quickly. We had projects that got the CVE, the announcement about Heartbleed, and had the fix in production within an hour or so. You can do that if you have something that's deployable all the time. Does anybody know the story of Knight Capital? I'd love to be the first to tell you. No spoilers; I'm going to get to that one at the end. I'll give you a hint: it has to do with risk management. Now I want to talk a little bit about continuous integration. As you probably gathered, because I did a definitions slide, I think words are important. When I say to a co-worker that we're doing CI, continuous integration, there needs to be a meeting of the minds about what that means. A lot of people say, hey, we're doing continuous integration, when by the definition they really are not. And again, that might be okay. ThoughtWorks has a thing we call the Tech Radar, and in it we put practices and a bunch of other things. We have a category where we say hold, don't do this anymore. We recently had to add an entry called CI theatre: the illusion of doing continuous integration when you're not actually doing it. I downloaded GoCD, or Jenkins, or Bamboo, and I set up a pipeline. It's always green, because I'm running four tests. So I'm doing CI. Yeah, not really. There's a lot more in the Tech Radar, but we really want to avoid this. The one that scared me most is a follow-on study my division did. I'm not going to say it was purely scientific, because it was our Twitter audience and our followers.
A bit of a bubble, if you will. In our study, only 10% of the participants acknowledged that having a CI server was different from practicing CI. It's like: yeah, we do CI. We have GoCD. We have Jenkins. We have Bamboo. No, no: products don't actually solve problems. The tools are important, but they're not the solution. So I'm going to go through a few things that I think are core to doing continuous integration. The first category is code management. I'm going to cover this at a high level, but there's a talk this afternoon about this particular pattern. I haven't seen it, but it's an important talk. Feature branching: I hate it with a passion. Here's the issue with feature branching, and I understand why people do it, I honestly do. I'm working on a new feature. I don't know if it's even going to go this way, I don't really know what it's going to look like, we're still figuring it out, the developers are the architects, and so on. So we want to be safe, and we want to make sure that if we take trunk, or mainline, right there in the middle, I can still deploy it if I have to. So here you have Professor Plum and Reverend Green, and these are literally stolen off Martin Fowler's bliki. When you're Martin Fowler, you get to make up words: not a blog, not a wiki, a bliki. I want to say this slide is nine years old. At any rate, this is a common pattern for feature branching. Professor Plum is working on her branch, committing to it and doing great work. A bug fix is made on mainline, so that gets pulled in. Meanwhile Reverend Green is working on his branch, and he pulls in the same bug fix, and everyone's working and everyone's fine. What we have here is a literal race condition.
They both want to be first to merge back, because when Professor Plum finishes her feature and merges it back into mainline, her merge is fine; she was pulling the bug fix the whole time, right? Then Reverend Green does his next pull, and I think the technical term for what he gets is a big ball of mud. There are all these merge conflicts that now have to be fixed. Well, hey, wait a minute, Ken, there are lots of merge tools out there; these are easy to fix these days. It's easy to fix the text differences. It's not easy to fix the intent. It's not easy to know why they changed this thing that now conflicts with the thing I worked on. It's just really not friendly. So what we would rather see when you practice continuous integration is pushing your code to trunk, or master, every single day. I would say if you're not doing this, you're not doing CI, full stop. You might be doing really good automated build and test, you might have great scripts, but it's not continuous integration, because that word means something. Now, in this world of distributed version control systems, I'm not saying you never make a branch. If I'm going to work on something, the first thing I do is git checkout, and that's branching, right? But we're still pushing back to master or trunk every single day, and our continuous integration server is watching that. Sometimes it might watch a branch for another reason, but we're practicing what we preach and going into trunk every single day. When that happens, your continuous integration server runs all the tests, every single time. The purpose of a continuous delivery pipeline, and I'm going to make a slide for this, is to kill a release candidate. Okay?
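As a rough illustration of the "push to trunk every single day" rule, here is a minimal Python sketch of a check a team might run to flag branches that haven't reached trunk in over a day. It assumes "daily" means 24 hours since the branch's last merge to trunk; the branch names and timestamps are hypothetical, and in practice they would come from your version control system.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Assumption: "integrate at least daily" means a branch's last merge to trunk
# must be no more than 24 hours old.
MAX_TIME_SINCE_INTEGRATION = timedelta(days=1)

def stale_branches(last_merged_to_trunk: Dict[str, datetime], now: datetime) -> List[str]:
    """Return branches that have gone longer than a day without reaching trunk."""
    return sorted(
        branch
        for branch, merged_at in last_merged_to_trunk.items()
        if now - merged_at > MAX_TIME_SINCE_INTEGRATION
    )

# Hypothetical data echoing the Professor Plum / Reverend Green example.
now = datetime(2018, 3, 2, 12, 0)
branches = {
    "plum-feature": datetime(2018, 3, 2, 9, 30),    # merged this morning
    "green-feature": datetime(2018, 2, 23, 17, 0),  # a week without integrating
}
print(stale_branches(branches, now))  # ['green-feature']
```

A check like this doesn't make a team practice CI, but it makes the week-long branch that leads to Reverend Green's big ball of mud visible before the merge.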
It's to prove that the commit you made is not good enough to give to your customers. You can't prove something is good. A green build doesn't mean it's good; you can get a green build just by not having any tests, or by not having very good tests. So we want to run all the tests all the time, and I'll get more into how to do that. What we want is to prove: nope, that didn't pass our whatever test, so it can't go any further. That's what the CD pipeline is for. Now, for that to be effective, when the build is broken it has to be fixed immediately. So if you have a stage in your build that's always red, and people say, don't worry about it, we're using an API from a third-party vendor, it's a little flaky, the network goes up and down, so we kind of expect that stage to be red, let's not worry about it too much: yeah, no. Because there might be other things happening there, right? You're doing the thing, you think it's fine, you commit your code, and that stage turns red, but you force it to keep going because you know that one's always red. Meanwhile, a week later, you find out it isn't working, and now you have a week's worth of changes to go through. I would much rather you remove a flaky test. For me, a flaky test is worse than no test. Okay, so "pipeline" is a bit of an overloaded term. Because again, I like to define what I mean: think of the pipeline as the workflow, or the value stream, however you want to look at it, of how code goes from source code management to production and beyond, into maintenance and so on. It's very common for people to create pipelines that are very linear. I run a unit test, I run a functional test, I put it on staging, and then it goes to security or compliance or production or what have you, and now we have to go through a change management board.
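The "kill the release candidate" idea can be sketched in a few lines of Python. This is a minimal illustration, not how any particular CD server works: the stage names and checks are hypothetical stand-ins for real test runners, and the point is only that the first failing stage stops the candidate, with no "that one's always red" override.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Stage:
    name: str
    # Returns True if the candidate survives this stage.
    check: Callable[[str], bool]

def first_failing_stage(candidate: str, stages: List[Stage]) -> Optional[str]:
    """Run stages in order and return the stage that killed the candidate, or None."""
    for stage in stages:
        if not stage.check(candidate):
            # No "it's always red, ignore it" override: one failure stops the candidate.
            return stage.name
    return None

# Hypothetical stages; real ones would invoke test runners or deployment scripts.
stages = [
    Stage("unit", lambda c: True),
    Stage("functional", lambda c: "regression" not in c),
    Stage("staging-deploy", lambda c: True),
]
print(first_failing_stage("build-101", stages))             # None: candidate survived
print(first_failing_stage("build-102-regression", stages))  # functional
```

Returning the name of the stage that killed the candidate, rather than a bare pass/fail, mirrors what you want from a pipeline: fast, specific feedback on why this commit is not good enough to ship.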
I would like to convince you, if I can, that there are types of automated testing you can and should be doing as part of your normal pipeline. I was having a conversation at dinner the other night about the word DevOps, and how it leaves out security and leaves out QA. When I think of developing the application, I think of all of it: coming up with the idea, writing the story, writing the code, and writing the tests. That's developing, and it needs to include all of the tests. Security is one that is very uncommon to see as part of the pipeline. Often that's because of external security teams or a lack of expertise on the team, and there are valid reasons for it. I'm going to show you some ways to try to overcome that, but I want to encourage you to include some of these things in your pipeline. First off, there are tests you can do before you even commit. I don't want to advocate any specific tools, but if you come up afterwards in Q&A, you can ask me about them. We don't make any of these, but there are tools that do pattern matching on your commit and say: ooh, that looks like an AWS key. Nope, I'm not even going to allow it to hit GitHub, because if it hits GitHub, you're dead. Once it's up there, you have to kill that key. So there are things you can do locally, in some of your unit tests and so on, before you even commit. Then the obvious stuff: static application security testing, and actually running the thing and pointing penetration tools at it. Now, I want to be clear here. I'm not in the camp that says you can automate everything. I think you can automate most things, but security especially is an area where there are people who specialize in it. A good friend of mine, Jason, sits in a dark room and goes after your application, and he's going to get in. I can't ever write a test to be Jason.
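Here is a minimal Python sketch of the kind of pre-commit pattern matching described above. It assumes AWS access key IDs look like "AKIA" (or "ASIA" for temporary credentials) followed by 16 uppercase letters or digits; real secret scanners ship many rules, and this single regex is only illustrative. The function names are hypothetical, and wiring them into an actual Git hook is left out.

```python
import re
from typing import List, Tuple

# Assumption: AWS access key IDs start with "AKIA" (long-term) or "ASIA"
# (temporary) followed by 16 uppercase letters or digits. Real secret
# scanners ship many such rules; this is a single illustrative one.
AWS_ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

def find_suspected_keys(text: str) -> List[Tuple[int, str]]:
    """Return (line number, match) pairs for anything that looks like an AWS key ID."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AWS_ACCESS_KEY_ID.findall(line):
            hits.append((lineno, match))
    return hits

def allow_commit(staged_text: str) -> bool:
    """A pre-commit hook would exit non-zero (blocking the commit) when this is False."""
    return not find_suspected_keys(staged_text)

# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
diff = 'region = "us-east-1"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
print(find_suspected_keys(diff))  # [(2, 'AKIAIOSFODNN7EXAMPLE')]
print(allow_commit(diff))         # False
```

Running the check before the commit, rather than in the server-side pipeline, is the design choice that matters here: once the key reaches GitHub, the only safe response is to revoke it.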
So I'm not saying these people don't exist, but if we have good automated testing, we can take them out of band, so they're not a blocker. They can be doing penetration testing full-time, and when they find something, that becomes the next high-priority story or whatever. But there are lots of things you can automate. If you look up OWASP, there are lists of tools you can use. There's lots of open source stuff, and there are commercial tools. There are lots of things you can do here, and you really need to be doing them. One of the most basic ones I almost never see: Sonatype actually did a study. They call it the software supply chain. Sonatype is the company that runs Maven Central and makes Nexus, so they see all the Java open source traffic. I'm probably going to be off a little on the numbers here, because I'm weird about not putting notes on my slides; I get fixated on the notes. I want to say they found the average open source Java project used something like 120 libraries, and 21% of them had known vulnerabilities for which fixes had already been released. But the Maven POM file specified a vulnerable version, and nobody knew it. There are tools that will scan for that and just say: nope, you're using an old gem, or an old version in your POM, or what have you. I encourage you to research those; again, we can talk afterwards. I don't have time to go through all of them. Another one I almost never see in a pipeline is actual performance testing. I'll phrase this so nobody has to raise their hand: who here has never gone to a site in their browser, sat there waiting for it to load, and said, this one's too slow, I'm going to a competitor? Don't even bother raising your hand, because we've all done it. Poor performance will lose you business. Amazon and others have done studies about this, and it's amazing how fast.
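A dependency scan of the kind described above boils down to comparing pinned versions against an advisory feed. The Python sketch below assumes a hypothetical feed that maps each library to the first release containing a fix, and treats every older version as vulnerable; real scanners (for example, those listed by OWASP) use proper version ranges and real advisory data. All library names and versions here are made up.

```python
from typing import Dict, List, Tuple

def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted version like '2.9.8' (no qualifiers such as '-SNAPSHOT')."""
    return tuple(int(part) for part in version.split("."))

# Hypothetical advisory feed: library name -> first release containing the fix.
FIXED_IN: Dict[str, str] = {
    "example-json": "2.9.10",
    "example-http": "4.5.13",
}

def known_vulnerable(dependencies: Dict[str, str]) -> List[str]:
    """Flag pinned versions older than the first fixed release."""
    findings = []
    for name, version in sorted(dependencies.items()):
        fixed = FIXED_IN.get(name)
        if fixed is not None and parse_version(version) < parse_version(fixed):
            findings.append(f"{name} {version}: upgrade to {fixed} or later")
    return findings

# Versions as they might be pinned in a POM or Gemfile.lock (hypothetical).
pinned = {"example-json": "2.9.8", "example-http": "4.5.13", "example-log": "1.0.0"}
print(known_vulnerable(pinned))  # ['example-json 2.9.8: upgrade to 2.9.10 or later']
```

Comparing versions as integer tuples rather than strings matters: as strings, "2.9.10" sorts before "2.9.8", which would hide exactly the vulnerable pin this check is meant to catch.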
It's measured in seconds before people go, whoop, see ya. So do load testing: what can the system actually handle? Do stress testing: throw a higher load at it and see where it crashes, and then you measure your risk. Oh, well, if we get famous, that's going to be a problem, but that's okay, I'm not worried about that. Now, these next ones I almost never see. A soak test runs the application for an extended period of time and tells you: hey, over time performance is degrading for this reason or that reason. But nobody wants to put these in their pipelines, because they slow the pipeline down. Everyone says, well, no, then it takes me three days to get to production, because I have a soak test that runs for two days. Not to mention they're expensive. So I'm not going to do that every time; we'll just do it when we can. There are ways around that. And the other one, and I'll just mention that the conference site went down yesterday, is spike testing. If you're in an industry where you're going to get hit with a sudden rush, throw tons of traffic at it all at once and see what happens. These can all be automated. Now, I don't believe that deploys per day, or how fast you can get to production, should be a metric that drives behavior, unless that's important to your business. There's a slide a well-known speaker uses that says something like: "We increased deploys per day. Congratulations!" said no CEO ever. They just don't care. If it does matter for your business, great. That said, there are ways to run these pipelines so that you still get very fast feedback. Any modern continuous delivery server can do parallel processing. And by parallel I don't mean two tasks in a Jenkins job, where the job is done when they all finish.
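One way to make performance a pipeline stage that can kill a candidate is a latency budget. The Python sketch below assumes a hypothetical 95th-percentile budget and uses a nearest-rank percentile; it calls the target sequentially, which is a simplification, since real load, stress, and spike tools generate concurrent traffic against a deployed environment.

```python
import math
import time
from typing import Callable, List, Sequence

def measure_latencies_ms(request: Callable[[], None], runs: int) -> List[float]:
    """Call `request` sequentially and record each call's latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        request()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def performance_gate(samples: Sequence[float], p95_budget_ms: float) -> bool:
    """True if the candidate survives; False kills it like any other failed test."""
    return percentile(samples, 95) <= p95_budget_ms

# Hypothetical latencies of 1..100 ms: the 95th percentile is 95 ms.
samples = [float(ms) for ms in range(1, 101)]
print(percentile(samples, 95))         # 95.0
print(performance_gate(samples, 100))  # True
print(performance_gate(samples, 90))   # False
```

Gating on a high percentile rather than the average reflects the point about users leaving in seconds: the slow requests are the ones that send people to a competitor, and an average hides them.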
I mean completely separate pipelines on completely separate environments, potentially with completely separate permissions. I can run a lot of these things in parallel, and you decide where you want the cutoffs to be: which of these might require a manual approval for some reason, which you want fully automated, and so on. This is just one very fictional chart. The idea here is that we have some unit tests. I love it: the unit tests of confidence, that's all they're there for. Your unit tests pass. Now I'm going to take the thing I just built, and I'm going to pick on Java, because Java, so I build a jar. My unit tests have passed, so I have the jar, and I store that jar in a repository. I'm never going to rebuild it again. I take that jar and put it on an environment that runs my functional tests. This is very normal; lots of people do that. But what they then do with this chart, and I should have drawn one, is make it very linear: when the functional tests are done, it runs the load test; when the load tests are done, it runs the spike test, and so on. I don't want to do that. I want to put it on three environments simultaneously. In this lovely world of public cloud, that's actually pretty inexpensive. It's not free, but it's inexpensive to run things on multiple environments at once. You handle this with dependency management, and in continuous delivery it's commonly referred to as fan-out, fan-in. Notice the top three in the second column: a unit test stage fans out to three different pipelines. I need logic in my system that says if, and only if, all three of these pass, then we run staging. These are not just tasks in one job, so we actually have to check for this: if any one of them fails, I'm not going to go to staging.
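The fan-out, fan-in logic can be sketched in Python. In this minimal illustration, threads stand in for what the talk describes as completely separate pipelines on separate environments, and the pipeline names, checks, and artifact name are hypothetical. The important parts are that every pipeline receives the same already-built artifact, and that staging is triggered if and only if all of them pass.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Check = Callable[[str], bool]

def _run(check: Check, artifact: str) -> bool:
    try:
        return bool(check(artifact))
    except Exception:
        return False  # a pipeline that crashes counts as a failed pipeline

def fan_out(artifact: str, pipelines: Dict[str, Check]) -> Dict[str, bool]:
    """Run every downstream pipeline against the same, already-built artifact in parallel."""
    with ThreadPoolExecutor(max_workers=max(1, len(pipelines))) as pool:
        futures = {name: pool.submit(_run, check, artifact) for name, check in pipelines.items()}
        return {name: future.result() for name, future in futures.items()}

def fan_in(results: Dict[str, bool]) -> bool:
    """Trigger staging if and only if every upstream pipeline passed."""
    return bool(results) and all(results.values())

# Hypothetical pipelines; each would run in its own environment in practice.
artifact = "app-1.4.2.jar"  # built once after unit tests, never rebuilt
results = fan_out(artifact, {
    "functional": lambda a: True,
    "load": lambda a: True,
    "spike": lambda a: a.endswith(".jar"),
})
print(results)          # {'functional': True, 'load': True, 'spike': True}
print(fan_in(results))  # True
```

Treating an empty result set as "don't deploy" and a crashed pipeline as a failure are deliberate choices: the fan-in should only open the gate on positive evidence from every upstream pipeline.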
Staging in this diagram is for testing a production-like deployment. My staging system looks as much like my production system as possible. If production is a cluster, staging is a cluster. It might be a different size, or it might not, but it looks as much like production as possible. It runs the installation, or deployment, of the application in exactly the same way production will. Its purpose is to test the deployment. That being the case, I'm going to allow that to happen even if the longer-running tests, like the stress tests and soak tests, aren't done yet, because I want to test the deployment as often as possible, so that when I go to production we know we're good. A short story. It's going on 11 or 12 years ago. ThoughtWorks was on a project in England, and again I'm picking on Java. The developer machines were Windows machines, they were building a Java application, and the deployment target was Solaris. And Java runs the same everywhere, right? In the project plan, in the Gantt chart, they didn't have Solaris hardware until several months into the project. So they're writing code and doing their thing. They finally got the Solaris hardware, no problem, ran the deployment, put it on Solaris, and it didn't work. I don't mean there were bugs; there were things in the way the application accessed the file system that simply did not work on NFS on Solaris. So they had to scramble. They wrote a tool called Conan the Deployer, literally stole a Solaris box from another office, told them, hey, we stole your D450, and put up a staging server. Its purpose was to test the deployment: every time they had a good build, they tested it on Solaris to make sure it deployed.
Two of those people, Jez Humble and Dave Farley, went on to write a book called Continuous Delivery, because they were really not happy with what happened. In a weird twist of fate, that book would not exist if they had had a staging server. And I do want to make the case that this is not fictional. This is a real pipeline from a real project in our office here in Koramangala. What we see is multiple pipelines running in parallel. This particular one is an open source project, and when someone makes a pull request on GitHub, that kicks off the pipeline. But there are also security scans and tests that are not part of that development team. Question in the back? Pardon me? I can't hear you. What's not visible? Oh, the text. I know it's an eye chart; you can't read it. That's a little bit on purpose, because I don't want you to say, these are the pipelines that should be there. It's to show you that it's complex. I have a weird thing: if you come to my workshop on Sunday, we're doing a hello world application, and people ask why we aren't deploying a real app. Because it's not about the app; it's about the pipeline. The point here is that you see the boxes. In this case, the squares are pipelines and the circles are Git repos. But thank you for the feedback anyway. Well, actually, wait a minute: is that one more readable? It's a zoom-in on the side. The point is that there are a few different things, in an order determined by the team building this particular application, and some of them are out of band. If I hadn't been logged into this system as an admin when I took the screenshot, some of those pipelines would not be visible to you. So the reality is this, for us, the DevOps people, whatever that is.
We would love it if we were all truly self-organized teams, each fed by two pizzas, all on Amazon, and we really did control everything. But the truth is we have compliance departments and security departments and those kinds of things. So they can put their pipeline in parallel with yours. Your jar passes, it comes out, and it runs their scans (I almost said a product name), makes sure compliance is good, and everything else. The triggers there say the candidate can't go any further if either one of them fails. So it's completely possible, and frankly mainstream, to do this and to include those departments earlier in your development life cycle. I would love it if they were sitting on your team. If there had been a Solaris admin sitting on that team in London saying, who did that, that ain't going to work, it would have solved the problem, right? But that's not always reality. I would love it if they were on our team, but if they're not, still encourage them: that's great, but let's automate as many of your tests as we can and bring them forward in the cycle. A woman named Joanne Molesky co-wrote a book called Lean Enterprise. She's an auditor by trade, and her favorite thing as an auditor is to sit with the dev team for a few weeks to understand the risks. She says: if I don't know what you're building, here are all the things I have to test. If I sit with you for a few weeks and learn more about what you're doing, then I know what I have to test, and we all get along better. Knowing what you're doing, why you're doing it, what the risk profile is, and which systems it's hitting doesn't come from a three-hour status meeting; it comes from embedding with the team for a few weeks. And I kind of already gave away the spoiler here, but you decide the order.
I'm showing you charts, and one reason this one isn't readable is that I don't want you to get fixated and say: okay, that's the order Ken said to do it in, and he's been doing continuous delivery for 400 years. No. You decide the order based on your risk profile and on when you need feedback, because you want the fastest possible feedback. Again, any modern tool can do this. Okay. So I've been ranting about always being able to deploy, right? Deploy right now. But user stories don't usually take six minutes to complete, so there are a lot of times when work is in progress and I'm not done with a story. If I hit deploy, the thing is going to come out and present a user experience where someone clicks a button and nothing happens, and that's bad. There are lots and lots of ways to deploy incomplete work, and I'm going to go through just a few of them. One of my favorites is a concept called feature toggles, and again, this is not a new concept. This particular screenshot, again from Martin Fowler, is several years old. Basically, it says: I'm going to create a new feature that affects the user experience by changing the UI. The first thing I do is write some kind of definition that turns that feature off. That could be a simple text file; people do these in databases; there are SaaS products that do this. It's just a little thing that says: in development it's on, and everywhere else it's off. Then you build your UI. When you run the program, or load the web page, in anything other than your development environment, it looks like the old one, because the feature is turned off. And then you can turn it on when you want to test it, or when it's done.
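A feature toggle of the kind just described can be as small as a lookup table. In this minimal Python sketch, a dict stands in for the text file, database, or SaaS product mentioned above; the feature name, environments, and page strings are all hypothetical.

```python
from typing import Dict

# Hypothetical toggle definitions. The talk notes these might live in a text
# file, a database, or a SaaS product; a dict keeps the sketch self-contained.
TOGGLES: Dict[str, Dict[str, bool]] = {
    "new_booking_page": {"development": True},
}

def is_enabled(feature: str, environment: str) -> bool:
    # Anything not explicitly switched on is off, so unfinished work stays hidden.
    return TOGGLES.get(feature, {}).get(environment, False)

def render_booking_page(environment: str) -> str:
    if is_enabled("new_booking_page", environment):
        return "new booking page (work in progress)"
    return "existing booking page"

print(render_booking_page("development"))  # new booking page (work in progress)
print(render_booking_page("production"))   # existing booking page

# Flipping the toggle later (e.g. when the story is done) turns the feature on
# without another deployment.
TOGGLES["new_booking_page"]["production"] = True
print(render_booking_page("production"))   # new booking page (work in progress)
```

Defaulting to "off" for any feature or environment that isn't listed is the safe choice: a typo in the toggle file hides the new work instead of exposing a half-finished page.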
Fast forward to about a year and a half ago: another colleague updated the article and went into a lot more detail. I've tried to include references here, and I know the slides will be distributed later, because this is pretty high level. I highly recommend this article on feature toggles, because feature toggles are a great way to deploy incomplete work, as I described, but they also have a lot of other uses, and the article goes into quite a bit of depth about them. I can test markets. I can toggle a feature programmatically and say: if I turn this on for 10% of the audience in Bangalore, do my hotel reservations go up or down? Oh, they went down; turn it off. Oh, they went up; leave it on. So there are a lot of ways to do things. Now, feature toggles are tech debt: you're adding code that needs to be cleaned up once you're done, so be aware of that. People want to do DevOps because it's faster. It's not faster; it's better. You'll get faster feedback, but completing that user story is not faster, it's slower. You're going to do more automation, you're going to do things like toggles, you're going to be cleaning up tech debt. It's better overall for the business. That's why a lot of people like things like lean value stream mapping: I want to look at the entirety of the process, not one task. These toggles bring a lot of value. You can see here that we have a release toggle, and I want to differentiate deploy from release. Deploy, especially in the world of web software, is taking the software, installing it on the machines, and making sure it's running. Release is making it available to customers. You can deploy software onto running machines and have routing rules, or whatever, so customers can't see it yet, or have feature toggles so it's turned off when it's deployed. I can run a test, verify the deployment went okay, and then flip the toggle, and now it's released. In a lot of our products, everything is toggled. In our
There are actually a lot of them: if you download GoCD, there's a big file where you can go in and start playing with toggles. You're going to break stuff, but they're there. There are also things like ops toggles, permissions, experiments, etc. A lot of the web-scale companies, the Netflixes and the Facebooks and what have you, have toggles in there for things like performance degradation: there's a denial-of-service attack, performance is really bad right now, etc. They'll go in and turn off certain features that are less used or not important to their core business. My favorite is Netflix, the one where you go watch movies: they turn off their recommendation engine during periods of high load. If you're watching a movie at home, it's not going to change, but if you log out and go to your recommendations, it's not going to give you any, or they won't be refreshed. That's a toggle they can flip when they go "oops, we're in trouble", and it's not visible to you. So that's one way, for user-facing things. But what about back-end things? A lot of times you have things that you need to upgrade. Let's see, what's a good example? Well, we have a project that's Ruby on Rails, and it's 10 years old, so it was on a 10-year-old version of Rails, and they wanted to upgrade Rails, or library X, pick your library. Now, how do I do this? Not on a branch, because then I have all kinds of problems. There's an idea called branch by abstraction, and I think that blog is from 2011, on continuousdelivery.com, which is Jez Humble's website. Basically, you have your consumers that are using this service, this library, whatever the component is, and they're pointing at the component. What you do is stick an abstraction layer in between, so the consumers talk to it and it talks to the component. Then you add the new component next to the old one, and the abstraction layer can go back and forth and talk to either of them. So what it allows you to do is incrementally move over to the new component, the new functionality, the new version of Rails, while not degrading anything else.
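Branch by abstraction can be sketched in a few lines of Python. The example uses a hypothetical price formatter in place of Rails or "library X": consumers depend only on the abstraction, the old and new implementations sit side by side behind it, and a toggle (here a plain dict, an assumption for illustration) decides which one handles each call. All class and function names are made up for this sketch.

```python
from typing import Callable, Protocol


class PriceFormatter(Protocol):
    def format(self, amount_paise: int) -> str: ...


class LegacyFormatter:
    """The old component we want to retire."""
    def format(self, amount_paise: int) -> str:
        return "Rs. %d.%02d" % (amount_paise // 100, amount_paise % 100)


class NewFormatter:
    """The replacement component, built next to the old one on trunk."""
    def format(self, amount_paise: int) -> str:
        rupees, paise = divmod(amount_paise, 100)
        return f"Rs. {rupees}.{paise:02d}"


class FormatterAbstraction:
    """The abstraction layer: consumers talk to this, never to an implementation."""
    def __init__(self, old: PriceFormatter, new: PriceFormatter, use_new: Callable[[], bool]):
        self._old, self._new, self._use_new = old, new, use_new

    def format(self, amount_paise: int) -> str:
        impl = self._new if self._use_new() else self._old  # toggle picks the route
        return impl.format(amount_paise)


def render_invoice_line(formatter: PriceFormatter, item: str, amount_paise: int) -> str:
    # A consumer: it only knows the interface, so swapping implementations is invisible to it.
    return f"{item}: {formatter.format(amount_paise)}"
```

Because `render_invoice_line` never refers to either implementation directly, trunk stays releasable at every step; once everything runs on the new component, you delete the old one and, eventually, the toggle.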
And if I make UI changes, they still show up everywhere. It's a really good way to do this. This was important for one of our products, because they were in the middle of that Rails upgrade, and yet they could still update OpenSSL, click the button, and deploy right away, because it was safe. If you downloaded it and turned on Rails 4, which I think it was at the time, it would have broken horribly, but with the defaults it worked. I put this next part under managing risk, but it almost could have been in the part I just finished. It's a fairly new example, and again, homework: I'm going to encourage you to read the associated blog. GitHub Engineering had a problem: the library they used to create merge commits was really not very performant. If you're practicing CI, you're merging every day, all day, so merges happen lots and lots and lots, and yet it was really just not very good. But it's also a feature that their users are using all the time, so I can't break it. I can't just switch to the new version and risk that something is going to go wrong; merge has to work. So what they did in this case is use a Ruby library called Scientist. If you were using GitHub during this period and you did a merge, it called the old merge code, and that's the one that actually merged your code into your repo, where your other folks pulled it. But at the same time, they called the new library, collected a bunch of metrics on it, and compared its results with the old one. So they were testing in production at that scale, because it was a performance issue: testing on a laptop doesn't do any good, you have to test it at that scale.
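GitHub's actual tool is the Ruby library Scientist; the Python sketch below only illustrates the underlying pattern and does not mirror Scientist's API (the real library adds things such as custom comparison, ignoring known mismatches, and randomized run order, among others). The names `run_experiment`, `publish`, and the result fields are assumptions. The key property: the user always gets the control (old) result, while the candidate (new) result is compared, timed, and reported, and any exception from the candidate is swallowed so it can't break a real merge.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ExperimentResult:
    name: str
    matched: bool
    control_seconds: float
    candidate_seconds: float
    candidate_error: Optional[str] = None


def run_experiment(
    name: str,
    control: Callable[[], Any],
    candidate: Callable[[], Any],
    publish: Callable[[ExperimentResult], None],
) -> Any:
    """Run old (control) and new (candidate) code; always return the control's value."""
    start = time.perf_counter()
    control_value = control()  # the trusted path; its errors still propagate as before
    control_seconds = time.perf_counter() - start

    candidate_error = None
    start = time.perf_counter()
    try:
        matched = candidate() == control_value
    except Exception as exc:  # the candidate must never break the user's request
        matched = False
        candidate_error = repr(exc)
    candidate_seconds = time.perf_counter() - start

    # Mismatches and timings go to metrics; the user never sees the candidate's output.
    publish(ExperimentResult(name, matched, control_seconds, candidate_seconds, candidate_error))
    return control_value
```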
So what did they end up finding? Again, I don't expect you to be able to read these, but the blue line is the old client and the green line is the new client. I don't know if I can zoom that in at all. What they noticed is that at first the new one was a little better, then it got about equal, and then it looked really bad in some edge cases, which they figured out: if you had a number of merge conflicts that was a multiple of 256 (256, 512, 768 merge conflicts), the old library would not report that there were any conflicts. It would just merge it and tell you everything was OK. That's such an edge case that they had never found it, but they actually did find it here. If you could see this chart, and of course it's on the blog, what you'd see is that in the end the new library was orders of magnitude faster. Then they were able to go back into Scientist and make the new library the default, and everybody was happy. We didn't notice, but they got to save tons of money on hardware and processing power by not doing all those merges with a library that just simply wasn't very good. I'm going to breeze through this next part, because Gregor did it better justice than I'm going to, but for most of our projects we really should be optimizing for MTTR, mean time to recover. When something breaks (and I stress the word when), how long does it take me to recover? Now, I joke here, because I'm on airplanes a lot: if you're working for Airbus or Boeing or something, please optimize for MTBF, mean time between failures, in that case, because there are systems that we don't want to fail. But you probably want to optimize for MTTR. It's a really weird thing, but if you optimize for MTTR, the MTBF actually gets better too; there are a lot of stories about that. But you need to optimize there.
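As a worked example of the two metrics, here is a small Python sketch that computes them from an incident log of `(started, resolved)` timestamps. The definitions are assumptions: MTTR is taken as the average outage duration, and MTBF as up time in the observation window divided by the number of failures. Other definitions exist, so check which one your team actually reports.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Incident = Tuple[datetime, datetime]  # (started, resolved)


def total_downtime(incidents: List[Incident]) -> timedelta:
    return sum((resolved - started for started, resolved in incidents), timedelta())


def mttr(incidents: List[Incident]) -> timedelta:
    """Mean time to recover: average duration of an outage."""
    return total_downtime(incidents) / len(incidents)


def mtbf(incidents: List[Incident], window_start: datetime, window_end: datetime) -> timedelta:
    """Mean time between failures: up time in the window divided by the number of failures."""
    up_time = (window_end - window_start) - total_downtime(incidents)
    return up_time / len(incidents)
```

For example, three outages of 20, 40, and 60 minutes in a 30-day window give an MTTR of 40 minutes and an MTBF of (720 h - 2 h) / 3, or 239 hours 20 minutes. Cutting each recovery in half halves the MTTR but barely moves the MTBF, which is why the two numbers tell different stories.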
Now, I want to be careful, because anybody who has read the State of DevOps report (it comes out every year or two, and it's good reading) knows it's a study: they went out and surveyed 30-some thousand people over multiple years and asked, OK, what are you doing, what's your lead time, what's your MTTR, etc. One of the metrics they use, which again I'm not crazy about, is deployment frequency. In 2016 it came out with a certain baseline. In 2017 the study came out, and what they called the low performers, the people that were only deploying a couple of times a year, were quite a bit better: they were now deploying weekly or monthly or what have you. So, yay, improvement, right? But their mean time to repair actually went up. Metrics drive behaviors: they were now measuring deploys per day, not the quality of the deployment. So be careful that you don't go after the wrong metric. One of the main things a continuous delivery pipeline will help you do is recover quickly. It's not there so you can say "look at me, I can check in code and it's in production in three minutes"; that's probably not the point. So let's be careful there. From here on, especially if you're new to continuous delivery, if you're not doing this yet, if you're not actually deploying things automatically to production, etc., some of this might be aspirational, but I want to cover it because it's an important value that you'll get from having a solid continuous delivery pipeline. Just a couple of patterns; you might have heard these terms around. Canary releases: I kind of hinted at it earlier. This is where I release to a percentage of customers or a percentage of end users, and I test it. And by test it, I don't mean does the login form show up, or is it expiring passwords; those are all important tests. I mean testing the business impact: did I sell more hotel rooms? And then you can make business decisions on that. It's a great way to do that. All the web-scale companies do this, partly because they have to, but it's a good way to go out and see whether this is working for its intended purpose, the acceptance criteria of the story, if you will. If it's not making our work better, then let's take it out, let's kill the epic, whatever the analogy is that you want to use.
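A canary only answers "did I sell more hotel rooms?" if you actually compare the business metric. This Python sketch compares the booking conversion rate of a canary cohort against the baseline with a two-proportion z-test; the choice of test, the 5% significance level, and the three verdict strings are assumptions for illustration, and real canary analysis usually watches several metrics and guards against deciding too early.

```python
from math import sqrt
from statistics import NormalDist


def canary_verdict(base_bookings: int, base_sessions: int,
                   canary_bookings: int, canary_sessions: int,
                   alpha: float = 0.05) -> str:
    """Compare conversion rates of baseline vs canary with a two-proportion z-test."""
    p_base = base_bookings / base_sessions
    p_canary = canary_bookings / canary_sessions
    pooled = (base_bookings + canary_bookings) / (base_sessions + canary_sessions)
    se = sqrt(pooled * (1 - pooled) * (1 / base_sessions + 1 / canary_sessions))
    if se == 0:
        return "keep watching"  # degenerate data: nothing to compare yet
    z = (p_canary - p_base) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided test
    if p_value >= alpha:
        return "keep watching"  # no detectable business impact yet
    return "promote" if z > 0 else "roll back"
```

For example, with 500 bookings in 10,000 baseline sessions (5%), a canary with 20 bookings in 1,000 sessions (2%) is a clear drop and gets rolled back, while 40 in 1,000 (4%) is not yet distinguishable from noise.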
The other one is really cool, but it's only for certain cases: it's the idea of dark launching. The most famous story (and again, there's an article listed there to read) is Facebook Messenger. Facebook Messenger is the real-time chat where you can chat with your friends on Facebook. How do you test that at scale? Facebook has hundreds of millions of users. So they did this thing called dark launching. Basically, they shipped a minimum viable version of the feature: you logged into your Facebook account, and some JavaScript in your browser sent a message to a certain percentage of your friends. You didn't send it; the browser did. When they logged in, their browser got the message, and a certain percentage of them responded. They had all this math where some were long conversations, some were short, some never got a reply, some were ignored, and some were sent to people that weren't your friends, so they needed to be rejected, et cetera, et cetera. So Facebook Messenger was running in production for months before you ever saw it. And when it came time to release it (again, not confusing deploy with release), what they did first is flip the release toggle for Facebook employees, which is a lot of people, and now they could use it; that's why I put the cap on everything. Then they went a little bit bigger, and a little bit bigger, and eventually everybody had Messenger. There was some feedback and so forth, but there were no performance issues. That's a massive feature to release to that large an audience without issues, and dark launching is how they did it.
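The Messenger story boils down to generating real, production-scale load that no user can see. The Python sketch below is a toy model of that idea under stated assumptions, not how Facebook implemented it: a hypothetical `on_login` hook sends synthetic messages marked `dark=True` to a fraction of a user's friends, the backend stores them like any other message, and the inbox the UI would render filters them out.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    recipient: str
    body: str
    dark: bool = False  # dark traffic exercises the backend but is never displayed


@dataclass
class ChatBackend:
    stored: list = field(default_factory=list)

    def deliver(self, msg: Message) -> None:
        self.stored.append(msg)  # stand-in for the real write path (storage, fan-out, ...)

    def inbox(self, user: str) -> list:
        # What the UI would render: dark messages are filtered out.
        return [m for m in self.stored if m.recipient == user and not m.dark]


def on_login(user: str, friends: list, backend: ChatBackend,
             fraction: float, rng: random.Random) -> int:
    """Hypothetical hidden client hook: send synthetic messages to a fraction of friends."""
    sent = 0
    for friend in friends:
        if rng.random() < fraction:
            backend.deliver(Message(user, friend, "synthetic load", dark=True))
            sent += 1
    return sent
```

Raising `fraction` over time ramps the load up gradually; the release itself is then a separate toggle, flipped first for employees and then for progressively larger audiences.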
Now, one of the other things that continuous delivery will give you, and I've hinted at it quite a bit, is feedback on what's going on. Part of that feedback, or all of that feedback, really should be consumable. So when you're doing feedback loops, when you're giving feedback to the development teams, to the business analysts, to the people, make it useful, please. If it's, you know, "couldn't connect to LDAP, blah blah blah blah blah blah", is that useful? So this is for the developers in the room, or the people that are creating these logs: it's about your ability to react. I can have a continuous delivery pipeline that makes it so I can deploy and release right now, but if I'm not getting useful feedback, it doesn't do me a whole lot of good. So please do that. Same with alerts. With DevOps and continuous delivery, now the developers carry pagers, or at least people get alerted in certain situations. Please make sure that those situations merit an alert. If you have a cluster that has 1,000 nodes, and a physical piece of hardware goes down and 50 of those nodes go down at 3 a.m., that is not a critical error. It's just not; it's not going to affect your business at all. So leave it, and send the alert automatically at 8 a.m. Again, there are tools and software out there to do that. But people get alert fatigue, and I do talk a lot about burnout; burnout is a huge medical problem in our industry. A lot of people are afraid of DevOps and continuous delivery because they think, well, now I'm going to get paged on the weekends and everything else. Make sure that the alerts are useful.
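The "50 of 1,000 nodes at 3 a.m." rule can be written down as a small routing policy. In this Python sketch, the 20% critical threshold and the 8 a.m. to 6 p.m. working hours are assumptions, and weekends and time zones are ignored; many alerting tools let you express something equivalent in configuration.

```python
from datetime import datetime, time, timedelta
from typing import Tuple

BUSINESS_START = time(8, 0)   # assumption: the working day starts at 8 a.m.
BUSINESS_END = time(18, 0)    # assumption: and ends at 6 p.m.
CRITICAL_FRACTION = 0.2       # assumption: losing 20% of capacity is worth a page


def next_business_morning(now: datetime) -> datetime:
    today_start = datetime.combine(now.date(), BUSINESS_START)
    return today_start if now < today_start else today_start + timedelta(days=1)


def route_alert(nodes_down: int, total_nodes: int, now: datetime) -> Tuple[str, datetime]:
    """Decide whether to page someone now or send a notification later."""
    if nodes_down / total_nodes >= CRITICAL_FRACTION:
        return ("page", now)  # enough capacity lost to affect the business
    if BUSINESS_START <= now.time() < BUSINESS_END:
        return ("notify", now)  # daytime: ordinary, non-paging notification
    return ("notify", next_business_morning(now))  # overnight: hold until 8 a.m.
```

With these settings, 50 nodes down at 3 a.m. becomes a notification at 8 a.m. the same day, while 300 nodes down pages someone immediately.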
And I talked a little bit about running some of your testing in production. Now, there's a thing called the Gartner hype cycle; this is my hand-drawn version of it. I want to be clear: things are going to get worse before they get better. You're going to have to learn how to do feature toggles, you're going to have to learn how to measure your risk, etc. The point where it goes up first, they call that the peak of inflated expectations, then there's the trough of disillusionment, and then it goes back up. So understand that this is like any other big change, like doing anything for the first time. Don't expect that it's a miracle cure; it's not. You can't install a continuous delivery server and then go sit on the beach. (By the way, I need to go to Goa someday.) So it is going to get a little worse before it gets better, but it is going to get better. It's going to get a lot better. I will finish with the story I started about Knight Capital. Knight Capital was a trading firm doing stock market trades, and they had a whole lot of tech debt. I won't go into the details, but you can google it if you want: they had a whole lot of tech debt in there that hadn't been cleaned up. Around August 1st, 2012, an engineer went in and did what the engineers did: they manually copied the new release to 7 servers, started them up, made sure everything was good, and then left. Well, the problem is that they had 8 servers. One of the toggles that they had (and it wasn't actually a toggle; I think it was a stored procedure) had a name that was reused, and that 8th server started buying stuff at ridiculous prices. In almost exactly the time it took me to do this talk, they lost 440 million US dollars, with a market cap of 400 million US dollars; they were gone. And it was because they didn't have automated processes that would have caught that. If that had been an automated deployment, and the deployment to staging had used the same scripts as the deployment to production, things might have turned out differently. Cause and effect is a really nasty thing, but please don't do this.
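The Knight Capital lesson is "deploy to every server, the same way, and check." Here is a minimal Python sketch of a deploy step that installs a version on every host in an inventory and then refuses to report success unless every host reports the new version. The `install` and `installed_version` callables are hypothetical stand-ins for whatever your tooling does; the point is that staging and production call the same function, just with different inventories.

```python
from typing import Callable, Iterable, List


def deploy(
    hosts: Iterable[str],
    version: str,
    install: Callable[[str, str], None],
    installed_version: Callable[[str], str],
) -> List[str]:
    """Install `version` on every host in the inventory, then verify every host."""
    hosts = list(hosts)
    for host in hosts:
        install(host, version)
    stale = [h for h in hosts if installed_version(h) != version]
    if stale:
        # Fail loudly instead of leaving a mixed fleet behind.
        raise RuntimeError(f"deployment of {version} incomplete on: {', '.join(stale)}")
    return hosts
```

Had the 8th server been skipped, this check would have failed the deployment before the market opened instead of leaving old code running next to new code.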
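So, summary. Again, it's about using words. If you're in your organization and you're trying to get other people on board (I like the missionary type of thing), and you're saying to the other team, "Hey, we're doing continuous delivery, come take a look", this is what it means. If you're really doing automated build and test, that's good too; they should come look at that. You don't have to use my definition or Gregor's definition, but you do need to use your definition: if you say "unit test" to a coworker, they need to know what you mean. Good CI habits are the key: you cannot build a good continuous delivery pipeline if you're not doing solid CI. Feature toggles, with... oh, I'm over by 6 seconds. Thank you very much. I'll be around all day, and I actually have a workshop on Sunday too, where we're going to create one. I don't know, I went a little bit over; I don't know if there's time for questions or not, sorry. I'll be around if you have any questions, but I went into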