So, hi, my name is David Blank-Edelman. I am a technical evangelist at a company called Apcera. We make an awesome container management platform that DevOps folks tend to love, but I'm not actually going to be talking about that today. Today I'm going to be talking about SRE, Site Reliability Engineering. If you haven't encountered it, Site Reliability Engineering is an engineering discipline that attempts to engineer failure out of a system in order to get that system to the level of reliability you want.
Well, let's talk about why. Does anybody know what this is? This is going to change in a minute. Can you guys see in the back? This is what happens when a PHP app errors out. Right? I'll switch it to another language to make you feel a little bit better, but the thing is that reliability has to be a core part of what you're doing, because you can have all the features in the world, but if it just errors out, it's not useful at all. And that's why this is about reliability.
So back in about 2003, Ben Treynor started to deal with this particular tug-of-war. On one hand you have the developers, whose whole job is to make things that have features and to iterate, and on the other side you have the operations folks, who would really like things to stay stable so they can stay reliable. So they came up with the notion of Site Reliability Engineering. This is a book that was recently published by the Google folks describing the Google method, how Site Reliability Engineering works there. And in fact, I'm going to take one of the pieces from this. I usually give a longer talk about this, so today I'm going to take just one of the pieces.
The thing that people tend to ask before I get into that is: are we just talking about Google? Are they the only people who think that reliability is a key, core thing and who have SREs? And the answer to that question is no.
There are a ton of other companies. Here's a list of some of them. You may have heard of a few. And if I can embarrass him, the actual SRE lead from LinkedIn is in the audience as we speak. So there are lots of people who think this is an important challenge.
The other thing I think is important to mention, before I tell you one of the interesting things about SRE, is that we have this notion that somehow SRE or DevOps is an evolutionary upgrade from what we were previously doing. And it's not. It's just kind of a parallel track addressing the same sort of challenges.
The best way I can talk about SRE is to go back to the keynote given at the first SREcon, back in 2014, by Ben Treynor, the person who helped start the SRE group. In that keynote, he showed a slide that had some 14 or so items. I'm not going to go over all of these, because we have less than five minutes. I'm going to take a piece of this, and the piece I want to talk about is the middle, the part about error budgets.
The idea behind error budgets is that you take a service and you decide how reliable it has to be, because chances are your service or your product does not have to be up 100% of the time. Let's say it has to be up 80% of the time. The remaining 20% is your error budget. And you're going to see how that plays out in mere seconds.
What you typically do is sit down with the people who are writing the service and say, OK, let's come up with a service level objective that says how we are going to measure how reliable this is. What does it mean to be reliable? How reliable does it have to be? Remember that 80% mark we mentioned a moment ago. Then the next thing you do is plug what you agreed on into some sort of monitoring system that you and the developers can both look at. And everybody can agree this is the source of truth.
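To make the error budget arithmetic above concrete, here is a minimal sketch in Python. The function name `error_budget` and the 90-day quarter are assumptions chosen for illustration; the talk itself gives only the 80% figure.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Return the downtime allowed within `window` for an availability
    target expressed as a fraction between 0 and 1 (hypothetical helper)."""
    if not 0.0 <= slo_target <= 1.0:
        raise ValueError("slo_target must be between 0 and 1")
    # Whatever is not required uptime is the budget for failure.
    return window * (1.0 - slo_target)


quarter = timedelta(days=90)          # assumed length of a quarter
budget = error_budget(0.80, quarter)  # the 80% example from the talk

# About 18 days, or roughly 1.5 million seconds, of allowable downtime.
print(budget.days, "days,", int(budget.total_seconds()), "seconds")
```

An 80% target is deliberately loose for the sake of the example; a tighter target such as 0.999 shrinks the same 90-day budget to a couple of hours.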
This is how we're measuring how reliable it is. And then we say, OK, we want to launch a new release. Can we release it or not? What you get to do is look at your monitoring system. If you have in fact been up 80% of the time during that quarter, go for it. You can afford to perturb the system. But if you haven't, and your service has only been up 70% of the time, well, then you might gate that release and say, no, we're not actually going to release that now. And the nice thing about this is that you're setting up a sort of virtuous feedback loop, where everybody agrees that reliability is important. As long as we keep using this sort of feedback loop, we get what we want.
Another example of a feedback loop that shows up often in DevOps, which is at the bottom of this slide, is the notion of postmortems: the notion that you can do postmortems that are focused on process and technology and how they failed, but specifically not on how some person failed. Because here's another key hint for you. It is not possible to fire your way to reliable. You can't just go forth and say, if I only fire the person who made that mistake, and I keep on doing that again and again and again, then I'll have a fully reliable system. No, you won't. You'll just have one person crouching in the corner, shivering.
So with that, I just want to say this is a little tiny bit of SRE. I'm trying to put it on your radar so you think about it. If you want to talk more about it, I'll be here. I know that there is an SRE advocate from Google in the audience, whom I just happened to meet today for the first time. So come talk. And if you want to talk to me online, this is how you can reach me. If you want to talk about container management stuff, I'm also really down with that. And I want to thank you for your time and your attention. It's great.
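The release gate described earlier in the talk can be sketched as a single comparison. This is only an illustration under stated assumptions: `can_release` is a hypothetical name, and the measured availability would in practice come from whatever monitoring system the team has agreed is the source of truth.

```python
def can_release(measured_availability: float, slo_target: float) -> bool:
    """Allow a release only while measured availability for the window
    still meets the agreed target (hypothetical helper)."""
    return measured_availability >= slo_target


SLO_TARGET = 0.80  # the 80% example from the talk

# Up 80% of the quarter: the error budget is not overspent, so go for it.
print(can_release(0.80, SLO_TARGET))

# Up only 70% of the quarter: the budget is overspent, so gate the release.
print(can_release(0.70, SLO_TARGET))
```

The point of keeping the rule this simple is that developers and operations can both read the same number and reach the same decision.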