OK, so I actually wasn't aware that this was going to happen, but in your bags you got this book, so I thought I'd start with it anyway. There's a quote from me on the back, that this is the IT swamp-draining manual for anyone who's up to their neck in alligators. The thing about this book is that it talks about a company that's been overtaken by events. It's just a regular manufacturing and sales company where, years ago, IT was a small piece of their business. It gradually gets more and more significant, they don't really deal with it, and the book is about the transformation they go through as they try to deal with the fact that IT is now central to their business.

So why has this happened? I'm going to pick up on that part and talk a bit about what has happened to the industry over the last 10 or 20 years. This company used to deal with point-of-sale terminals. They had maybe a few thousand terminals and a few thousand employees, and once a week they updated the advert in the local paper. That was what IT had to go deal with, and IT probably didn't even have to deal with the advert. But they went from point-of-sale terminals to an online presence. Now they've gone from thousands of terminals to millions of web transactions, and from thousands of employees to millions of customers. All those customers want personalized information and all kinds of things like that. Their advertising has moved to real-time bidding on the latest stuff, and depending on what the weather is you'll put up a different advert, all those kinds of things. So this is the same company, even operating at the same business scale, and over a 10 to 20 year period it has seen something like a three-orders-of-magnitude increase in the stuff it has to deal with: thousands of entities to millions of entities. That is happening across the world as people try to compete, and if you don't track it, you'll go out of business.

Over the years they started off with a mainframe-scale system that was just doing payroll or something like that, a bit of manufacturing. Then they did some client-server, then maybe they went to commodity hardware, and now they're trying to figure out web-scale cloud stuff. That's the arc people have been on, and a lot of the mindset you start with was set in the early days. If you started off in a mainframe world, you're used to running reliable hardware. Mainframes just don't fail; that's the point of a mainframe. And it runs stable software: you spend a year making your software really solid so it doesn't break. This is the base assumption of a lot of IT, and the trouble is that this assumption is fading into the distance. These assumptions aren't valid anymore, so what do you do now? What I'm seeing is a lot of people looking around for how to deal with this world. I've been going around giving lots of talks to people, and the summary tweet of what it generally looks like is sort of baffling the late adopters, as a service: we describe all this stuff we do, and "oh my god, we couldn't possibly do that" has been the reaction.
I mean, three or four years ago, when I first started talking about Netflix, the reaction from the cloud community was that you couldn't possibly be doing what you say you're doing. Then eventually they agreed we were actually doing it, but we were a unicorn and no one else was going to do that. Then it moved on to, well, actually, I guess people will eventually try to do that. And in the last year we've seen major brand-name companies you've heard of — companies that aren't really bleeding-edge technology companies — talking to us and to other people doing web scale about how to do it; they're in the middle of doing it now. It's no longer "you're a unicorn out in the wild"; it's "well, how do we do that?" Because if we don't become agile, continuous-delivery, cloud — if we don't get on top of that — we're not going to compete.

So what happens? Well, if you're running at scale, scale breaks hardware. Assume you have a thousand times more hardware than you used to have; some of it will be broken at any point in time. If you keep deploying hardware, eventually you hit that scale. And if you keep changing your software, speed breaks software. If you're continuously updating your software, occasionally you'll push something that's broken, and some of the dependencies you have will be broken, because you haven't spent a year tuning and testing and getting it perfect — it's better to get it out faster. And if you do speed at scale, it breaks everything. This is web scale; this is the world we live in. Even if you're a small company, if you're going fast enough you will continually be breaking your software and your systems, and you have to build systems that assume that everything they talk to could be broken.

How does this apply to Netflix? Well, there's this strange world of the future: stranded without video, no way to fill their empty hours. It's snowing on the East Coast and the kids are home from school, so it's very important that those iPads have Netflix on them, and we don't want to end up with a cloud of broken streams. So what are we doing about it? I'm going to talk a bit about availability — I've talked a bit about the scale, and then I'll get into the speed part. How do we think about incidents and mitigation? The worst thing that can happen is what eBay used to call the CNN moment: your website's down, and it's on the news. People who have never been your customer now know that your site's down. That's a PR event, and we want to avoid those — that's media impact. The next worst is that customers are calling customer service. It didn't make it to a PR event, but the customers are annoyed, and if a customer has to call customer service, they might quit, so you don't want to annoy people. The next one down is that customers have no idea that something's broken, but you're not actually giving them the feature set they were supposed to get. If you're running lots of A/B tests, you're giving people different sets of features, and if they didn't actually get the feature they were supposed to get in that test — because the test was broken or something else broke — you need to understand that. Those cause issues internal to us. And then, at the bottom, fast retry: sorry, this request took 100 milliseconds longer than normal, you won't notice, right?
But this service just auto-scaled down and that machine's no longer there, and when you tried to call it — sorry, it's not there anymore — you fail over to the next one and keep going. That kind of thing happens every time we auto-scale up or down or do a code push. So this is sort of an order-of-magnitude scale: hopefully there's a single-digit number of PR events a year, and it's roughly a 10x step down at each level.

So what are we doing about this? For the top-level ones, we're doing active-active: we're running the whole of Netflix on both the East Coast and the West Coast. This is what we've been working on all year. We'll be talking about it in more detail at re:Invent, but I'll show you what it looks like shortly. We're doing a bit more practicing of flipping over between these regions, and something we really need to get better at is game day practicing. The way to think about that is: it's really annoying, those fire drills in your building where you have to go stand in the parking lot for ten minutes every six months — except when the building is actually on fire and you're in the parking lot and you look around and all of your team is there. That's a good feeling, and it means there's nobody in a smoke-filled room somewhere in the building. Those practices are annoying, but if you don't practice, then when the thing really happens no one knows what to do. So you have to figure out how to set up something that looks like a game day practice. Otherwise, if you build an incredibly reliable system, when it does break no one knows what to do — and we've had that happen a couple of times.

Then there's better tools and practices, like tracking all changes in the system. Every time you change a dynamically changeable configuration variable, we log it to a central place, so you can see that just after that change everything went to hell in a handbasket — it's probably that change, turn it back — that kind of stuff. And then better data tagging. If you tag the experience everyone gets, then for the person who was actually using the site at the time this feature wasn't working, you can say no, they didn't get that experience, and you clean up your data. Keeping your big data clean keeps the internal people happy. So that's the kind of stuff we've been doing.

In the architecture, what this looks like is: you have customers, we have East Coast and West Coast, it's lots of Cassandra, triple-replicated on the East and triple-replicated on the West, and we have DNS routing traffic back and forth. If stuff starts going wrong — say this service in this one zone goes wrong — we don't care, it just keeps working. Some more services go wrong: this is all still a perfectly working system, customers won't notice anything. We lose an entire zone — that's a whole data center's worth of stuff — it'll keep working, and we've had that happen a bunch of times. If you lose a whole region, we go, okay, we need to turn off traffic to that region, but half the customers are okay and we can switch the other half over in half an hour or something. So it's a minor outage rather than an all-over-the-news outage. That's the plan. If one of our DNS vendors dies, we switch to a different DNS vendor — we've abstracted all of that. So that's kind of web-scale architecture: making it highly available.
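To make that "fail over to the next one and keep going" behavior a bit more concrete, here's a minimal sketch in plain Java — the class name, the service, and the timeout value are my assumptions for illustration, not Netflix's actual client code. The point is that every remote call gets a short timeout and a pre-written fallback, so a missing instance or a slow dependency degrades the response instead of taking the page down.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// A minimal sketch of the "assume the dependency is broken" call pattern.
public class PersonalizedRowClient {

    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    // Stand-in for an HTTP call to a hypothetical personalization service.
    // It may be slow, it may throw, or the instance may have been auto-scaled away.
    private List<String> callPersonalizationService(String customerId) throws Exception {
        throw new Exception("instance is no longer there");
    }

    // Pre-computed, non-personalized titles used whenever the remote call fails.
    private List<String> fallbackRow() {
        return List.of("Popular title 1", "Popular title 2", "Popular title 3");
    }

    public List<String> rowFor(String customerId) {
        Future<List<String>> future = pool.submit(() -> callPersonalizationService(customerId));
        try {
            // Short timeout: a slightly slower answer is fine, hanging is not.
            return future.get(100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException | InterruptedException e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
            future.cancel(true);
            // Degrade gracefully: a less personalized row, not an error page.
            return fallbackRow();
        }
    }
}
```

The design point is that the fallback path is written and exercised up front, rather than discovered during an outage.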
This is very scalable — globally scalable, scalable in terms of traffic; we're pushing ridiculous amounts of traffic into this thing. So let's move on a bit: how do we speed up delivery? You have your CIO saying speed it up, we need to go faster. What does that really mean? Why do you need to speed up? Well, you're competing with other companies, and you're in kind of a dogfight. One way of looking at this is the OODA loop, which some of you have probably heard of already. It comes from the Korean War: the pilots who came back from the dogfights were the ones who could think faster and react faster than their opponents, and you want to come back from a dogfight, because the alternative is your plane crashing — you didn't make it back.

So what does that actually look like? Observe, Orient, Decide, Act — that's the loop we're trying to get around. From a business point of view, the Observe part is: you're looking for a land-grab opportunity, a competitor made a move you want to react to, or you spot a customer pain point you want to do something about. Okay, so you've figured out something that needs doing. Then you do some analysis and you model some hypotheses — hypothesis-driven development, there's a buzzword, watch out for that one; I read an early version of Jez's book. Then you plan a response, get buy-in, and commit resources. How long does it take you just to decide what you're going to do, or whether you're going to do it? Then you implement, deliver, and engage customers with what you did — hey, look, we did this thing — you point customers at it, and finally you measure the customers, and now you're going around the loop again. This is your feedback loop. The question is: how fast can you go around that feedback loop, and can you go around it faster than whoever you're competing with in whatever market you're in? That's the trick.

So let's look at the evolution of this. Every conference opening keynote has to start with mainframes, right? We've got more mainframes later. So let's look at what it was like with mainframes. We have the big blue version of this cycle — I did it all in blue this time. You're trying to expand territory; think of it back in the 80s, something like: we're in Wisconsin, we're going to take over Illinois — that's the territory. For example, the Japanese are coming, they're making cheaper, better stuff, whatever. Or you actually did spot a customer pain point you wanted to deal with somehow. You have systems analysts — remember there used to be people called systems analysts — and they did capacity modeling. You'd spend three months building a detailed queuing model of exactly what you were going to build so that you would, eventually, order the right mainframe. You'd put this in your five-year plan. You'd have board-level buy-in, because this is a massive amount of money, and then you'd start evaluating all the vendors for the software. Then you customize the vendor software, you upgrade your mainframe, you run your print ad campaign, and then you see whether you got any more revenue the following year. So this is like a year-long release cycle.
And that would be considered state of the art, reasonably agile, 20 years ago. It's something like a $1 million to $100 million investment. You're betting the whole company: if you fail, you probably go bankrupt or get bought. We've seen companies botch a product launch or botch an IT rollout and just go out of business. That's what's staring Parts Unlimited, in The Phoenix Project, in the face: they're going to outsource IT or go out of business if they don't get this right. And it's probably COBOL and MVS and all that stuff.

Then a few years later — that era was before my time in the industry, but I used to work for Sun, so I made this one purple, because Sun machines had those little purple feet on them. In this era you could probably turn stuff around in three months. You're still solving the same problems, except maybe it's China instead of Japan as the new foreign competition or something. And this time you built a data warehouse, because Teradata and those kinds of things existed and you could buy them. Instead of detailed modeling you're doing a capacity estimate, because you can do it in a week instead of three months. You have a one-year plan, and you really only need CIO buy-in this time — you don't have to go all the way to the board — but you're still doing vendor evaluation, customizing software, installing servers. You're now doing TV advertising instead of print advertising, so the world is moving on. And again, you measure revenue — maybe quarterly revenue now, so you can see a difference quarterly instead of annually. So it's speeding up. I was working with customers in the 90s where this was state of the art: if you could get something out in three months, you were going pretty quickly. So, three months to a year, maybe a hundred K as the minimum entry price up to some pretty big deals. And it was C++ and Oracle and Solaris or HP-UX, that sort of generic thing, but that's the general idea. If you failed, you got a revenue hit and maybe the CIO lost his job. The company's not going to go out of business, but maybe the CIO takes the hit on that.

Then, moving on a bit: back in the early 2000s I was at eBay and we had a two-week agile release train, which was excellent stuff. A lot of people are actually still running on this kind of train. The hardware is maybe Gateway or Dell or something — basic commodity machines. Same kind of problems you're trying to solve. You've still got a data warehouse, you're still doing estimates, but you've got a two-week plan now: what's on this train? That's the plan. Your business does the buy-in, you're prioritizing your features, this is all nice and agile, and you're getting stuff done in weeks. What you're acting on is a code feature. You install the capacity. You do web display ads now, because display ads were the thing, and that's why Google made all that money. And you're measuring sales directly. This is kind of where a lot of people actually are right now. It's much cheaper — another order of magnitude cheaper — everything's happening in weeks, and the cost of failure is the product manager's reputation.
Maybe two weeks later he gets it right and everyone's happy; it's not such a big deal. So that's good. And it's sort of Java and MySQL, something like that. When you're doing this there are a bunch of hand-off steps: the product manager tells the developer what he wants, the developer builds it, then gives it to QA, who integrate all the work from all the developers that's going into this train, then they give that to operations, who go and deploy it, and it breaks, and they go back and forth a few times trying to get it to work. Eventually the BI guys build a report on what happened in this train. So that's reasonable. What happened here? The cost and size and risk of change went down, and the rate of change went up. That's the trade-off.

So where do we go from that? What's the next step? What we've been doing we call cloud native: constructing a highly agile and highly available service from ephemeral and assumed-broken components. It's not that we expect them to be broken; you just have to assume — the default assumption should be — that any request you make to a service is likely to fail. So you should actually practice: what does your code do if this thing fails? You don't assume everything's working; you flip that assumption, and then you build very different systems. That's really the mind shift you have to get through, and it's one that a lot of developers and a lot of enterprises have trouble with.

You end up building a system that looks a bit like this. This is, from a year or two ago, what rendering Netflix's home page looked like. These are all the individual services; each of those icons is probably hundreds of machines scattered across three or six different buildings. And this is just one service — the home page you get when you visit Netflix. This is the web of different HTTP requests and memcached lookups required to render the one page you get back when you first visit. It's pretty messy. If we lose a service, you don't even notice: one piece of functionality on the page isn't there, everything keeps working, it sort of routes around it. You get a different row of movies, or your movies aren't quite as nicely personalized as you think. There's a minimal sketch of that page-composition idea just below.

So how are we getting that delivered really quickly? Continuous deployment means I'm trying to deploy stuff and I don't have time for that bi-weekly meeting with IT, because I'm doing it five times a day and I can't go meet my IT guy five times a day — that doesn't work. Because there's no time for this handoff to IT, the developers have to do it themselves. So what you've got now is developer self-service: the developer puts code into production themselves, they're responsible for it, they click the button, they take it live. That's a freedom developers get, but they're also responsible for what they put there, and you have to hold them responsible for it. If they do stupid things a few times, you find a new developer. So there's an element of peril here, and if you don't have any peril in the system, it kind of gets out of control.
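Here is that "route around it" page composition as a minimal sketch — hypothetical row names and plain Java executors, not the real Netflix home page code. Every row of the page is fetched from its own service in parallel; any call that fails or times out just drops its row, and the rest of the page still renders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// A minimal sketch of composing one page from many independently failing services.
public class HomePageAssembler {

    private final ExecutorService pool = Executors.newFixedThreadPool(16);

    // Each entry is one row of the page, backed by its own service call.
    public List<String> assemble(Map<String, Callable<String>> rowSources) {
        // Kick off every row fetch in parallel.
        Map<String, Future<String>> pending = new LinkedHashMap<>();
        rowSources.forEach((name, source) -> pending.put(name, pool.submit(source)));

        List<String> rows = new ArrayList<>();
        pending.forEach((name, future) -> {
            try {
                rows.add(future.get(200, TimeUnit.MILLISECONDS));
            } catch (Exception e) {
                // This row's service is broken or slow: drop the row, keep the page.
                future.cancel(true);
            }
        });
        return rows;
    }

    public static void main(String[] args) {
        HomePageAssembler assembler = new HomePageAssembler();
        Map<String, Callable<String>> sources = new LinkedHashMap<>();
        sources.put("continueWatching", () -> "Continue watching: ...");
        sources.put("topPicks", () -> { throw new RuntimeException("service down"); });
        sources.put("newReleases", () -> "New releases: ...");
        // Prints only the two rows whose services answered.
        System.out.println(assembler.assemble(sources));
        assembler.pool.shutdown();
    }
}
```

So a broken personalization service costs you one row, not the whole page.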
I mean, there's this idea that cars, instead of having airbags in the steering wheel, should have a six-inch metal spike sticking out, because then, if everyone had it, everyone would drive around really slowly and carefully and there would be far fewer road accidents. Occasionally somebody would get spiked a bit, but overall everyone would drive slowly and carefully. If you give everyone airbags, they drive like crazy, because if they crash into somebody all that's going to happen is they get airbag rash or whatever. So having a little bit of peril in the system causes people's behavior to get well aligned with the outcomes.

So we have developers running what they wrote, which means they get root access in production — and some people go, oh my god — but we also put them on pager duty. You get both: we're going to call you at three a.m., and we're going to give you all the ability you need to fix it. You don't have to call anyone else, you don't have to wake up the IT guy as well. You wrote the code, you know exactly what state it's in, you're changing it four or five times a day; there isn't time to transfer that information into somebody else's brain so they can operate it for you. And it turns out the developers actually spend less time managing stuff in production than they used to spend talking to IT about how they were supposed to manage it in production. So there's a net reduction in the amount of time you spend worrying about operations, because you end up building code that's more reliable in production to start with, and because you're on the hook, you can resolve things really quickly. It doesn't happen that often, but yeah, occasionally you'll get woken up at three a.m. because something bad happened.

So what is IT in this world? IT is a cloud API. I don't really care whether you're running internal IT or getting it from a public cloud vendor: the way you talk to your IT department is an API. For us, our IT department is AWS. We can talk to them once a week on a conference call, but we don't talk to them about provisioning individual machines — that's happening all the time. If you're big enough to run your own internal cloud, there shouldn't be a conversation about a deployment unless you want to deploy something so big you're going to run out of a resource. So this is DevOps automation with the accent on the dev. DevOps is dev and ops coming together, but starting with the developers and teaching them to do operations is the flavor of DevOps Netflix has been doing. There's another flavor where the ops guys figure out how to automate their stuff and learn to develop things — coming at it from the other direction. Sometimes we kind of miss a bit in the middle, but that's generally what's going on.

Another thing we saw in those earlier loops was the vendor evaluation and customizing of vendor software. We don't do that anymore. We get everything from GitHub, and if it isn't on GitHub, we write it and put it on GitHub, and everything's Apache licensed. So that whole vendor cycle is gone now. The major external dependency we have is probably Cassandra, and we have our own committer.
We're contributing — jointly developing code in public with a whole pile of other people, lots of other companies — to make Cassandra better and better. That's our major dependency; it replaced what we used to do with Oracle. If we want something, we know exactly where the code is, we can change it ourselves if we need to, and we can discuss it. What that means is that we've ended up putting a lot of stuff out there. This is what Netflix's GitHub account looks like. We borrowed some code from the UI team, and we found some icons for the projects — it turns out we own the icons for genres, but not for much else, so they got randomly assigned genre icons. We have 35 projects there and a few more coming in the next few weeks. Two years ago we had one. This is us engaging with the community: putting a platform out there, consuming stuff from outside, putting stuff out there, filling the gaps.

So, putting all that together: let's do continuous delivery on cloud. What does that look like? Going back to that original loop: we're still looking for land-grab opportunities, we're still reacting to competitive moves, we're still looking for customer pain points. You're doing analysis, you're modeling hypotheses, you're doing A/B testing of a piece, and the iteration unit here isn't even a feature anymore — it's maybe one line of code, a step towards a feature. You can put all these steps into production continuously; every check-in could flow all the way through. Plan a response? You just do it, because we've empowered the developer to decide to put it in production, so you don't have to do any board meetings anymore, which is good. But you do have to share your plans. You have to communicate: this is what I'm doing, so other people know, and if there are any implications, they know what you did and what you're doing. Instead of asking for permission, it becomes sharing your decision. Then there's an incremental implementation: you're automatically deploying and launching it. The end point is that eventually you get the thing into a state where somebody can launch an A/B test, and then you're measuring customers — a subset of customers behind a feature flag in an A/B test, all that kind of thing.

This is what's typically known as the innovation part. When companies say they have trouble innovating, they mean they can't see the opportunities — they can't see the grumpy customers in front of their very eyes. This is also known as big data, and what's interesting about big data to me is that I want to be able to quickly answer a question that nobody has asked before. Traditional BI is getting the same weekly report saying we made some money this week, good — that has to be done accurately and all those kinds of things. But I want to know: is this customer pain point real? Who has it? How many people are hitting this piece of the site? There's a broken link — how many people are hitting that broken link? Those kinds of things. Then you build a hypothesis: if I make this change, it will make this customer happy. You loop your way around, and a day later or whatever, you've got data. This part is culture: the culture of pushing permission to do stuff all the way down to the developer, because we're going around this loop in hours now and there isn't time to go ask people for permission. And this is basically cloud, whether you're doing it internally or externally.
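As a concrete illustration of "one line of code behind a feature flag, measured per test cell", here's a minimal sketch — the class, the cell names, the 50/50 split, and the logging call are my assumptions for illustration, not Netflix's actual test framework. The new code path sits behind the allocation, and every response is tagged with the cell that was actually served, which is the data tagging mentioned earlier.

```java
import java.util.Map;

// A minimal sketch of a feature-flagged A/B test with per-cell data tagging.
public class RowOrderingExperiment {

    public enum Cell { CONTROL, NEW_RANKER }

    // Deterministic allocation so a customer always sees the same experience.
    public Cell allocate(String customerId) {
        return (Math.floorMod(customerId.hashCode(), 100) < 50)
                ? Cell.CONTROL : Cell.NEW_RANKER;
    }

    public String render(String customerId) {
        Cell cell = allocate(customerId);
        // The one-line change being tested lives behind the flag.
        String page = (cell == Cell.NEW_RANKER)
                ? renderWithNewRanker(customerId)
                : renderWithOldRanker(customerId);
        // Tag the event so analytics knows which experience was actually served;
        // this is what keeps the test results clean when something breaks.
        log(Map.of("customer", customerId, "test", "row-ordering", "cell", cell.name()));
        return page;
    }

    private String renderWithOldRanker(String id) { return "old ordering for " + id; }
    private String renderWithNewRanker(String id) { return "new ordering for " + id; }
    private void log(Map<String, String> event) { System.out.println(event); }
}
```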
What you're aiming for is minutes to deploy stuff — that should be what you're looking at. So what does this look like? The cost is near zero, and it's variable expense: if you turn stuff off again, you stop paying for it, which is nice. You can deploy and shrink things back; you're not committed to a three-year depreciation cycle for what you needed for the rollout. You're iterating in hours, maybe days. What you're betting is a decoupled microservice code push — one of those little dots on my big diagram of all those services. I'm changing a line of code in one of those; that's my bet. And the cost of failure is near zero: I'm running A/B tests, I'm running red/black deployments, so I have the old code running alongside the new code and I can flip back and forth. It's all going to work. Languages have moved on a bit: we're still mostly using Java, but we've got quite a bit of Scala and Python now, and we're starting to poke a bit at Clojure. And you're running on NoSQL on cloud. That's sort of what the world looks like.

So those continuous-deployment handoff steps have collapsed now. The product manager says: I have a hypothesis, I'm going to create an A/B test, here are the different experiences I want. They self-service, creating the A/B test in the system themselves. The developer writes the code to implement those variants, builds some automated testing, deploys the code, and is on call for it. Then they have some self-service analytics to say whether their code's running or not. Once it's all in place, the product manager turns on the test and allocates customers to the A/B test, and then they see the results come through, statistically valid, with all the statistics done for them. So it's all a self-service layer of systems, and there's really only one handoff: "please implement this A/B test, that's the next thing I want you to do." Everything else is basically there.

So we're building some automation to support this. First you check in code, a Jenkins build happens, we bake an AMI, and you launch it in your test environment. From check-in, everything else can flow through and be automated — we're not quite there, but we're getting there. There's a functional and a performance test. The functional test: is it broken? Okay, looks good. Then we do a little stress test to measure how many requests per second that code can take and compare it with the previous version — side-by-side testing. Then in production we do a canary test, where we start a fresh version of the old code alongside a fresh version of the new code at the same time, give them equal traffic, and see what happens. We do some signature analysis — I'll show you what that looks like. And then we roll forward into a production red/black push, where we stand up the new version of the code and leave the old one there, so if something breaks quickly you can flip back and forth between them. So that's the flow — FlowCon, good. We're trying to automate all the steps in that; we haven't got all of it done, but we're getting fairly close.

Roy's in the audience, and one of the engineers working for him gave me some slides, some screenshots. We have signature analysis where we take a canary in production and run statistical analysis of whether it's better or worse than the previous version, and which metrics are variable. And this is a bad canary.
If I zoom in on this bit of it, you can see that the little red dots off to the right are the mean and the median for that measurement, which are way outside the acceptable range. So this is a canary we'd reject, because it's behaving in a way that is unhappy. We've got a few metrics that are okay, a few that are noisy, and one that's better than usual — so it's blue, it's a good one — but overall we'd reject this. So you bounce this code back: go fix it, what did you do? You broke something. This is what a happy canary looks like: it's all green or "not sure", and not sure is close enough. Basically, if you look at those, the width of the little variance bands is very narrow and everything is within the right range. The software to do that — we're working on it. We're trying to build a flow that uses this as the gate to production. At the moment it goes back to the developer and says: here's the result, are you happy with this, do you want to hit go? So instead of all the manual steps of deploying something, it's one click: okay, take that one through to the next step.

So then we've got a happy little canary, we've got some code and it's running — but we actually have three different sites, three different regions, where we're running everything. So this is the other kind of automation we want. It's afternoon in California, you've been developing code, you think you've got something, you check it in, it passes tests, and you want to deploy it. So we deploy it to Europe first, because we have some machines in Europe, it's nighttime there, Europe's a smaller site, and there's very little traffic at night. If we totally break everything, relatively few customers are inconvenienced, and they're a long way away. We're kind of working towards this. Then we run the canary, we make sure it works, we scale it up, and it's running. The developer is there, at work, looking at this; they get the reports — yes, this looks good. The next day they come in and it still looks good, because it's been through peak in the UK the following morning. Then you can deploy to US East or US West — pick one of the US sites — and we can set this up as a workflow that runs automatically. There's a Groovy-based DSL we came up with called Glisten that talks to Amazon Simple Workflow, and we're using it behind our Asgard tool to sequence these things. So you can kick off a push that automatically rolls around the world — it ships the next day if everything works, that kind of thing. Again, we bring it up and go through the canary process again, because the state and the initialization are different enough that you don't want to just assume it's working code. We run it through the canary process again, and then maybe the next day we roll it to the West Coast. You can choose how much delay you want to put in this, but conceptually you could do a check-in and it could go through all this process automatically; if you wanted to do it hands-off, you could roll everything all the way through to production, and you could be continually creating these. We typically run the canary for a few hours to get statistically clean data on it, but you could shrink that time if you really wanted to. So that's what we've been doing. That's kind of flow for us.
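To make the signature-analysis gate a little more concrete, here's a minimal sketch of the idea: compare each canary metric against the baseline running the old code side by side, and only say "go" if nothing drifts far out of range. The metric names and the 20% band are illustrative assumptions, not the actual Netflix analysis, which uses proper statistics rather than a fixed threshold.

```java
import java.util.Map;

// A minimal sketch of a canary "go / no-go" gate based on baseline comparison.
public class CanaryJudge {

    enum Verdict { GO, NO_GO }

    public Verdict judge(Map<String, Double> baselineMeans, Map<String, Double> canaryMeans) {
        for (Map.Entry<String, Double> entry : baselineMeans.entrySet()) {
            double baseline = entry.getValue();
            double canary = canaryMeans.getOrDefault(entry.getKey(), Double.NaN);
            // Reject if this metric's canary mean drifts more than 20% from the
            // baseline, or if the metric is missing entirely.
            if (Double.isNaN(canary) || Math.abs(canary - baseline) > 0.2 * Math.abs(baseline)) {
                return Verdict.NO_GO;   // bounce it back to the developer
            }
        }
        return Verdict.GO;              // healthy canary: continue the red/black push
    }

    public static void main(String[] args) {
        CanaryJudge judge = new CanaryJudge();
        Map<String, Double> baseline  = Map.of("latencyMs", 40.0, "errorRate", 0.002, "cpu", 55.0);
        Map<String, Double> badCanary = Map.of("latencyMs", 90.0, "errorRate", 0.004, "cpu", 60.0);
        System.out.println(judge.judge(baseline, badCanary));  // NO_GO: latency and errors drifted
    }
}
```

In the real flow this verdict is what gets presented back to the developer as the one-click "are you happy with this, do you want to hit go?" decision.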
There are a couple of things I want to wrap up with; one is the inspiration side. All of the things we were doing around cloud native have had these great sources, so I wanted to give them some credit. Michael Nygard — a lot of you probably know Michael — Release It! contains the bulkhead pattern and the circuit breaker pattern and all the things that ever went wrong when trying to deploy code, and we've built systems to deal with a lot of those patterns. Thinking in Systems is not really about software; it's about feedback loops and how to build systems that have the right emergent behaviors. If you set systems up one way, everything goes to hell; if you do it the other way, everything stays happy. So it's about thinking about things in that very systemic way. The REST API Design Handbook by George Reese — he's been coding against all of the cloud APIs for years, and there's a lot of brokenness in there, so it's a lot of "don't do this" kind of stuff. It's a very short book; basically his ranting on Twitter — oh my god, this cloud vendor is so broken — distilled into a book. Antifragile: what we're trying to build is a company and a system that is antifragile, in the sense that we continually attack it to make it stronger. If you haven't run into the antifragile idea before, think about working out. You're not used to going to the gym, so you go and work out, and at the end of that day you feel really bad, you hurt. So why would you do that? It hurts. But the next day you feel a bit better and you're a little bit stronger, and you keep doing it again and again and you get stronger and stronger. That's the concept: by damaging a system slightly, stressing it slightly, it gets stronger. If you don't stress yourself — if you're completely out of shape and flabby — then the first time you need to run for a train or something, you have a heart attack. That's the failure mode. You want to build up the ability to deal with sudden attacks, sudden stress coming into your systems. Drift into Failure — I always say this, but if anyone's getting on a plane soon, don't read it on the plane; it mostly consists of aircraft-falling-out-of-the-sky examples, and the rest are about people dying in hospitals. But it's about how a sequence of perfectly good decisions, each made with all the local knowledge available, builds up to a tragic failure. These tragedy-of-the-commons kinds of failures happen because everyone is optimizing for the information they have; nobody is to blame, but the system ends up in a big outage. That's the drift idea. Then there's this Continuous Delivery book some guy wrote — there you go; if you're here, you probably know about that already. And then Everything Is Obvious: if you're trying to build systems that are operable, then when you get into the incident review for something going wrong, you say, well, it's obvious why it was going to go wrong — why didn't we see that? We didn't make it obvious. So think about how you make things obvious: how do you make it obvious that when this problem happens, you push this button and you pull this lever in this direction?
Because quite often when things go wrong, people do the opposite of what they should be doing, because it's not obvious. The machines are all messed up — let's reboot everything. Usually that causes everything to completely tank. It's like we've been trained by our Windows PCs that the answer is to reboot everything, but it turns out it isn't. In most cases, if anyone says in the middle of an outage, "let's restart everything", just say no — it's probably a bad idea. And then Cloudonomics: if you're trying to figure out whether you should have stuff on-premises or off-premises, all of the economic modeling for cloud, Joe Weinman's book is really good. And I was hunting around and found there's another book coming out soon — this one — so hat tip to Jez. I actually got to read a few bits of it, so I'm looking forward to it coming out.

Here are my takeaways. Speed wins. Assume stuff is broken. Cloud native automation is the way we're getting everything done. And GitHub has become both your app store and your résumé — a résumé for you as a developer and for you as a company. People look at Netflix's page on GitHub and go, I'd like to work there, look at all that interesting stuff they're doing. And when somebody wants a job at Netflix, we say: what's your GitHub ID? We'll go read your code. We don't need to have you stand scribbling on a whiteboard; I can read what you've done over the last few years. So that's it. I'm going to end with an A/B test hypothesis. My hypothesis is that you actually prefer the Scooby-Doo ending. What do you think? We'll see. So there we go — that was continuously delivered at about ten to nine this morning. All right, that's it.