 I'm going to talk to you about fun things that I get to do with scaring people, because pretty much all of my talks are either things that I'm angry about or things that I want to frighten you about, if you drill down deep enough. So this one is called, All the World's a Staging Server. There are four elements of software that move in relation to each other. If you transform one, you need to pull the others with it, because halfway is bullshit. And doing only part of this transformation is not going to work out for you. So I think of this as the post-agile world. I'm moving toward calling it the blended software environment, because about two-thirds of all the software that we're using that we call our product is not something we built. It's something we integrated, something we rolled in, something we consume, and that's fine, but we need to think about the fact that we aren't really designing as if that were true. So the four elements, the four pillars that I'm talking about are development style. What is the definition of done? So this is things like waterfall and agile and Kanban. How do you know what it is you're supposed to write and what that means? Given that, how do you store your code? Who here has started in technology since the year 2000? Who here has started since 2010? I love this audience, crusty like me. Who here has used source control that isn't Git based? You remember this, we used to have to lock things. You'd check something out, and it was yours forever. And if you got hit by a bus, we had to try and break your lock. It was kind of terrible. So source control. Testing, what is testing? What do we do testing for? What is it meant for besides people not yelling at us? And deployment. How do you get your magnificent opus from your computer to the user? And what steps does that entail? If you move any of these four things without the others, you have sort of an unstable base. 
Like, you could go ahead and try and do waterfall and software as a service. But it doesn't work out super well, as we've discovered. When I think about this, I think about this really cool project called the Wintergatan Marble Machine, which is a bunch of complex moving mechanisms and gears that essentially make music by dropping ball bearings on other things. And it doesn't work in isolation. You can't test, like, any single part of it, because it all has to work together. Like, sure, you can do something with, like, does the screw lift work, but you don't know if that's going to make music until you put it all together. So this is one of four talks in this series. The first one is called Tinker Toys and Microservices, in which I depress everyone by telling them that they're testing all wrong. Full testing is impossible. Anyone who tells you that they have full test coverage is probably lying, or spending so much money that it's not possible for you to keep up with them. You need to be able to decouple your understanding of what's going on. We call this microservice architecture; really it's about loose coupling. So the Accelerate State of DevOps report dropped on Thursday, who's read it? Yeah. Okay, get on that, people. Nicole Forsgren and her team, once again, full of brilliance. Accelerate is the book that you should have already read. This report is a million percent like, hey, Accelerate was right: when we wrote this book, we were right, we are moving in this direction, and it's all about loose coupling and moving faster. And then the last thing for Tinker Toys and Microservices is, because you can't test everything, you're just going to need to test endpoints and behavior-driven development and what the user experience is. Because, like Charity Majors said, nines don't matter if your users aren't happy. They don't. And so what we need to be testing is what our users experience and not what we think ought to happen.
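The endpoint-and-behavior testing idea above can be sketched in a few lines: assert on what the user receives, not on how the service produced it. This is a minimal sketch; `get_profile` and its fields are hypothetical stand-ins for a real service call.

```python
# Behavior-driven check: assert on the user-facing contract of an endpoint,
# not on the internals behind it. `get_profile` is a hypothetical handler.

def get_profile(user_id):
    # Internals could be one service or ten chained microservices;
    # the contract below doesn't care.
    return {"id": user_id, "status": "active", "display_name": f"user-{user_id}"}

def check_profile_behavior(user_id):
    """The user-facing contract: a well-formed profile comes back."""
    profile = get_profile(user_id)
    assert profile["id"] == user_id
    assert profile["status"] in {"active", "suspended", "deleted"}
    assert profile["display_name"]  # never blank, whatever the backend did
    return profile
```

The point of the shape: if the backend is rewritten, decomposed, or swapped out, this check still describes what the user experiences.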
The next one is called Every Scar a Story, and this one has to do with my own personal trauma about cutting myself with a plastic knife. It's about source control. I think that after 20 years of Git, we are coming to the end of its effective usefulness. Not, like, immediately, but let me just set your world on fire a bit and say, if we were going to keep using Git, it would need to be a million times better about how we ingest dependencies, because it does not understand dependencies. And then we try and put dependencies in, and we have a bunch of hacks that make it sort of work, kind of, right up until one of our vendors or providers or partners puts something crappy in and we release it and we don't know, because it's not in our source control. Yeah? You've had that moment. Branching is a habit and not a mandate. What's the first thing you do when you're going to write some code? The first thing most people do, and the thing that we've been teaching people to do for 20 years, is create a branch. And when we do that, we're saying, here's something that's going to live outside of the run of code until I'm ready to commit it. But by living outside that run of code, we've hidden it. We've made it impossible for other people to see it and affect it. And every time we pull a branch off, and the longer it lives, the worse it is, we're saying, I commit to having a merge conflict in the future. Like, this is really what a branch is. It's a promise to have a merge conflict, because that's what's going to happen when you try and put it back into master. So question your assumptions about what it is you actually need from your source control. This one I'm going to get into more because obviously it's this talk. But I'm going to talk to you about killing staging and testing in production and launching darkly and going faster being safer. Because I'm really excited about the way that we're moving toward going faster and being safer.
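One concrete alternative to the long-lived branch is committing the unfinished work to trunk behind a guard, so everyone can see it and merge against it, but nobody runs it until it's ready. A minimal sketch; the flag name and the two total functions are hypothetical.

```python
# Unfinished work committed straight to trunk, guarded by a flag so it ships
# dark. There is no branch waiting to produce a merge conflict; the new code
# path is visible to the whole team. All names here are hypothetical.

def legacy_total(cart):
    return sum(item["price"] for item in cart)

def new_total_with_tax(cart):
    # In-progress rewrite, live on trunk but dark in production.
    return round(sum(item["price"] for item in cart) * 1.08, 2)

def checkout_total(cart, flags):
    if flags.get("new-tax-engine", False):  # off by default until it's done
        return new_total_with_tax(cart)
    return legacy_total(cart)
```

The old path keeps working untouched, and deleting the flag check later is a trivial diff rather than a merge.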
I think that is what we were trying to do with Agile, and we got caught up in tradition and ritual, because humans love religion. And that's really what we're doing when we're saying, I'm doing Agile. When was the last time any of you wrote a story on an actual three-by-five card? Right, no? I was talking to somebody, they're like, yeah, we do Agile. As a developer, I need a Kubernetes server to do this. I'm like, that's not a story. We all need some remedial reading in what Agile is. It's working for us. We get stuff done. We get it done faster than we used to. But we're following all these procedures without really understanding why we were trying to change things in the first place. And the last one is called Everything's a Little Bit Broken. And it's about the fact that although we can have semantic compression of what we're doing, we can't do anything about the fact that there is a finite amount of effort that has to be expended in the world. Every time we do something labor-saving, it is either because somebody has already done a bunch of labor and we're building on top of it, or because we are displacing the labor in a way we can't see. I would like to introduce you to the myth of American exceptionalism built on slavery. We are not actually that good at things. We just used a bunch of labor and didn't acknowledge it. So when we think about things being broken, we need to accept that there's more going on in every system than we're really thinking about. And we need to accept that development is not perfect and will never be perfect. We can strive for perfect. It's good to aim for perfect. But we're not there yet, and we never will be. And everything is broken. You can only anticipate failure and mitigate it. You can't say, this is never going to fail because it's perfect. It's going to fail. Parts are going to break. Every plane you have ever gotten on has broken parts. I guarantee you there's no perfect plane in the world, and yet they stay up in the air.
So let's make our software something that can stay up in the air. So here is the staging server part. Deployment is not release. Deployment is the act of putting your code on the production server. It's getting your code to where it could be visible to the world. Release is a business decision. Deployment is a technical decision. Release is a business decision. Release is where you get the value. And you don't have to do these together. For a long time, we've done these together. Deployment and release were effectively interchangeable words. We're like, oh, when's the big release? Oh, we're going to stay up until midnight to do that. I mean, you could. But really, what if you deployed and nobody could see it? And then you could release it at, like, Monday at 10 AM, when you're smart. Here's the thing. Here's the truth. We are all testing in production. Some of us admit it out loud. But no matter how much testing you've done ahead of time, you're testing in production, because there's something that's going to go wrong. This is a quote from the aforementioned amazing State of DevOps report: the types of incidents that bring down production systems are often caused by interactions between components that are operating in apparently normal parameters, which might not be encountered in test environments. You've had that break. You've had that outage. You've had that moment where you're like, nothing is broken. Why won't you work? That's what God gave us Honeycomb for. But the answer is, because production is too complicated for us to ever simulate effectively. When I ask people if they can stand up a full production system, can you stand up a full duplicate of production? Yeah, no. Sometimes there are AWS people in the audience who are like, yeah. I'm like, yeah, yeah, what are you doing about your user data? Oh, you're using that in test? I am now unhappy.
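The deploy-versus-release split can be shown in a handful of lines: the code is already live on production, and the "release" is just flipping a runtime flag, no midnight deploy required. A minimal sketch, assuming a hypothetical in-process flag store standing in for a real flag service.

```python
# Deployment and release as separate events. The new checkout code is
# deployed (present in production) the whole time; the release is a
# runtime flag flip made whenever the business decides.
# The FLAGS dict is a hypothetical stand-in for a real flag service.

FLAGS = {"checkout-redesign": False}  # deployed dark

def render_checkout():
    if FLAGS["checkout-redesign"]:
        return "new checkout"
    return "old checkout"

def release(flag_name):
    """The 'release': a business decision, made Monday at 10 AM, not a deploy."""
    FLAGS[flag_name] = True
```

Notice that turning it off again (a "roll back") is just as cheap as turning it on.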
Because even if you could have a full duplicate of the technical stuff, you would still have to have a full duplicate of all of the vast weirdness that is people, that is one letter last names or no last names. I'm really enjoying the fact that I have been giving this talk and saying, you need to be prepared for the future when people name their children with emojis. They laughed when I said this. They laughed when I said this. On Friday, we got a bug in our production system that said, hey, we can't delete a flag because it's named with an emoji. I said, so when you're thinking about this, you're never going to be able to test the weird ass things that humans do to computers because you know how computers work and you don't do terrible things to them. I mean, you do, but like, predictably terrible. Unpredictably terrible things, that's the realm of the user. So we can sit around and talk about sad stories. Like, we deployed to staging and everything was going to be fine, but then we deployed to production and it turned out we were missing some underlying dependencies for tools that we didn't use very often because it was an internal tool and it hadn't been updated and that's a different team and by the time we discovered it, the person who had written it had left and yeah, you've had that day. In the cloud native world, staging is a lie. It's not like this person is actually flying. We have enormous difficulty replicating the complexity of production, the weirdness. If you did that, you would have to have more licenses. Are you going to authorize against third party APIs the same way? Are you going to pay for auto scaling? Are you going to test auto scaling? So I found so many places that I could find this quote. This code is working fine in staging, but not in live. I'm like, sorry, Stack Overflow friend, you and everyone else, including me. I can't figure out why this should work and isn't. 
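The emoji-flag bug above is exactly the kind of thing a table of deliberately weird inputs can surface before a user does. This is a sketch under assumptions: `FlagStore` is a hypothetical in-memory stand-in, and the name list is illustrative, not exhaustive.

```python
# Users will hand you names you never imagined; exercise the weird ones
# deliberately. `FlagStore` is a hypothetical in-memory stand-in for a
# real persistence layer.

class FlagStore:
    def __init__(self):
        self._flags = {}

    def create(self, name):
        self._flags[name] = True

    def delete(self, name):
        del self._flags[name]

    def exists(self, name):
        return name in self._flags

# Single chars, very long names, spaces, non-Latin script, emoji, apostrophes.
WEIRD_NAMES = ["x", "o" * 255, "name with spaces", "名前", "🚀-launch", "O'Brien"]

def survives_weird_names(store):
    for name in WEIRD_NAMES:
        store.create(name)
        store.delete(name)  # this is where the emoji-deletion bug would show up
        if store.exists(name):
            return False
    return True
```

This doesn't replace testing in production; it just moves the predictable end of "unpredictably terrible" a little earlier.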
It's a lie, and it's a lie we tell ourselves and find comfort in: that we have tested in staging and we should be okay. I'm not saying don't have staging. I mean, some people don't. We still do, for a few minutes. But mostly I say you have to accept that staging is a part of testing and not an actual approval. Green is expensive. I already said this. It's really expensive to stand up a full duplicate system and fail over. Production is unknowable. This is an image of an organ in your body that we didn't know we had five years ago. It's called the interstitium. And it's basically a bunch of tubes filled with pressurized fluid that run all through our body, sort of like the lymphatic system, but different. And the reason we didn't know about it is because if you cut into a pressurized tube, what happens? It collapses, right? So the thing that we're saying is, until we had good enough imaging to not cut into the tube, we didn't know the tube was there. Production is the same way. Until we have real data flowing through the real pipes under real circumstances, we're never gonna be able to see it. And our imaging is not good enough yet to be able to test or predict that. So now that I've made you all depressed about the fact that we don't know anything and we can't control everything, let me talk about how we can make this a little better. Production can contain superposition. I love physics, like, in sort of an amateur way. Production can contain superposition. It can contain more than one thing at a time. It can contain Schrödinger's code, both on and off at the same time, delivered to different people at the same time. It's amazing. It's so cool. I think about this all the time when I think about how we're doing traffic routing and shaping, how we're doing personalized delivery.
I have a weird futurist-hat thing where I'm not allowed to name this flag markup language, but if we used feature policies and personalized flag identity to be able to give people a persistent pseudonymous identifier on all the websites they visit, imagine what that could do for accessibility. I would love that. I'm so excited about that. So when I think about a person's experience of their software, because it's not our software once they have it, it's their software, I want that to be as perfect and tuned to them as possible. You can use feature flags and targeting to change your production environment, to deploy without showing your progress to everyone. In fact, to do continuous integration and deployment, you must be able to deploy broken code. You have to be able to put broken code on production, or you are not really doing CI/CD. Yeah, yeah. And every time I say this, all the testers in the audience get, like, a cold chill down their spine, because we've told them forever that their job is to prevent bad things from getting onto production. That's a misunderstanding of what testing is about. Testing engineers are amazingly smart people. They're very cool. And the thing that they are here to do is prevent bad things from happening to users. That's different than preventing bad code from getting on production. I can see how we conflated these, especially when production was like a gold master that you had to print and ship. But now it's not. Now we're saying the thing that we want testers to do is ensure that users have a consistent, reliable, and safe experience. And they can do that by testing in production. And they can only do that by seeing how production responds. So when I'm saying test in production, I'm not saying stop testing. I'm saying give people more tools. You can hide in the dark. You can do all sorts of things in the dark on production without sharing it with everyone. This is what feature flags are for.
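The targeting part of "feature flags and targeting" looks roughly like this: one production deploy, different experiences per user, decided at evaluation time. A minimal sketch; the rule shape, attribute names, and flag definition are hypothetical, not any particular flag service's schema.

```python
# Per-user flag evaluation: the same deploy serves different experiences
# to different users. Flag structure and attributes are hypothetical.

def flag_enabled(flag, user):
    """Evaluate one flag against one user's attributes."""
    if user["key"] in flag.get("individual_targets", []):
        return True  # explicitly targeted users always get the feature
    for rule in flag.get("rules", []):
        if user.get(rule["attribute"]) == rule["value"]:
            return True  # attribute-based rule matched
    return flag.get("default", False)

# Example: roll a high-contrast UI out to one beta tester plus anyone
# whose profile asks for it, while everyone else sees the old UI.
high_contrast = {
    "individual_targets": ["beta-tester-1"],
    "rules": [{"attribute": "prefers_high_contrast", "value": True}],
    "default": False,
}
```

This is also the mechanism behind the accessibility idea above: the user's identity and preferences, not the deploy schedule, decide what they see.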
This is why trunk-based development is going to work. Because if you have trunk-based development, everything is live all the time. But it's not visible. It's not there for everyone to see, even though it's in effect. I think that's so cool. I'm so excited about this. The idea of dark launching has been around for a long time. Sometimes we mean trying out for a select audience, and sometimes it means splitting requests and sending some against a new service. And sometimes it means opening with no fanfare. So when a restaurant opens, they don't say, hey, we're opening on Monday and we would like everybody and their brother to show up and order all the things at once. It will be very famous. No, what they do is, like, Tuesday morning, they go, oh, yeah, we're open. No advertising. But if you wander in, we might give you some food. Because it turns out that testing your systems and your processes at full capacity right away is not great for restaurants any more than it is for us. What we really wanna do is say, hey, let's get the processes worked out. Let's figure out how to do tickets. Let's figure out when everybody's smoke breaks are. Let's make sure we understand this before we add a lot of load to it. And if restaurants can do that, I don't see why we can't. I want to be able to say, let's figure out how this is gonna work before we add a lot of load. It's far easier and cheaper to fix a problem that's only appeared for a few people than to do fixes or apologies if a lot of people were affected. Similarly, testing your system resilience by starting with a small part of your audience is going to be much safer than turning it on for everyone at once. When I think about this problem, I think about Best Buy. I'm from Minnesota. We get a lot of newspaper articles about Best Buy. Best Buy has a quirk. It turns out they don't like paying their competitors a lot of money, so they're not on AWS.
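The request-splitting flavor of dark launching mentioned above can be sketched as shadowing: mirror a fraction of real traffic to the new service, compare results out of band, and always serve the old answer. This is a sketch under assumptions; `old_service` and `new_service` are hypothetical stand-ins, and a real system would log mismatches rather than keep them in a list.

```python
import random

# Dark launch by request splitting: a slice of live traffic also exercises
# the new code path, but users only ever see the old path's result.
# Service functions and the mismatch list are hypothetical stand-ins.

mismatches = []

def old_service(query):
    return query.lower()

def new_service(query):
    return query.lower()  # the rewrite we're auditioning in the dark

def handle(query, mirror_fraction=0.1, rng=random.random):
    result = old_service(query)          # users always get this answer
    if rng() < mirror_fraction:          # a fraction of traffic also hits the new path
        shadow = new_service(query)
        if shadow != result:
            mismatches.append((query, result, shadow))  # record, never fail the user
    return result
```

Because the new service sees real production inputs at partial load, you find its restaurant-soft-opening problems before the grand opening.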
They have what we call data centers, which is like AWS or GCE or Azure, but you pay for it yourself and you can't scale it. You got what you got. And what happened was, for reasons unknown, I mean, we could go into them, but for reasons, on Black Friday, the Best Buy site went down. Okay, that's bad. We're gonna stand it back up. All right, we stand it back up. Thundering herd. Everybody is trying to hit the site simultaneously, and the site is huge. It's got a shopping cart. It's got, like, animated pictures. It's got attractors. It blinks, like, you know, in a sort of suave 2010s way, but it still blinks. And it's so huge that even if they had the capacity to manage sustaining people at that page size and, like, a normal level of turnover, they didn't have the capacity to load it all for everyone all at once, which is what the thundering herd problem is about. Like, everybody is hitting refresh. They're like, give me the page. Give me the page. Ooh, maybe Amazon, right? So they finally ended up doing some traffic shaping so that they were only letting through a few percent of people at a time until the page got loaded, and then they let through another few percent, and they stood back up. But it took hours to figure out how to do this, because it kept falling the heck over under load. Being able to choose who gets into the site at a certain stage would have made all the difference in the world for getting back up faster. Integration testing takes a lot of forms. When we think about integration testing, we think about automated tests and testing plans and sometimes stress tests, but it's really difficult to do full testing in a microservice world, because they're not all your microservices. Every new microservice adds a node of complexity, and that means that your math stops being additive and becomes multi... nope, I've lost the word. Combinatoric. It starts being combinatoric.
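The traffic shaping that finally got Best Buy back up, letting a few percent of people through at a time, can be sketched as deterministic percentage bucketing. This is a minimal sketch of one common approach, not the mechanism Best Buy actually used: hash each user key into a bucket so each user's answer stays stable across retries, and widen the open slice as the site stabilizes.

```python
import zlib

# Gradual re-admission after an outage: admit a deterministic slice of
# users instead of the whole thundering herd at once. Hashing the user
# key keeps each user's answer stable while they hit refresh.

def admitted(user_key, percent_open):
    """True if this user falls inside the currently open slice (0-100)."""
    bucket = zlib.crc32(user_key.encode()) % 100
    return bucket < percent_open

def admitted_count(users, percent_open):
    return sum(admitted(u, percent_open) for u in users)
```

Widening the rollout is then just raising `percent_open` from 5 to 20 to 100 as capacity recovers; nobody who was already in gets kicked back out.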
And all these nodes, and there are people who have hundreds of microservices. There's a new app out there, it might be Dynatrace, where you can see all of your microservices, they have the lines connecting them and they move around, and you're like, do you really have 400 microservices? Because that's a lot, okay. Some people do, and you can't test what each of them does all the way through with all the variations. So everything that we're doing needs to be about the end path. Not, like, every variant, but the paths that are most likely to happen, the things that are most likely to be true. Integration testing is going to need to start thinking about behavior-driven testing. Does the software behave predictably in the way we expect it to? I don't care if it executes the way you expect it to. It can be a bunch of chained-together black boxes with, like, tiny gnomes chipping out ones and zeros for all I care, as long as it gives me a predictable input and output. That's fine. What we need to know is whether that output is going to be predictable. This is one of the last pictures that we got from the Opportunity rover, which stopped transmitting in June of last year. Both Opportunity and its twin, Spirit, were designed to last 90 days on the Martian surface, with the expectation that Mars' extreme winters and dust storms would cut their mission short. It lasted 15 years, and it sent us science that whole time. That's pretty amazing. So somebody asked the JPL engineers how that happened, like, how did they make something that was designed for a 90-day operational life last 15 years? And the answer was, it was all decoupled. If the rock driller didn't feel great, that didn't mean that you couldn't be using the electromagnetic spectrum thing. They had more than one radio on board. The whole system was designed to talk to each other but not rely on each other. It was all decoupled. It was all loosely tethered together.
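The additive-versus-combinatoric point is worth two lines of arithmetic. This sketch assumes a simplified model where each service has some number of versions or behaviors in play; real dependency graphs are messier, but the growth rate is the message.

```python
# Why full integration coverage stops being feasible: with n services that
# each have k versions/behaviors in play, the distinct combinations grow
# as k**n, not the k*n our intuition wants. (Simplified model.)

def full_coverage_cases(services, variants_per_service):
    return variants_per_service ** services  # combinatoric: every combination

def additive_intuition(services, variants_per_service):
    return variants_per_service * services   # what we wish the math were
```

Ten services with three variants apiece is 30 things if the math were additive, and 59,049 combinations because it isn't, which is why testing the likely end paths and behaviors wins over testing every variant.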
And the only thing that really happened in the end is that the solar panels got covered by dust and there's nobody there to clear out the dust off a tiny little rover. And it's possible that in some future dust storm, it will get blown off and it'll start talking to us again. Even though it's been dormant for over a year, they're still listening. I think that's a great metaphor for how we should be designing software. We should always be listening for it. We should always be building it with resilience and the understanding that some parts are going to break but not everything is going to break. So I know that when I say all the world's a staging server and you should kill staging, I get a lot of arguments. So you can't just wholesale dispose of staging. Like we've discussed, you need somewhere to make sure your stuff is gonna work in production but you do need to stop thinking of it as a separate server or environment. I know you have some objections brewing in your soul and so I'm gonna see if we can settle a few of these. You need to have automated testing as part of your code hygiene and best practices. What if you can't check in unless you've done your unit test? I saw a great tweet from somebody a couple of days ago that says as a tester 95% of the things I find should have been found by unit test and weren't. I'm like, well, that's discouraging but also predictable. I think it's easy for us to believe that there are a lot of things that would be easier if we got to do something new but we need a firm foundation. This is a picture I took at Masada of a Roman fort that's about 2,000 years old. The wood's gone but the structure remains and if you wanted to build a Roman fort there again, it would be pretty easy. New technology is not going to save us from bad fundamentals. If we're building on an unstable surface, it's gonna be bad for us. So I'm not saying get rid of testing. I'm just saying do more testing. What about unit testing? 
I just said that, but let me say it again. You do it the same time you always do, before you commit the code to the trunk. And if you don't require a passing test to commit code to trunk, what are you doing with your lives? Because that's just basic. Like, if you want somebody to do something, you either have to reward them for it or punish them for not doing it. That's basic. It's not even human psychology. It's, like, animal psychology. Either a reward or a punishment. That's how it works. Because otherwise, if you can't test effectively before you get something into production, you end up with the Hubble telescope, which they machined really precisely, to micrometer precision, except they got some numbers wrong. And then they had to figure out how to make it a pair of glasses, launch it into space, and attach it to a telescope. I feel like better unit tests might have helped us with that whole second-launch problem. What about bad ideas? Are we using staging to catch bad ideas? Well, yes. But if production is not robust enough to run experiments on, what if you made it stronger instead of coddling its fragility? What if you said, it is important to me that my production be robust enough to deal with the fact that sometimes crappy code enters its orbit? Or, you may not know something is a bad idea just because you put it on staging. Maybe it's just false confidence. Maybe it's a comforting lie. What about essential cutovers? Things that you have to do in the moment, like changing from one database to the other. But let's talk about teal transitions. Like, blue-green is so five years ago, I'm over it. We used to do database migrations and other transitions in a blue-green pattern, but now you can do two variants in the same place with less risk, by writing messages to both places and then gradually moving them over instead of doing an abrupt cutover. That allows you to test to make sure you're writing what you think you're writing.
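The write-to-both-places migration described above can be sketched as dual writes: every record lands in both the old and new datastore, you verify they agree continuously, and only then move reads over. A minimal sketch; the two dict-backed stores are hypothetical stand-ins for real databases.

```python
# A "teal" transition: no abrupt blue-green cutover. Write to both stores,
# verify the new one matches, then flip reads gradually. The dicts are
# hypothetical stand-ins for real datastores.

old_db, new_db = {}, {}

def dual_write(key, value):
    old_db[key] = value
    new_db[key] = value  # shadow write; old store is still the source of truth

def read(key, read_from_new=False):
    # Flip read_from_new per-request (or per-user) once the stores agree.
    return new_db[key] if read_from_new else old_db[key]

def stores_agree():
    """Run this continuously during the migration, before flipping reads."""
    return old_db == new_db
```

Because both variants live in the same production at once, a discrepancy shows up as a failed comparison on live data, not as an outage at cutover time.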
So here's what I want you to go out into the world and do. Instead of using staging, or in addition to using staging, launch new features in production and keep them concealed. Keeping your environment consistent and well-tuned allows you to reduce overhead while increasing your confidence. Branch by abstraction. Branching is a mental artifact. It's how we think about adding new behavior to code. It's a pattern created by our source code repositories. It's not real any more than any other kind of code is real. It's just a mental model. So think about what other mental models could get you to where you want to be safely. Test in production, because it's the only thing that's real. Admit that you are testing in production, because if you have a failure, it's going to be in production. And if this talk was too long and you read Twitter instead, let me just leave you with this. Kill staging, because it's a plausible lie. Test in production. And if you want a free t-shirt, I didn't bring any because I'm lazy, but you can go ahead and visit this URL and we will send you one, and you can stop by our booth and get stickers. Thank you.