a lot of stuff, so I'm just gonna get going. So, thank you, David. My name is Paul Bruce, like he said. I've got a lot of stuff to talk about. Mostly it's the fact that I work with a lot of different groups, so I get to see a lot of different perspectives, and my hope is that you'll take this not as me being an expert in any particular thing, although I have my wheelhouse. You can get the slides here; there are a lot of links in them. I hold many degrees in awesome stuff from a long time ago — not really — but ultimately my wheelhouse is performance, right? High-scale systems. I'm a performance and scalability engineer. That's what I do. I also work a lot with CI and test automation. Additionally, what I call learning frameworks isn't just artificial intelligence or machine learning frameworks. Come on, we're the original intelligence. Human beings need learning frameworks, and so I work with a lot of different groups to figure that out. My most recent thing is the Creative Thinker's Toolkit, if anybody has ever heard of that. I'm also starting to build a concept called Forward Dialogue, and we did a number of things there; some of the screenshots are here. I'm also working with the Selenium conference in two weeks, re-engineering the grid. I don't know if anybody's familiar with Selenium Grid, but it's pretty widely used. There are vendors for it, but there are a lot of people who spin up their own, and in terms of DevOps, we need visibility into how healthy those grids are. So I'm baking that into their code base, because it's not there yet and I think it should be. And I'm going out to that conference to volunteer as well. I also do a lot of stuff at home. I try to make sure that ideas that work really well at work get tried at home, and ideas from home also get brought to see the light of day at work. So this is just organizational planning. That's my dad-life thing.
I also do a lot of idea generation, particularly for the company I work with full-time, called Neotys. They're up in Lexington. They do load testing; they've got a booth out there, the only testing vendor here. And I also do a lot of collaboration with people, particularly because I am not special. There are no unique qualities about me; I'm not a special snowflake. The idea is that ideas come from collaboration. You can be the smartest person in the world and only have a limited number of ideas, but when you figure out how to collaborate and communicate effectively with more than just yourself — if you're even lucky enough to do that — there's a wealth of ideas out there. So I work with a lot of different groups. This was, I guess, a month and two weeks ago: I did a thing about hiring and recruiting in DevOps, which turned out to be really useful for a lot of managers in the area. This is going to be fun. And hey, look, this was yesterday, right here. Who was in the open spaces yesterday? All right, there's a few hands. That's great, thank you. I think you had a lot of really great ideas around the topics you cared about, and I have the luxury of just kind of peering into other people's conversations and then following up with them, which is really great. I also bring a lot of work home to read. I do a lot of reading, a lot of commuting, a lot of learning in that situation. And this is one of the best books I've ever read. Who's read Site Reliability Engineering? Come on. More hands. Okay — more hands than open spaces. For those who haven't, please, I encourage you, implore you desperately: at least look at the first couple of chapters. There are some really important ideas in there. And that's my daughter, seven years old. She turns me on to a lot of different ideas — like, for instance, the fact that she wanted this book because she saw it and she can read. She's been reading for years.
And I didn't realize it, but software engineering was a term coined by Margaret Hamilton. So that's really cool. So why am I here? Is this a technical talk? Yes — don't worry, we'll get to the end of it. I think you've probably had enough Kubernetes up the yin-yang, you know what I mean? There's meta about our jobs that we don't pay attention to that significantly impacts our psyche, right? James talked about burnout this morning, right? That's not just a technical problem; that's in every field, and it's just as important to talk about this kind of stuff in technology. So what I'd like to say is that there are things that are more important than your work. Who would tend to agree that there are more important things than your work? Hold on, hold your hands up. I'm not kidding; this matters to me. All right, hold your hands up. Excellent. Sorry about this, by the way — there are more important things than this as well. And empathy is a really squishy term to some people. It's not to me, because I've had to go through young dad life. I've had to step into someone else's shoes and see life from a very different angle, and I've started to do that for everybody: for every single conversation I have, for every single business venture, for my customers, for the people I support in my full-time job. I look into what is most important for them. Being able to at least step into the feelings and thoughts and understand where people are coming from is super important to being able to communicate efficiently. The other thing here — the concept I pulled you in with, because of the talk title — is performance: the performance imperative. What do I mean? There are a lot of really important things in software right now. Equifax. There are a lot of aspects of software that are being ignored for various reasons. This is one that, fortunately, I have the luxury of talking to a bunch of SREs about.
And realizing scalability, reliability, availability — these all bake up into performance. So what I mean by the performance imperative is that, quite frankly, we don't need to see errors. We don't need to see failures. When people see stuff like this, we turn around and use something else. Do I have to say anything else about this? Also, by the way, we can't predict very well. The future is untold, so we need to respond quickly. That's the whole point behind reliability engineering: to bake this stuff in, to make it easier to handle things in real time. I'm getting no love from AV. That's okay, though; everybody has this problem. This is Amazon on Prime Day. They knew about it how long ago? And it's okay — stuff happens. But honestly, this is how we typically live when it comes to the availability of things we didn't necessarily plan for. It's fine. It's fine. Okay. So this is something I actually realized a couple of weeks ago: performance is imperative. And I realized there's a framing here that I think you'll all pay attention to. Obviously, the user experience: if it's slow, you might as well move on from it. We all get the user experience thing. But what does poorly performing code and software do to developers? Well, first off, it's really easy to push code these days, and so that code can be really poorly performing, and you might not know, because this stuff is so complex and hard to solve. The other thing that happens — because it's hard to predict and our systems are so complicated — is that the feedback loop oftentimes ends up being really slow. And so we ignore it, right? These are the developer-experience moments where we kind of just say: performance, I'll deal with that later. Or security — I'll deal with that later too, right?
But on the operational side — and this is where, maybe in this specific local group, it used to be a lot of sysadmins and now it's a lot of developers too — there's this balance in DevOps between who we are talking to, just the devs or just the ops, because we oftentimes talk to one or the other and not the combination. I realized that performance, in the operational experience, means: look, we're changing stuff all the time, right? Infrastructure changes. It doesn't have to be code. It could be a live migration that you didn't kick off that causes performance issues, and so on and so forth. So the rest of this — and I'll move fast — is three examples of empathy. I've now combined two different concepts, and I want to overlap them. And this is the sticky slide; just follow up on it later. My son realized that saving state on a Nintendo is easier than actually learning how to be a good gamer, and that sucks, because I grew up in the 80s. So just consider that when you put something out there, you have no idea how it's going to be misused, okay? So, first thing: user adoption. Slow is — we're done, right? Slow is broken. We still suck at mobile performance. Is it our fault? Is it the network's fault? It's both. So, stats. Boy, this is troublesome. There we go. Okay. The world used to be this simple, and now it looks like this. Or maybe worse: maybe you've got some data cubes underneath there, some R models where you have no idea about the predictability or the reliability on a large data set. But the cool thing is, for the user experience, the work that I do lets me play around with interesting uses — bizarre and twisted uses, but still effective, because, hello, we're engineers — of getting feedback from production.
This was an example where a large vendor tried to launch on Salesforce Commerce Cloud, and since Commerce Cloud doesn't really provide any operational visibility into its back-end systems for customers, how do they answer the question of whether it will scale? They made a major IT decision to move onto a platform, and, oh, by the way, that platform doesn't give them much view into the back end. There are problems with that, I get it — but how is the customer supposed to know if something can be launched on it if they can't see the cause and the effect, the impact of the load? So we played around with things, and we wired up some old stuff from Dynatrace to provide real-time visibility under synthetic load. It was pretty cool. So that's an example of that. The next one moves on to the developer experience. What do I mean by downstream rework? Come on, you know what rework looks like. If it's slow, it's broken. And this I had to draw for somebody who wasn't in the software space, who was still thinking in terms of cycles: yeah, but can't we push performance off? This is just an example of saying, look, if you take a few sprints to do something and you push off load testing — using monolithic tools that take forever to get feedback back, or monolithic organizational structures — that feedback is going to come so late that you're going to be tripping over your own shoelaces in the next sprint. You're losing story points to something that — not your fault — we didn't know about. Maybe it's broken at scale, maybe it's not, but if the feedback loop were shorter, you wouldn't have to trip over the shoelaces. Here's another version of it for the slide deck, and by the way, yes, the rest of this is on Twitter, but this is a weekend venture of mine, just to say: hey, look, let's talk about it in terms of how you would do a commit and what that looks like from that perspective.
I did a survey last year with another company that basically said teams that bake performance criteria into their planning process have half as many problems with performance. Duh! But why is it not a duh, right? Why are these the leaders? Why do we have to segment the world into the market leaders and the laggards? What, because the DORA research says so? There's a good reason to, right? Because there are some people who inherently get it, and some people who have learned from difficult experience. So in my experience, the performance-specific stuff — sorry about the slides — comes down to a few Cs: you need to think about concurrency, you need to think about conditions, and you need to think about capacity. Good Lord, I think my computer is messing this up more than anything else. Here's an example of how you might do that in a story: actually bake this stuff into the story, instead of just being handed a requirement that says, oh, by the way, we need the thing to do this many transactions. In how much time? Under what conditions? What is the back end doing? So consider that the component of time is really important. You could put it on the back of the card, so to speak. This is not just my thinking; it comes from a guy who works in this space and is really good, Todd. So one of the things that SREs really care about is how we make prod a less volatile place. It's not that you can't ship broken code; it's why does it have to be broken, and why can't you know about that in real time, or beforehand? And this — we don't have time to go through it — was essentially a story where people put hard-coded variables in their testing scripts, and guess what happened? Those testing scripts got promoted, and then data got deleted in production. Whoops. So these are some of the awesome stories I get to collect in my spare time.
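To make the three Cs concrete, here's a rough sketch — not from the talk; every name and number is made up for illustration — of what "performance criteria on the back of the card" could look like if you expressed them as data a pipeline can check, rather than a vague "it needs to do this many transactions":

```python
# Hypothetical sketch: a story's performance acceptance criteria as data,
# covering the three Cs (concurrency, conditions, capacity) plus the time
# component. All names and thresholds here are illustrative.
CRITERIA = {
    "concurrency": 200,         # simultaneous virtual users to sustain
    "conditions": "3G-mobile",  # network/environment profile for the test
    "capacity_tps": 50,         # sustained transactions per second
    "p95_latency_ms": 800,      # 95th-percentile response-time budget
}

def meets_criteria(measured: dict, criteria: dict = CRITERIA) -> list:
    """Return human-readable failures; an empty list means the story passes."""
    failures = []
    if measured.get("concurrency", 0) < criteria["concurrency"]:
        failures.append("did not sustain required concurrency")
    if measured.get("tps", 0.0) < criteria["capacity_tps"]:
        failures.append("throughput below required capacity")
    if measured.get("p95_latency_ms", float("inf")) > criteria["p95_latency_ms"]:
        failures.append("p95 latency over budget")
    return failures
```

The point isn't this exact shape — it's that once the criteria are explicit, "done" can include them, the same way functional acceptance criteria do.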
Unfortunately, I can't give you names, because I don't want to sully reputations. But the point is, our feedback mantra now is early, often, and easy. We don't have time to dig too much into these, especially because of the projector. But for the most part: early should be obvious. APIs. If you're not doing API testing — and I don't just mean API functional testing, I mean API performance and security testing — that's the best place to start early. Often: we can't get feedback often unless we're automating a significant portion of things. And peeling this work off of your main pipeline to get feedback doesn't let you ignore the length of those feedback loops; you need really good visibility on that at the team level. So if you're just going, okay, dev releases to a QA or staging environment and then we do performance testing afterwards — fine, but that's not the be-all, end-all strategy. You need a more advanced strategy than that to really get early and often feedback. And the easy thing is that, look, there are various appetites. People might want devs to do their own testing, or a separate team to do various testing, or testers embedded in the teams. Whatever it is, the fact is it benefits us all for this stuff to be easy and to fit into our landscape. So again, I don't have time for a lot of this, but the slides are pinned on my Twitter account, so please go back to them. This is solely to say: there is a time budget for testing, and if we don't respect that time budget — if we expect an hour-long performance test to fit into a cycle that typically took, what, five minutes at most to complete — we are going to eject that test like a bad habit. So we need to be really conscious and careful about the time intervals in which people need that feedback. Additionally, doesn't this look like waterfall? Who thinks this looks like waterfall?
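One way to picture that time budget — a toy sketch of my own, not something from the slides — is as a simple split: checks fast enough to block the commit run inline, and everything else peels off into a parallel job that reports back asynchronously. The check names and durations below are invented:

```python
# Hypothetical sketch: respect the pipeline's time budget by splitting checks
# into inline (fast enough to block the commit) and peeled-off (run in a
# parallel job). Check names and durations are made up for illustration.

def split_by_time_budget(checks: dict, budget_seconds: float):
    """checks maps a check name to its expected duration in seconds.

    Greedily keep the fastest checks inline until the budget is spent;
    everything longer peels off into a parallel pipeline.
    """
    inline, peeled = [], []
    remaining = budget_seconds
    for name, duration in sorted(checks.items(), key=lambda kv: kv[1]):
        if duration <= remaining:
            inline.append(name)
            remaining -= duration
        else:
            peeled.append(name)
    return inline, peeled
```

With a five-minute budget, a one-hour load test lands in the peeled-off bucket automatically — it still runs early and often, it just doesn't hold the commit hostage.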
Okay, there's a note up there that says for every feature or patch or whatnot. I hate to argue with you, but there's a concept in The Goal by Eliyahu Goldratt, a foundational book that predates a number of the ideas acclaimed in The DevOps Handbook and The Phoenix Project — The Goal was fundamental to those. And one learning he had along the way was that there is such a thing as dependent events. You cannot deploy something that you haven't planned. Is that not obvious? So the difference is, waterfall used to be this huge bucket of stuff, and now we're trying to micronize it, to turn it into small chunks, small batch sizes. So, last thing: the operational experience. If it's slow, what is it? Yeah — for ops people too, if it's slow, it's broken. The first thing we have to do is dispel the myth of 100% uptime. There might be an aggregate, but it's never 100%, and your business leaders have to understand that too. Who is a manager in this room? Okay. Have you had luck selling up the chain the fact that shit happens? No. It's hard. It's very hard to convince the business that it's okay to have a failure budget, but in fact it is. And in most cases — we'll start with this one — that's what most people think we do: we have a back-room pet server, and it's like, yay. Oh. The other thing is, I heard this recently on site. There was a thin wall between what we were trying to do in one room and what some other loud — clearly business — group was trying to do. And I asked them, what is that? And they said, well, it's our change advisory board. One of many. And I was like, oh, cool, tell me more. And eventually they got to the point where they said, reboots have to go through the CAB. Okay, that's fine — you know, there are a lot of mainframes out there. No, no, no. All reboots, on AWS accounts? Yes.
There's a point at which, if it's actually ephemeral, you might ship your stuff up to whatever it is — the cloud — but if it's not ephemeral, if you can't cycle servers in and out and manage that load, what have you really done? All you've really done is move the problem of figuring out what your capacity needs to be, and owning the hardware, to what Adrian Cockcroft would call "not my problem." That's what it was at Netflix: they realized they could use AWS, and managing hardware wasn't their problem anymore. Okay — well, kind of it is, in certain circumstances. So think about it: if you can't reboot a server, have we really gone ephemeral? Have we really gone dynamic? One last thought about this. I continue to drill in on some of these questions, and they say we must have 100% confidence. And why does it turn out to be 100% confidence? It's because we have highly coupled and dependent systems, and those demand complex, highly coordinated rollouts. Does that sound familiar? Complex rollouts, all hands on deck? Automation has been an afterthought to getting stuff to work properly, and that leads to the inability to reproduce steps reliably. It's just the time-spend problem all over again. So I think a lot of this comes from business leaders not understanding that things don't have to be up 100% of the time — that there has to be the concept of an error budget. And again, it's problematic to push an error budget into an organization that just wants your velocity to increase more and more and more. Does anyone know what I'm talking about here? I think it's worth digging into for a second. One or two people, you know? People say: ship the feature. Don't deal with tech debt. Tech debt is your problem, not mine. So there's that march of acceleration and innovation: features, features, features.
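The error-budget conversation gets much easier when you put it in plain arithmetic — this is just the standard availability math, nothing specific to the talk. A 99.9% objective is not "always up"; it's a concrete, spendable allowance of downtime:

```python
# The error-budget idea in plain arithmetic: an availability SLO below 100%
# buys a concrete budget of allowed downtime the business can reason about.

def error_budget_minutes(slo_percent: float, days: int = 30) -> float:
    """Allowed downtime, in minutes, over a window of `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - slo_percent / 100.0)
```

Over a 30-day month, 99.9% leaves about 43 minutes of downtime to spend — on risky deploys, migrations, whatever — and 99.99% leaves barely four. That's a number a business leader can argue about, which beats arguing about "100%".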
They're not even using the features we gave them last month. So that also has a negative impact on this. But ultimately, one of the things I do, when somebody says, oh yes, we use Docker, is ask: okay, how? How widely is it adopted in your organization? Oh, well, some devs are using it. Oh — what about the test organization? So this is where we actually go back to the book — a really good book that talks about site reliability engineering; you don't have to be a unicorn to learn good things. And this is one of those clear quotes. When I look at it with my performance eyes, I'm like, damn, there's a lot in here: availability, latency, performance, efficiency. There's a lot in here that used to be under the wheelhouse of performance, and now it's kind of baked up together. There's a lot that site reliability engineers have to deal with that the traditional performance engineering crew didn't. It's more than that: it's process improvement, it's efficiency across the entire pipeline. So — too much text to read, but I think it's important to say: I wish I could give you specific numbers. The only specific number that makes sense to me on a regular basis is this: if your servers are at 80% capacity of anything, you've got to stop and figure out how you're going to push in the next server, how you're going to rotate in more bandwidth. Other than that, it really depends. Your numbers are going to be different from his numbers, different from her numbers. It depends on context. What I would say is that there are some clear signals — they're called the golden signals in that book. And the goal here is not to just measure more service level indicators. We don't need more data most of the time. What we need is to align the organization, and the objectives of our product managers and the rest of our team, to the things we're supposed to measure to begin with.
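That 80% rule of thumb is simple enough to wire up directly. Here's a minimal sketch — my own illustration, with made-up resource names; the 0.80 threshold is the one number from the talk, and the "golden signals" framing (latency, traffic, errors, saturation) comes from the SRE book:

```python
# Hypothetical sketch of the one number the talk commits to: flag any
# capacity measure (a saturation-style golden signal) that crosses ~80%,
# so there's time to rotate in more servers or bandwidth before it's 100%.
SATURATION_THRESHOLD = 0.80

def saturated_resources(utilization: dict,
                        threshold: float = SATURATION_THRESHOLD) -> list:
    """Return the names of resources whose utilization (0.0-1.0) is at or
    over the threshold, sorted for stable reporting."""
    return sorted(name for name, used in utilization.items()
                  if used >= threshold)
```

Everything else — which resources to watch, what "capacity" means for each — depends on context; your numbers will differ from the next team's.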
Probably some of your measurements today are completely useless to the decision-making process. Hopefully that's not the case. We need to manage the difference between the service level objective — which the team and the organization are responsible for defining before anything gets to production — and what production tells us afterwards, cycling back through: hey, how does production inform us? So service level objectives, I think, are where a lot of the gap is in terms of thinking. And there are a couple of questions you can ask to start seeding that thought process with your PMs, who usually take no responsibility for the performance of things because they're not devs and they're not ops. Whoops — what about planning? So, what do I mean by dynamic provisioning? How long ago did you say — five minutes? Oh, great. Oh, good. So there are a couple of things. What I've heard is that operational expense is a big deal in organizations. CAPEX is also a big deal. Balancing these two things is very important, and there are a lot of reasons why. Go talk to somebody who has to do that; get some context if you don't really care or know right now. Because the push on the operational side is to reduce operational expense, and that ends up meaning we have to provide on-demand services — across the entire pipeline stack. We need to be able to provision all the testing, all the environments, on and off, when we need them. If only I had you — if only I had you about half an hour ago, this would be awesome. So Adam is an AV sort of expert. Anyway, what that means, if we go all the way through, is that yes, of course you need dynamic targets. Yes, we can spin up a new stack on demand when we need to. Great — if you're lucky, right? If you get some mainframe time, if you have that or not.
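To pin down the SLI/SLO distinction I keep leaning on — this is a generic illustration, with invented numbers, not anything from the slides — the SLI is what you measure, the SLO is the target the team committed to before production, and the gap between them is the error budget you have left to spend:

```python
# Hypothetical sketch of the SLI/SLO gap: compare a measured availability
# SLI (fraction of good requests) against the SLO the team defined, and
# report how much of the error budget has been consumed.

def sli_report(good_requests: int, total_requests: int, slo: float) -> dict:
    """Compare a measured availability SLI against its SLO (both 0.0-1.0)."""
    sli = good_requests / total_requests
    return {
        "sli": sli,
        "slo": slo,
        "meets_slo": sli >= slo,
        # Fraction of the error budget spent so far (can exceed 1.0 when
        # you've blown the budget).
        "budget_consumed": (1 - sli) / (1 - slo) if slo < 1 else float("inf"),
    }
```

A report like this is also a decent conversation-starter with a PM: "we've spent half the budget this month — do we ship the risky feature now or later?"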
Also, you need dynamic test data, which may need to be extracted from production, cleansed, and then integrated into your testing harness. You also need to be able to provide dynamic warning and failure flags and update them as changes are made. Tools typically call them SLAs, but that's a misnomer anyway — another thing in the book. Also, you need to be able to provide distributed load infrastructure, which is particularly my wheelhouse, what I do during the day. And you need to be able to provision all of this from pipelines. You need to be able to spin this stuff up and imbue it in, for instance, Jenkinsfiles or CircleCI orbs, in such a way that you can execute it and then tear it down. It has to be really reliable; otherwise it gets ejected right out of the pipeline. Real-time test results, too: I'd much rather see halfway through — maybe even half a minute into the test — that my code is on fire because I did something stupid and it doesn't scale. Or worse, somebody put something in there, like new test data that borks the whole indexing in the database underneath. So, just to wrap up: anytime you want to contact me, share sources, talk about your challenges, I'm down, right? That's why I created a limited liability company last year — just to have conversations, just to be able to help organizations figure out what's important to them. And that's logistics. In the meantime, I post regularly on Twitter, for whatever reason. I also write occasionally, and there's one piece that's really good that I did earlier this year. I wish I could have called it DevOps Testing for Dummies, but I did that on my personal blog; what I had to do for the Dev Exchange at Capital One was call it something else.
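The spin-up / run / tear-down discipline I'm describing boils down to one pattern, whatever the pipeline tool. Here's a minimal sketch of it — the provisioner here is a stand-in object of my own invention, not a real Jenkins or CircleCI API; the point is only that teardown happens even when the test blows up:

```python
# Hypothetical sketch of ephemeral load infrastructure driven from a
# pipeline stage (e.g. a Jenkinsfile step or a CircleCI orb job). The
# `provisioner` is a stand-in, not a real API. Teardown must run even on
# failure -- unreliable stages get ejected from pipelines.

def run_ephemeral_load_test(provisioner, run_test):
    """Provision load infrastructure, run the test, always tear down."""
    stack = provisioner.spin_up()      # dynamic targets + load generators
    try:
        return run_test(stack)         # stream real-time results out
    finally:
        provisioner.tear_down(stack)   # guaranteed cleanup, pass or fail
```

The try/finally is the whole idea: an hour of leaked load generators after a failed run is exactly the kind of unreliability that gets performance testing thrown out of the pipeline.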
So it's pretty much the same thing, but it goes through what it really takes to think about testing — if you want an 11-minute read — so you can kind of go, oh, wow, that's what we lost. That's what we left behind by firing all of our testers and doing something different. It's an introduction to the way a test engineer would think. So if you haven't read it, I definitely encourage you to — or just refresh yourself on it. So: thank you for letting me spend your time. Thank you for spending your time with me. There are plenty of ways to connect with me, but the easiest is to come right up to the open spaces; I'll be there for the next hour or so. So thank you.