All right, ladies and gentlemen, we are gonna get started in just a few minutes. Now's a good time to take those seats and we'll kick this off in just a few minutes. They're doing a final audio run on the video feed so that those that are not here with us in person can actually watch and hear what's going on at the conference. Are we looking good, sounding good? All good? Okay, keepin' goin', keepin' goin'. Just gotta make sure it all works the way we expect it to. Okay, it wouldn't be DevOps Days LA if we didn't start with a few AV issues first thing in the morning. So, we'll give the AV crew a few minutes to get squared away. Are we all good? Oh, you just need me to keep talking. I don't know that I actually like hearing my own voice, but we'll go with it, because it's a thing.

And where did I leave off? Okay, we've got people arriving. There are wonderful seats up front. I swear, Coté does not spit. I've seen them speak a few times. It's fantastic. So, please make sure you're making your way up towards the front to engage with our wonderful speakers. And test, one, two. I'm just gonna go ahead and start doin' my spiel and then we'll go from there.

All right, so welcome everybody to, I've lost track of how many times we've been doing this. SCALE's been around for 21 years. I think that means DevOps Days has been here over 15. This is one of those huge passions for myself. I love seeing the community come out. I love seeing the awesome opportunities that come out of these conferences, and the chance to network and connect with folks and hear some great talks from people in our industry and local folks as well.

So, a couple of housekeeping things. First off, am I good? All right, a couple of things. Number one, during lunch, we are gonna have an unconference space similar to what we did last year. You will want to meet at the tables. Muriel, if you could raise your hand in the back, is gonna be coordinating that unconference space.
Additionally, for those that don't know, there is a piano up on stage. That piano is here for UpSCALE. So, we are gonna have live music, a first at SCALE, at 6:30, going into the normal UpSCALE program at 7 o'clock. And with that, let me bring our first presenter, Coté, to the stage.

Hello, thanks for having me here. I was here last year and it was fun. Well, I was asked to do a brief talk at the beginning of this about the state of DevOps. So, that'll be hopefully just 10 minutes. Otherwise, I have too many slides in the second part. Which is, this past year, or not this past year, I decided to come up with a talk about why people don't change. Because, as I'll get into, that seems to be a common complaint: we have all these fantastic ideas, but those people over there are being sticks in the mud. So, what's their deal? I think it's nice to kind of understand what's going on there.

Well, briefly, here's me. Normally, I'd be wearing that shirt, but my dog actually ripped it up, after about 15 years, before I came here, which is tragic. I don't know what to do about that. But I've been pretty lucky over the past almost 10 years or so: I worked at Pivotal and then VMware and now, as we say, VMware Tanzu by Broadcom. And I've had the chance to really talk with large organizations and the people there, management, developers and technical people, to put it in a very basic way, about how they're getting better at the way they do their software, their apps, and everything involved in that stack, up and down. And I've written a few little books about it that you can get mostly for free if you go to my website and hunt them down. And then I also do a podcast, Software Defined Talk, and some other ones. And, I don't know, I've been an analyst at RedMonk, I did M&A at Dell, and I was a programmer a long, long time ago. And now I make slides. So, as one of my friends said, I used to be good at my job and now I'm good at PowerPoint.
So, I'll give you a brief idea of what I think, maybe not so much the state, but the various definitions of what people think DevOps is about now, essentially. And like I said, I spend my time talking mostly with management and executives and people like enterprise architects in very large organizations. And, if you're into this kind of thing, it's pretty thrilling to see what their idea of DevOps is, especially if they're trying to apply it and think about it. So, I mean, the state of DevOps is that it's just part of what's there. It's part of everyday life and existence. So, it's fine, right? It's sort of like asking what the state of the English language is. But I think more important is how people use DevOps, how they're planning on using it, and what they think about it, so that you can see that and kind of adapt to it. That's the more interesting part.

So first, well, that's unfortunate. So first, this is perhaps the most common thing that I encounter, and that's thinking about DevOps as a centralized developer tools group, right? Which is a long way from the beginning of DevOps, which was like configuring your servers, essentially, to maybe boil it down too much. But you see this over and over again, especially with the people that I talk with. And then also, you know that your community, your market, your tool chain has stabilized when you have a multi-year Gartner Magic Quadrant. And this view that they have of what DevOps is, is in their, what do they call it? The DevOps Platforms Magic Quadrant. And you can kind of read through there that if you have a developer mindset, you're like, oh, it's everything I use except the actual programming language, right? And the kind of lesser version of this is really the pipeline, the way you build and automate and deploy applications.
And this is again, this is all anecdotal, but this is really what I encounter the most when people say they have a DevOps team or they have DevOps engineers or, even more thrillingly, they're introducing DevOps: they think about it as managing these tools. Now of course, here is what most insider people in the DevOps community think that DevOps is, or I should say, think of as DevOps. And that is to think about DevOps as culture, right? And if you've been in this community long enough, you'll remember, I forget the exact year, but they started doing the State of DevOps Report because, well, if you put out reports, it's a great form of marketing. But at some point the DevOps report from Puppet shifted from a good sort of lead-gen marketing thing into an actual report. And it moved very quickly into talking about not the tools, not things like that, but the kind of culture, the way that people think about how they do their job, the way they build up the human dynamics, all of that kind of group stuff. And of course, this is kind of the hallmark, along with that chart that you see in the DevOps report every year, which I always think of as the awesome-people chart: it shows you that if you're following all the good DevOps practices, you do great at your company. I always want to see the data connecting it to revenue generation and share price and things like that, but I haven't really come across that yet. Not that it would diminish things, but it just would be interesting to see the relationship between those. And again, this is how the really DevOps-y DevOps people think about it, right? This is where you get these ideas of like, it's not about the tools, it's about the culture, and it's all about the people. And I'll get into a little bit of this with the fearing-change thing.
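As a quick aside, the delivery metrics those reports popularized, the so-called DORA "four keys," are simple enough to sketch. This is an illustrative computation over a made-up deploy log, not any official tooling; all field names and numbers here are invented for the example:

```python
from datetime import datetime
from statistics import mean

# Hypothetical deploy log: commit time, deploy time, whether the deploy caused
# a failure, and how long restoring service took. All data is illustrative.
deploys = [
    {"committed": datetime(2024, 3, 1, 9),  "deployed": datetime(2024, 3, 1, 15), "failed": False, "restore_hours": 0},
    {"committed": datetime(2024, 3, 2, 10), "deployed": datetime(2024, 3, 3, 11), "failed": True,  "restore_hours": 2},
    {"committed": datetime(2024, 3, 5, 8),  "deployed": datetime(2024, 3, 5, 12), "failed": False, "restore_hours": 0},
]

days_observed = 7

# Deployment frequency: deploys per day over the observation window.
deploy_frequency = len(deploys) / days_observed

# Lead time for changes: average commit-to-deploy time, in hours.
lead_time_hours = mean(
    (d["deployed"] - d["committed"]).total_seconds() / 3600 for d in deploys
)

# Change failure rate: share of deploys that caused a failure.
failure_rate = sum(d["failed"] for d in deploys) / len(deploys)

# Mean time to restore, averaged over failed deploys only.
failures = [d for d in deploys if d["failed"]]
mttr_hours = mean(d["restore_hours"] for d in failures) if failures else 0.0

print(deploy_frequency, lead_time_hours, failure_rate, mttr_hours)
```

The point of the reports, though, is that these output numbers sit alongside the culture questions, not in place of them.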
Now, I also always like to point out: can you guess which one of these you're supposed to be and which ones are bad? It's one of the funner things about this Westrum chart. So the next kind of view, kind of the opposite of the culture thing, is that a lot of what DevOps is is really about the tools, namely getting the tools to be easier to use. And this has really emerged, I think, well, I was talking before this talk and realized you can't read this. Here's a pro tip about presentations. Like I said, I'm good at PowerPoint. They say many things, but they say you're not supposed to have a slide that no one can read. And I say they're wrong, because then what you can do as a speaker is tell people what's in the slide, and you've got this chart to pretend like you know what you're talking about, which I will demonstrate now. So what you see, this is a survey that we've done about Kubernetes at VMware, and now VMware Tanzu by Broadcom. And the way that you read this chart is that these are benefits people have gotten, and the top one is the most recent. And you see that the benefits have been going down for using Kubernetes, right? And I think there's an explanation there. As Kubernetes goes more and more mainstream, you have more people using it, so there are various degrees of skill and tolerance for putting up with the idiosyncratic stuff, just the nature of Kubernetes. So you get more people who are encountering a lack of benefits from it. And I think another view of DevOps that I encounter is that it's all about fixing this, or filling it out: making YAML easier to use, I guess, making it so that application developers spend less time worrying about this, making sure that in larger organizations there's common governance over it, just making sure it delivers on the dream. And I think that emerges, well, this is just a quick side note, right?
Like, to some extent, it's been weird that this has happened, because if you remember, the Kubernetes people are always telling you, this is not for you, you shouldn't be using this. This is the platform for building platforms, and sorry about that. But, I don't know, it's not only us DevOps people; as a kind of global IT community, we seem to tune that out and instead think, yes, this is for me. I wanna build on top of this and work it, and not just have it be a lower-level thing. But I think a lot of DevOps people have come in and kind of filled out those hopes that you can make it easier to use and get the benefits that people are talking about. Now, I'll get into this a little further in the second part of the presentation, but this is a little bit more of an emerging thing that DevOps is, and that is: it's that thing above whatever your infrastructure layer is. Thinking, well, if I have application developers, they're gonna need to use this infrastructure, and I would like to make it easier for them, so I'm gonna build a platform on top of it. Now, of course, everyone's probably seen this. This platform concept killed off DevOps, so I'm not sure why we're here having this conference. Maybe it's just for good times. We enjoy talking about the old days. We're doing sort of reenactments or something. But I think the people at Humanitec and others kind of backed off of this DevOps-is-dead thing and introduced this platform notion. But I think generally this is where I see, let's say, intermediate to advanced DevOps people getting more involved: building up and managing this platform that developers start using more. And this, again, I'll come back to this, but this is great. This is from the Cloud Native Computing Foundation from last year, and it's a diagram of what a platform is.
It's really great, because since I work at VMware Tanzu by Broadcom, I often get asked to give talks like this and they say, don't give a vendor talk because everyone's head will explode. So now, instead of using our architecture, I can use this one, and they all look the same. We have a great one, but I'm not gonna show it to you here. But you can see the platform basically is all the infrastructure stuff, which doesn't really even show up, and everything that you need to manage your custom-built and custom-run applications.

And so those are four things that I've noticed DevOps being in recent years, right? And again, I think the state of DevOps is that it's fantastic. Everyone's done a great job, so give yourself a pat on the back or however you like to congratulate yourself. And now, going over those four things and maybe other ones, I think it's great to think about: so what am I doing? What do I wanna evolve to? How do I wanna change the way that I do DevOps? And, more importantly, you gotta be really careful when this comes up, because you don't wanna be the kind of annoying person who's like, you haven't read the correct passage and you're wrong about what it is. But as you're introducing DevOps into larger organizations, or any organization, it's good to have an idea of what that might mean and what that is.

So next, the main thing I wanna go over is this idea of why people don't change and why they fear change, right? Like, what is the deal? Let's say you pick one of your four meanings of DevOps and you're like, we're gonna go in there. Well, I'll give you the extreme case. We've got this mainframe system that our big insurance company runs at its core that makes sure our business functions. And, I don't know, we don't like that anymore for some reason. Usually it's about cost and people retiring or whatever.
And so we wanna introduce, whether it's a DevOps process, a mindset about it, or a cloud-native way of thinking about things, we wanna introduce something where our developers can start deploying every week, right? The original DevOps dream, which I think is a great dream to have. The more frequently you can deploy your software, the more feedback you get about what works and whether you're solving the problem, the better you can make your application, and, if you're doing it right, the better you can run your business. A lot of people in the technical world don't care about how well you run your business until the business is running poorly, and then you gotta find a new business that's running better. So that chain, I think, is very important. And yet, this is what people encounter all the time: I've read all of the books, I've seen the talks, I saw this guy Coté talk about it, he talked about how it was awesome. So why aren't people just changing to it? What's the deal with them?

So what I wanna go over here is understanding why people might not want to change, and then my current theory about who the culprit is, why this change isn't happening. First of all, if you read anything about change management, you know that any sort of corporate change management, regardless of YAML or anything else you're using, fails about 70% of the time. This comes up over and over again in the literature out there, whether it's from management consultants or academics or people like that. So it's pretty discouraging, right? And it definitely is something people pay a lot of attention to. However, what's encouraging is that if you go research this stuff, if you go look up where this came from, there's actually a researcher, I think he's a British guy, who's done several thorough studies that basically say that's a bunch of crap, right? Like, we have no idea how frequently things change or how frequently changes succeed.
And just to give you a little preview, if you dig it up, that 70% figure comes from, I think, maybe 1991, in Kotter and someone's book. And if you read the passage, they're like, in our experience, I don't know, it's like 70% of the time. There's no real study or anything behind that. And it has just propagated through the decades. So what that means is that the first way of getting over people not wanting to change is: well, we don't know if this is the case, because it hasn't been studied either, but in my anecdotal experience in 2024, and maybe we can propagate this decades into the future, organizations do kind of change. It's not exactly how you want them to change, but it happens, right? Over time, things do occur. It's not an impossible hill to climb.

Now, let's figure out where this resistance to change comes from, like what the deal is. And since, in the DevOps community, it looks like most of you here are old enough to maybe have encountered this idea, I won't explain the lore behind it too much. But we like our Star Trek analogies, all the way back to that Velocity talk, as you may recall. And I think it's interesting to trace the path: there has been a lot of change driven by DevOps, so why is there this notion that people still aren't changing, right? Now, originally, way back when, it was like, we gotta get these two groups, developers and operations, to talk with each other. And here's how it happened at Flickr. So why wouldn't it happen this way at a 100-year-old insurance company in the future at some point? And I think this made a lot of sense. And I think indeed we actually have shifted over to this a lot, right? Now, of course, there are still plenty of infrastructure and operations people who don't talk with application developers very much, and application developers who think, why would I need them? I can build all this on my own, and all sorts of nonsense. But it is a lot better than it was in like 2010, 2008.
Like, the notion actually exists. Things have gotten a lot better. And now, over the past three or so years, we also had this idea. This change is eking out a little more slowly, but it is something that I encounter in the organizations I talk with. There's an extreme amount of interest in this DevSecOps idea. I don't know why "sec" got put in the middle. I don't know if anyone's ever covered the positional thing of that. But I was just talking with someone at a big bank processing company late last year, and they had been putting this notion into place, and then even last week, or this week, I've lost track of time. But there is this idea that the next bottleneck we have is security and compliance. And we're very aware of this as a bottleneck to getting the software out, the apps out. So we need to solve that, and there's a lot of great work going on there to solve this issue. So there's some resistance to change there, but it almost might be an instance of Planck's principle, the notion that change occurs in the sciences as people die; Planck's version is a little more morbid. But eventually the security people retire and drive around in RVs or big Harleys or whatever security people do. I'm an application developer, so I have a lot of weird stereotypes in my head. Surprising I'm wearing pants instead of shorts and flip-flops, I guess. Now, of course, speaking of our Star Trek thing, and to my point, the thing people forget about the security people in Star Trek is that they tend to get eliminated. So that's a bit of a threat, I guess, for them, but something they should be aware of. Make sure you're wearing the correct color shirt.

Now, let's get to the culprit. When I talk with large organizations, this is what actually ends up being the barrier to change. You go to the business side.
The people who are running the organization, whether they are senior executives and management up in IT, or they're actually people from finance, or they're the ones who own the businesses and run them. And if you work at a nonprofit or a government, just rewrite the words I'm using in your head to mission and helping citizens and things like that. But my experience is that they're the ones who have this really kind of, what's the opposite of virtuous? Like a bad vortex. A vicious cycle, that's the phrase. They want things to be better, and they talk about speeding up the delivery cycle and making things more secure. And then they're up on a stage, they talk about that. They might even give you some books, some O'Reilly books: just come up here before you leave and take one from the box, they're saying. I'm not saying that. And then you go down and pick up the book, and you look up and you have a question, and they're gone. Like, they just really haven't shown up. And so my current theory is that it's management, it's executives, that are, not the final bottleneck, I'm sure there'll be more, but the current bottleneck to get through to introducing things like DevOps and change. And usually, with a room full of people like this, these notions go over very well, because most people are not these people and we really like making fun of them. So now we can proceed into that. But more generally, I think there's kind of empathy and understanding to be had of why management is a bottleneck, and maybe you can start to figure out how to apply these changes and get through these bottlenecks in your world. And so what we'll do is dive into the wonderful world of management.
And since I'm in California, these are actually stills from an early-'80s California Institute of Correctional Facilities building planning meeting, which I dug up. I guess that's an interesting metaphor to use for how you're building your IT systems, but man, look at that guy's hair and the glasses, it's just perfect for how things were running when I was a kid. So here's the first issue that comes up with executives and management that I think causes some bottlenecks for things to change. There's a joke that one of my coworkers used when looking at how an organization would transform. And their joke was, well, the amount of transformation you're talking about, the way we're gonna change how we're working, the budgeting, that's really a one-and-a-half-CIO job. And this comes from the notion that CIOs change around every three or so years, right? Now, that's not exactly accurate. When you look at it, at least in one study, the average CIO tenure is about 4.7 years, which, I'm a philosophy major, so I can't do math unless it's in Greek letters, but I think that's more than three. However, compare it to the CEO tenure, which is more like seven years, or think about your own tenure at the organization you work at, right? And so there is this rate of change that causes the people in the organization to be like, that's nice, I'll talk to the next person when they come in here too, right? Oftentimes, when an executive comes in, they have new plans, otherwise why would you have a new one, and they wanna change the way things are operating. But there's almost this mistrust that you start having. So I think it becomes really important for management, for executives, to convince people of the stability of their desire for things to change. And then also, when a new person comes in, to maybe not just upend everything, right?
To kind of understand where you're moving from, because, again, I'm sure the organizations you work in are fantastic and enlightened and over in the right part of that Westrum stuff. But lots of people I talk with are very jaded, and they're just like, I don't think anyone's gonna change, because they keep doing this every one and a half CIOs. So the next thing that's a bottleneck, another reason people don't want to change and fear it, is, I think, the perception in both directions that management has when it comes to managing IT people, right? Deciding where funding should be, deciding what groups, what grouping you should have. You know, if you think about the role of a manager, especially a high-up manager, you can't really do that much except report on the state of the system and kind of decide, as if you've got one of those big maps and a wooden pole where you're moving little pieces around. You don't have much control over things, so you have to reduce things down to metrics. Now, there was a great nerd fight last fall, if you remember, that kind of exposed this problem and this thinking that exists in a lot of organizations around developer productivity. I've been trying to think about, like, is developer productivity even a thing? Well, of course, if you're a developer it's not a thing; no one likes to be measured. I'm not sure if you've ever encountered this, but it's fine for other people to be measured, but when it comes to metrics for my job, that's impossible. What are you even talking about? Exactly. So as far as how I'm doing in this presentation: can't be measured, so it must be great.
But this exposed a type of thinking that I think management still hasn't gotten past, especially in non-tech companies. This was put out by McKinsey, and the whole point of it was that they say you can't measure developer productivity, and yet you need to, because you need to allocate resources and optimize, which is to say, get rid of people and decide who to get rid of. So we propose the following ways of measuring developer productivity, even though those wacky developers keep telling us they're impossible to measure and we don't know what we're talking about. And you can see most of the ones in red are the McKinsey ones, and you can see that they're about activities that people are doing, and evaluating the kind of worth of the people there. And this is very common when you talk to management people, because it's how they think. I don't mean that in a dismissive way; it's data that they need, that drives decisions they have to make about resource allocation and focus. But of course, as our friend over here said, it's true: when you do a more thorough study of measuring developers, measuring people like yourselves, what comes up over and over again, in not very complicated but different ways of describing it, is that one of the better ways to measure productivity is to go ask individuals if they're happy. And if they say yes, they're probably productive. You kind of don't have to mess around with very many other things. There's all sorts of internal metrics and measuring yourself. Metrics are great as long as they're your own; once they go beyond your team, terrible. Don't do that. But you see this in some of the more recent work, on, I guess it would be your right side, which I think is interesting. There are a lot of the sort of children of the DevOps report, reports that have been coming out, and the other metrics in those that are great.
But if you look through these, there are the classic kind of DORA and other metrics where you're measuring output, if you will. But there's also a fair amount of metrics about, like, are people just happy? Are they thriving? Are they in a good environment? And getting an executive to do things that way takes a big leap. And until you've shifted them over there, of course people are gonna fear change. Because, again, the metrics that are being gathered are all about who's not gonna work here anymore. I guess on the positive end, they're about who gets a bigger bonus and promotions. But anytime you encounter a metric, it tends to be not great for employees.

So then the next mindset for executives, well, you can kind of figure out this puzzle, right? So that CIO comes in, or a higher-level executive, and they feel this urge to change. Incentive is an overused word, so let's look at the reward structure. This executive wants to change things over. And if you look at the risk-benefit analysis: if they don't actually change, their compensation stays the same, but they understand that if you don't change the way software works, the business is gonna not function well. So to them, the risk of not changing is high, because the business is gonna go poorly. It's not good for them if they don't change. Now, if they do change, and they're successful at changing the organization, they're gonna get famous, they'll get a promotion, they probably have a lot of equity in the company, so their share price is gonna go up. And so while transformation, while changing, is high risk, the payout is very good for executives. So they're incredibly motivated to change, right?

Now, if you look at an individual, it's not so great. More or less, if you keep doing the same thing, you'll get compensated. If nothing changes, you'll probably still have your job.
Maybe the company will do something else, someone else will come in. But if you don't change, nothing generally happens. You keep chugging along. And so that's kind of a great outcome. Stability is nice. Versus, if you do spend time changing, are you gonna get paid more? Probably not, right? If you're in most organizations, you don't have a huge chunk of equity. If you know what RSUs are, you're not in this position. You're not really gonna get rewarded for it, except maybe like a cool hot dog festival or something like that. But the risk of changing is huge. You're moving to doing things in a new way, being with new groups. So it's all risk for an individual. And so really, is that valuable? Probably not. Now here, I think it's pretty obvious what management needs to change, right? They need to have a reward for actually going through all of this change, which is kind of an impossible thing to ask. But it would be nice, right? And this, again, is a bottleneck to change, because why would anyone change? There's really no payoff versus what you're already doing. I guess, as technologists, we get really excited about new shiny objects and things. But again, culture is the real problem, not tools, as I've been told, which I don't really think is too accurate. So you might wanna learn new things and do stuff, but that's not really what drives change in a lot of large organizations.

Now the final thing, or one of the final things: I think one of the bottlenecks is just management knowing what they're doing, right? Like I was saying earlier, you can't really do that much at a high level, because you're not on the front line, or whichever line, metaphorically speaking, doing the work. And that was highlighted really well in a talk at DevOps Days Dallas a couple years ago, by a guy who was, I don't know, at some point the chief architect of Lean or something at Toyota and is now a consultant.
And he was saying that just a week before, he'd gone to this corn distribution factory, which is a fun analogy. And there was a new CEO, who of course was the son of the former CEO. So you've got a new executive coming in, they wanna make changes, make things run better. The first thing that this consultant did was have the CEO go work on the line, sorting corn, and I bet you can guess what happened: within 10 minutes there were many changes made to how the work was performed, right? Now, it's a little much to ask to have your executives go in and configure servers and program, but they need to think: maybe the people doing the work know how to optimize it, so I should let them do it, or at least have a notion of doing it that way. And yet a lot of times people come in and there's not a lot of ground-up desire to change things around, and you get this corn-sorting problem where the executives have no idea how the corn is actually sorted. And that creates another bottleneck, where the ideas just aren't too great.

So then the next thing, I think, is a lack of vision, if you will, and this is more in the IT world, a lack of vision that an executive has. Now, it's kind of nice to think about something like this as a vision, right? Like, the goal that we have is to simplify things. But that's not actually what this person from DBS Bank was saying. I always like to bring this up because it's very good; I think this is actually a function that an executive can have. What she was actually saying is something that I think is a lot more tangible for DevOps people and developers: setting this mission about what's the point of us doing this, right? You're trying to rally people to be motivated to change, not just optimize what they're doing, right?
And what you can see here, again, I think is a great goal. They're a bank, and I don't know about you, but after this talk, I'm not gonna log into my bank and just kind of hang out with it, right? I don't really wanna spend a lot of time interfacing with my bank. And if you have a notion like that for whatever organization you're in, I think it really helps guide you in the changes that you're making. And oftentimes, to use a term of art, this vision is not that crisp; the executives haven't thought it through in a way that motivates you. Now finally, I wanna go over another thing that is difficult to use, but I've noticed is very important and very helpful. A lot of people, especially in the DevOps community, go too far on this notion that my friend Bridget wrote up a long time ago: that, as I was joking about earlier, whatever technology we use doesn't really matter, right? Which in general is kind of an interesting notion, but I think we lose out, and this foreshadows the platform stuff, on thinking about how we change our technology around, and maybe even how we use the constraints of a technology to help drive, or even force, change. If you have this notion that technology is easy and people are hard, then it's easy to ignore selecting the technology you use as a way to force that change. And as a side note, we're probably all technologists here, in case you just came to pick up some free swag and stickers, but just a pro tip: if you're in technology, you should never tell people technology is easy. You should always tell them it's extremely difficult, that you can barely figure out what you're doing because it's so difficult, and therefore that's why you should get paid a lot, right? You don't ever want to tell people that what you do is easy and achievable.
So remember: people are hard, well, outside of our community, so tell them people are difficult, sure, but technology? Woof, even more difficult. And I'm gonna need a raise to deal with it. So when I see organizations put this platform in place, what they're doing is saying: this is how we operate. We used to call this an opinionated platform. They're actually deciding this is how we operate, right? This is how you configure things, how you deploy them, the middleware that you use. And if you pick the right kind of platform, it's an expression of how you want to operate, right? I think it's worth spending a lot of time thinking about that when you're picking the tool chain you want to use, the stack you want to use. Now, the issue becomes: how do you build a platform that people want to use? And this is a platform owner at Mercedes-Benz. They have this notion, and if you've been following the, as it's called, "DevOps is dead, it's now about platform engineering" discussion, hopefully you've been learning about that. If you've been following what's going on there, there's been a reemergence of this idea of platform as a product. And that is managing the infrastructure, or the platform, as if it is, can you guess, a product. Now, the thing about a product is hopefully you have customers. And in that case, your customers in general are the application developers. So when you're product-managing your infrastructure, you know who your customer is: the application developers. And it's a good idea to go ask, well, what would you like, right? Instead of just thinking about how you're building up the infrastructure, how you're delivering it reliably to whatever kind of SLA or SLO thing you want to use, all those ways of thinking about it: if your customers don't want it, it doesn't matter, right?
And so this is a notion that I think is genuinely kind of new, at least in widespread use: as you can kind of see here, go to your customers and ask them, so if you don't want to use my platform, what do you want to use, right? And really product-manage that. Probably in this community people know what product management is, but the great thing about product management is that it's very well understood. Product management is easy, people are hard, right? You can look it up and get a lot of instruction about it. I mean, there are ways of messing it up, just like there are ways of messing up frying eggs, but it's really hard to mess up because it's very prescriptive and there's lots of proof behind it. So this is an example, a survey that we made, which you can get for free. You see people who have that kind of DevOps-as-platform mentality, who are figuring out how to connect the network together, but then go out and ask the application developers what they're struggling with, what they need help with. And generally, if you ask people what their problems are and you solve them, they use your technology and get beyond not wanting to change. But it requires that product management approach. And it generally works. Here's my ongoing collection of proof points for putting a platform in place. People are always kind of suspicious of a platform, but there are all sorts of great proof points over the years that it works, not only for technical effects, as you can see, but all the way back to the business side of things, achieving whatever your organization wants with it. And if you're really interested in platforms, we have a great Cloud Foundry one you should check out.
You know, you're not supposed to talk about how you get your bills paid, but I was at Configuration Management Camp the past two years and I realized no one follows that rule; they just suggest the solutions they have. But if you want a notion of what a platform looks like, you can look up that reference architecture. There are all sorts of platforms out there, and they've been built up over the years to be exactly that: a tool you introduce to get beyond people's reluctance to change, if you product-manage it, right? If you go on and think about how you build up and provide this platform. And I think it's a good way to get beyond that fear of change. So hopefully that gives you a notion and some understanding: if you're in an organization, and it seems like you and your friends or your coworkers have some notion of what would be great, and you're getting this push from above to do something different, and yet seemingly month after month, year after year, things are just the same, hopefully you now have a few things in the toolkit to go diagnose what the issue might be and try to understand how to get through the bottlenecks and the barriers. And with that, thanks for having me. I couldn't really see how much time I have left, but I'm happy to answer any questions. Great, I've fully informed everyone. Oh, sure. Any questions out there? The great thing about no questions is you don't have to come up with answers. Oh, there we go, yes. You were smiling a lot, so I'm gonna ask you first. Sure, so to summarize the question, and you can tell me if I'm summarizing it incorrectly: there are other forces in the organization, finance, maybe the corporate stewards, who wanna know how this is gonna be good for the business. And when it's a business, that usually means two things: not losing a lot of money suddenly, and also making more money.
Just staying level is cool, but really, they would like it to go the other way. So, well, first of all, it's a difficult problem, but what I've seen in organizations is two things. One, whoever is the champion of this, to use that phrase, has to set a timeline expectation that it's gonna be a couple of years to put this change in place, right? And you need to do that because essentially, to use a silly saying, you've gotta be successful first before you change over. What that really means is you need to have a few wins, like two or three instances of how we shifted over to this way of operating and it resulted in a good, this is a fun word, fiduciary outcome, right? It actually made the business run better. And it kinda doesn't matter the size of that application or whatever that business is; you're just proving that working this way has a good result. Or, the cynical way: you're proving that it's not bad, right? Which, in the zero-sum game of corporate politicking, is kind of a lot of what you're doing, because someone else has got a great idea, and if they can eliminate yours, they get to do theirs. I think of a lot of corporate stuff as: turn off the oxygen supply in the room and see who can hold their breath the longest. So you spend the first year getting success on a few projects working in this new way, and then you just market it a lot. Now, there's some other stuff in that Changing Mindsets book, which is the only one you can't get for free, but if you have an O'Reilly account you can get it, or buy it, what a notion.
There are some things, if you get to some enlightened finance people, where you can apply a lot of the ideas we have in the DevOps and application development world: if you have shorter cycles, you have better risk management, because instead of a 12-month window of not knowing what's going on, you have a three-month window, and it's actually very responsible to course-correct instead of just riding it out. But that takes a little more finagling, maybe a lot more lunches with the finance people, to figure out if they're interested in that. But yeah, I would just build up some successes; to use the old phrase, success is the best deodorant. You had a question? Yeah, exactly. And I guess one thing I didn't mention here is that a lot of my operating theory, as you can maybe pick apart, is that management is like the programmer of the organization. They're the ones building that system, and I think the hands-off approach that we kind of want management to have in IT plays against that notion, because we can't change the organization as a whole, so we need someone to come in and change it. Now, getting exactly to that point: if you can't have financial incentives and compensation, the executives need to figure out something beyond those hot dog parties to offer as compensation, right? Whether that's a nicer working environment, working less, people working on things they enjoy. But you really gotta think about how you incent people to go through the change.
And I'll give you one example. To anonymize it, it's someone I know. They were in charge of, get this, and if you work in a large organization you're probably not surprised: you've heard of auditors, but there are actually auditors for the auditors in organizations, and this person is an auditor for the auditors. And I remember one Thanksgiving this person basically couldn't take off Thanksgiving because one of the auditors for the auditors had screwed up, so they had to come in and fix it all up, right? Now, in this technical world, a lot of what auditors do nowadays, their tool chain is basically Microsoft Office, right? They're emailing things around and building up spreadsheets. But moving people to a more DevOps-y way of thinking, you can go to the auditors and say, well, we can automate a lot of this stuff; you don't have to go trust and talk with people and therefore work over your 40 hours, or your 30 hours, or whatever, right? So you're figuring out how you're gonna improve the day-to-day life of the people who are changing, rather than just, like that spend-less-time-banking vision earlier, delivering on the excitement of change that's gonna benefit our shareholders and make sure we can survive the macroeconomic headwinds and things like that. How do you really relate that down to what an individual does? Of course, if you can do the financial stuff, that's great too. I don't know about y'all, but money is always nice to have instead of just vision. All right, and with that, thank you very much, Kote. Yeah, thanks again. Another round of applause? All right, we're gonna be back with our next talk at 11 a.m., and it's gonna be lessons learned in hyper-growth deployments. So now's a good time to take that bio break and we'll get started in just a few minutes. Ladies and gentlemen, we are gonna get started in just a few minutes. This is the one-minute warning.
We'll get started in about an actual minute. So one more minute and we'll get started. All right, we'll go ahead and get started if I can bring to the stage our first sponsor, Doppler. Can I get a round of applause? Thank you. Thank you, everyone. So my name is Hamza and I lead the solution engineering team at Doppler. Doppler is a secrets management platform that acts as a source of truth for secrets all the way from local development to production. It's a single pane of glass that allows you to orchestrate secrets, with access controls built in, secrets rotation, dynamic credentials, all those things within that single platform. So we have a booth out there; feel free to drop by if you wanna talk about it in more detail, and yeah, thank you, everyone. All right, and our next sponsor, DNSimple. Hello, everyone. My name is Simon. I work for DNSimple. We are a DNS service provider and domain registrar. We aim to make DNS management at least enjoyable. I wouldn't say fun, because I know that many of you wouldn't believe me. So if you wanna learn more about our offering, we have a booth literally in front here. We also have awesome stickers. So even if you don't enjoy DNS, you can probably enjoy some of our stickers. So feel free to stop by and ask any questions. Thank you. Thank you to both of our sponsors. We couldn't put this on without them. And just a real quick housekeeping piece: please don't forget, at lunch we are gonna do the unconference space at these white tables back here. Additionally, as part of that, I just got word that there will be some lunch around. So if you wanna join us as part of that, there will be food to snack on as we're going through. And with that, Andrew Fong. Let's see if this works. Okay, there we go. Thank you for having me today. It's amazing to be here in person. I think I've only done one or two talks since we've been doing more in-person events. And I also wanna say thank you for spending the time here today.
I know time is really valuable, and it's a big commitment to come to a conference and spend time listening to us talk. So thank you for that. So, the rough agenda: I'm gonna give an overview of why I'm here, why I care about this problem, and what hyper growth is like. Who here has been through hyper growth, through the exponential curve? One or two people, okay. I'll start there and give an overview of that, and then some of the lessons we learned. One of the reasons I really like this topic is that hyper growth gives you a really condensed view of the world, and you can extract some principles that foreshadow where everything else is going. And then I'll wrap up with what these lessons taught us about intent-based deployments. This is me. I'm currently the founder and CEO of an early-stage startup. We are working on problems around production, specifically starting with deployments, but really our end goal is to make platform engineering teams force multipliers. It's something I feel really passionate about. I've spent probably the last 22, maybe 25, years in infrastructure. I've seen a lot of things over the years. And across the three eras of the internet that I've seen, there's been one constant. This room is big, but does anyone wanna venture a guess? Just shout it out: what's been the one constant over the last, say, 20, 25 years of the internet? Anybody? Nobody? Yep? Growth? Okay, good guess. Bugs? Things break? Okay. I'll go with something even more fundamental: interns. So I'm actually here today because of interns. In 2016, I was running the production side of Dropbox. I spent almost a decade there, but at that time I was focused on reliability and cost.
And I remember the time our head of cloud engineering asked me, he said, Andrew, can you spend some time and talk to the rest of engineering? Just figure out what's going on. I wanna understand what problems there are. Because we were really focused on improving gross margins. And I spent some time just walking around engineering, talking to engineering directors. And it was amazing. Actually, it was more like a gut punch, I shouldn't say amazing, because what came next was kind of terrifying. Three engineering directors told me that they had interns crying in their one-on-ones. And a fourth one told me that a staff engineer we had just hired quit after three weeks. And the common theme was that this idea of developer effectiveness, this idea of getting things to be super simple and lowering the cognitive overhead, it wasn't there. And for me, that was a light bulb moment. This is what I want to do. This is what I'm gonna do from now on, what I work on, whether it's at Dropbox, whether it's at a startup as a CTO, whether it's founding a company. I had spent a ton of time soul-searching to figure out, okay, what is that thing? And it was really that moment, listening to this problem of the interns crying, that solidified it for me: okay, this is why we do DevOps. This is why we do infrastructure. This is why we do platforms. Because there's someone on the other side of it. It could be an intern, it could be a staff engineer, but they're trying to do their best work, and in this case, we were an impediment to that. And for me, that was an untenable situation. That's why I'm here. I think you're here to hear about hyper growth, and hyper growth is a great learning experience. I was at YouTube from 2008 through 2012, and this is the growth curve that we looked at there.
It's kind of crazy how fast and how many users were adopting the platform at the time. I had started my career at AOL and was spending time in video. And the reason I joined YouTube was because I wanted to watch The Daily Show whenever I wanted to watch The Daily Show. That's actually the reason: AOL was not making enough progress toward that, despite the fact they had Time Warner, and YouTube seemed to be a better bet for watching The Daily Show whenever I wanted to. So I joined YouTube and saw it grow from, I don't know, some sub-billion number of users to almost a billion. At the time, we were taking on 24 hours of video every minute. To put that scale in perspective: in one day we would upload more video than the entirety of Hollywood down the street had produced in its entire existence. This is roughly 2010, so you can imagine the scale they operate at today. And so we had to solve a lot of problems. The tech that exists today didn't exist, right? A lot of the things we had to build, the problems we had to solve, we had to do with a much more rudimentary set of tools, which we're gonna talk through, and why they mirror where the world is going today. At Dropbox, this is the growth curve there. I joined in 2012 and was there through 2020. We went from about 50 million users to 700 million users. So another humanity-scale product. And we doubled our infrastructure every year. When I say doubled our infrastructure: I think when I joined, we had about 400 databases. When I left, we probably had between 12 and 15,000 databases. And keep in mind, hard drives are getting bigger and computers are getting faster, so some of that density is compressing.
To support the system, we were building things like database storage systems that were fully consistent, with sub-10-millisecond write latencies and cross-shard transactions, at that scale as well. But the system I think that taught us the most was a storage system. It's a storage system that started with zero bytes on disk, and over a very short period of time, we moved one exabyte of storage out of S3 onto our own infrastructure. And we did that in 24 months. When I say we did this in 24 months, that's turning up data centers domestically, building an international backbone at terabit scale, and qualifying and developing hardware for servers. What else did we have to do? We had to build a storage stack. We started with zero code, and we built a storage stack that is geographically redundant and has over twelve nines of durability. And that was in 24 months. So there are a lot of lessons there. We looked back and said, okay, if we did this again, what would we do differently? Because we were seeing that, okay, there's a cloud thing happening here; there's a lot we have to extract from this. And for me, the reason I enjoy hyper growth is this is what a normal organization's lifespan looks like: you have this very long, multi-year arc where you get to iterate. Who here has quarterly planning cycles that last forever, where one quarterly planning cycle just rolls into the next quarterly planning cycle? I can see a lot of heads nodding. You can't move an exabyte of data in 24 months if you work that way. You have a process that looks way more like this, and you basically just try to fix the problem as it comes up. And so that's what hyper growth feels like.
It feels like a constant hamster wheel where you're always on. It's unlike anything else I've experienced in engineering. But I think it's a good segue over to what we think about as a foundation. And I like this because this is where I think the world is today. We're spending a lot of time on this foundational infrastructure. We're thinking a lot about IaC, infrastructure as code, all of the baseline cloud provisioning infrastructure; that's where things are today. We haven't even gotten to the application layer, which is the back half of this talk. But let's reflect on 2008. We said this is what we wanted. We wanted something repeatable, because we were gonna operate at global scale. We needed it to be pull-based, because we knew we were gonna have things going offline and online; we couldn't just rely on a push-based model to make sure everything gets out there. It had to be in code. By the way, the infrastructure teams at both YouTube and Dropbox, when I joined, were 10 people, and Dropbox is only about 150 now. So they're still very small relative to the infrastructure they support. So it has to be in code, it has to be reliable, I have a typo there, and it has to scale. This is what we did. We had a shell script that ran every five minutes, with some splay, on every single server, thousands of them. It would sync down all the things we needed synced to the servers, all the infrastructure-layer pieces, and update them. We had a little CFEngine in there, but very, very little, basically just because it understood how to change things like sysctls. And then to deploy code, we would literally run hg pull on every single machine, tens of thousands of them. Because this is pull-based, right? They're all gonna sync themselves and it's gonna be consistent.
And then we ran everything under a process supervisor. This supported us and ran for a very long time: tens of thousands of machines, daily deployments, hundreds of engineers, four nines of reliability, and products at humanity scale, right? And so there's a lesson here. We look at those three commands and ask: what did they turn into today? They look more like this. They do exactly the same thing. There's nothing here except that we've up-leveled it a little, given it a little better interface, made it a little cleaner, because now there are lots more people using these types of tools. But we're in a world today where the principle is: infrastructure as code is good, right? This is a good pattern. This pattern actually will scale, and if you're adopting it, I'd say continue adopting it. Don't think there's some other, better way of doing this. Probably for the next five, 10, 15 years, this is the pattern that should be there. And it's a very reliable pattern. So I want to frame this presentation with that, because I'm not here to say there's another way of doing it. No, we're just catching up to where some of these large-scale infrastructures were, because there's a lot of crossover between where we were 10 years ago and what people are deploying in the public clouds today. So with that, I'm gonna transition over to operating in the cloud. What did we learn there? This is where it gets interesting, because I think about the timeline of this for most organizations. Who's done a cloud migration in the last five years? Okay, last 10 years? Some, okay. Who's not yet done a cloud migration? Okay, so about a third. This will be useful for all classes of this, I think.
Who's building cloud-native applications, has never done anything not in the cloud, and is doing everything correctly in the cloud today? Anybody? One or two? Okay. So, YouTube was a bare-metal data center. It was its own cloud, and it looked very similar to the types of apps you'd see today: batch systems, databases, caching. There's nothing special there. It was literally a four-million-line Python application at the time. Every single one of these things, by the way, runs the same code base; there is nothing special. We just took the entirety of the Python code tree, put it on every server, and used a different entry point. Sounds very similar to Docker in some ways. And then we had various subsystems, like only the index servers or only the search interface. But, very similar to what people do today, we got acquired, right? And these guys show up, right? And now we have two clouds. And it's like, okay, well, what do I do? One of these is really good at a certain set of things. If you do multi-cloud, my guess is it's not for reliability; it's because there's a capability in another cloud that you like better than the cloud you're in. I'm seeing a lot of heads nod. That's exactly the case here, right? So we can map exactly what's happening today back to what was happening 10 years ago. They're really good at storing lots of data. They're also really good at serving lots of very small thumbnails. By the way, that's a very hard problem. If you've ever tried to build a thumbnail system at YouTube scale, it's one of the hardest problems in the world. They're very good at that, and they can scale horizontally forever. And so what we learned is, okay, we've got to build a tool chain now that supports both of these. Now the problem is this. This is the tool chain we were running at YouTube, which is also very similar to the Dropbox tool chain; it was svn up instead of hg up.
And we just used systemd instead. This became a problem. This was actually the biggest problem with multi-cloud management for us: the tool chains were just so drastically different, and there was no good crossover between them. And I'm guessing if you're in Amazon and Google, and you've got gcloud and you've got the AWS CLI, they look very different. Terraform makes it a little better, but it's not great, because I'm guessing not everything is Terraform, and it's just a pain to manage these things. Well, it was the same for us. We had to learn just about every single command that's possible at Google to make this work, and we had to go all the way down to the bare metal in some cases. We had to learn how to create storage, create storage systems. It was a big pain to deal with. I left before we resolved this problem, and I think the actual resolution was that they moved everything into Google, so now the tool chain is consistent. Still a lot to learn, but it's at least a consistent tool chain. For me, attempt number two was: let's try this at Dropbox. Let's see what happens here. We had a very similar stack. Dropbox had its own physical infrastructure, and Amazon had a bunch of things that we used. So the block storage was there, while all the databases and web front ends were in our infrastructure. This is where it gets interesting, because we talked about that big storage migration before. So we ended up with this. We said, okay, we're gonna move all the storage systems into Magic Pocket, onto our infrastructure. But the EU block storage can't go away, because we've gotta handle GDPR and all these other things for geo-replication, so we still have some storage there. There's actually other infrastructure there as well, but for the most part those are the big chunky things. And you're kind of looking at this like, okay, well, now how do I manage this, right?
Because we have tens of megawatts of power and space to manage ourselves, and we have very large, say tens to hundreds of millions of dollars, of AWS bills coming in as well. So this is not something you can manage like, okay, I can assign two people to manage Amazon and it'll be fine. You actually have to figure out how to reconcile the tool chains. And I think if you're in multi-cloud today, you're probably starting to face a lot of these challenges, as teams do split-brain things and there's a lot of cognitive overhead. This was our attempt. We said, okay, let's build a single interface. We thought it'd be really easy: we'll give everyone one command-line tool and we'll wrap everything. I haven't looked enough at Azure DevOps, but I'm guessing this is sort of their approach right now: a bunch of pluggable things that go into one command-line tool that fans out in the back. So we built something called MDB, the machine database, which is honestly more like a service database, where you could have a web server or a server, you could add a tag to it, and that's the persona it would take on. Nothing crazy here. We'd say, okay, and we'd give it an owner tag; this is the owner of it. What becomes interesting, though, is when we did things like this: we need to reinstall. Well, the service owner really didn't want to deal with reinstalls or kernel upgrades or anything fundamental to the system. And it became a problem, because they're like, we don't know anything about this. We don't want to deal with fault tolerance. We don't want to handle any of these things. Yes, I get it, Kubernetes makes it a little better now, but I don't think it's a panacea. So the app team is standing here going, well, I own this thing, and really I only care about the web server. I don't care about anything in your underlying layers, right?
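As a toy illustration, assuming a shape for MDB that the talk only describes loosely, the core of such a machine-slash-service database is just names, owners, and tags, where a tag is the persona a box takes on. All class and field names here are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Machine:
    name: str
    owner: str = ""
    tags: set = field(default_factory=set)

class ServiceDB:
    """Toy sketch of an MDB-style machine/service database: a machine
    takes on the persona of whatever tags it carries."""

    def __init__(self):
        self._machines = {}

    def register(self, name: str, owner: str = "") -> Machine:
        # Every box gets a record and an owner tag, as in the talk.
        m = Machine(name=name, owner=owner)
        self._machines[name] = m
        return m

    def tag(self, name: str, tag: str) -> None:
        self._machines[name].tags.add(tag)

    def with_tag(self, tag: str) -> list:
        # The app team's view: "give me my web servers."
        return [m.name for m in self._machines.values() if tag in m.tags]
```

The coordination problem described next falls straight out of this model: the app team only ever queries `with_tag("webserver")`, while the platform team iterates over every machine regardless of tags to schedule reinstalls and kernel patches, and nothing in the data model mediates between the two.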
That's how we get to serverless today. And we have a platform team going, well, I need to reinstall that thing because I need to patch the kernel. And so this just ends up in a coordination nightmare, people trying to convince the other team that the thing they're trying to do this week is more important. You have TPMs running around scheduling. And I will stand here and tell you: we got this entirely wrong. Hindsight being 2020, I think Netflix, and I'm gonna walk through this now, came up with a much better pattern for this. They have a write-up on a website called managed.delivery, I'll have the link at the end, where they talk about how to separate this in a more elegant way. And I'll walk you through that right now. So you have a person, right? They typically just want to ship code. They don't care about anything else. And so when they start, they have something that looks pretty simple. They're just like, okay, out of GitHub Actions or whatever, I'm gonna push my thing and it's gonna go to the staging environment, then the production environment, and I'll call it a day. They may not even have two environments, maybe just one. They're like, this is easy. And then somebody shows up, call it the DevOps person, the SRE, somebody in the organization, and says, well, we need SLAs, right? Because this thing now has user traffic on it; we can't just let it sit there. Well, who owns these SLAs? The DevOps person? The software engineer that threw this thing up? Either way, you end up with config bloat, and everyone's like, I don't understand what the other person was intending to do here, because I'm just trying to add my SLAs, and they're like, well, I'm just trying to deploy code. So now you have these competing priorities.
Well, it gets even more interesting, because the compliance team shows up and tells the infrastructure people, well, we need another region, because we need that for GDPR, and we need to go there because the sales team wants it. Or, has anybody dealt with tenancy problems, where people show up and say, I want to run N copies of this, because our enterprise customers want these things? That's the other variation of this, right? And so now you've got this massive configuration file for deployment that's super hard to understand. Just squinting at this, there's a bunch of templating, and there's probably a bunch of esoteric things where no one really understands what the intent was originally, six months or a year later. I would love to live in a world where everyone does what they do best, and we can split this up in a much cleaner way. And this is what Netflix proposes. The Spinnaker project actually has this under a subheading called managed delivery, and they run a project called Keel, which I think is still being developed inside of Netflix, but doesn't get a lot of commits pushed back out from what I can tell. They separate this. They move to a much more declarative model, where infrastructure and delivery rules are set up once and applied to everyone, so service owners never actually have to think about them. They're just mapped automatically, through a compiler, back down to it. This one is still not super clean on the screen, but if you look at it, basically what we're saying here is that there's an environment called production. Its precondition is that staging must be stable. How we decide staging is stable is not introduced in this file; staging just has to be stable, because that's the requirement. And it's got a rule that says it's not the weekend. Right, and now we can understand the intent of this.
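The real Keel delivery configs are YAML, but a minimal Python sketch of the intent being described — an environment called production, a precondition that staging is stable, a not-the-weekend rule — might look like this (all field names are illustrative, not Keel's actual format):

```python
# A hedged sketch of a declarative environment spec in the spirit of
# Spinnaker/Keel managed delivery. Field names are illustrative only.

production = {
    "environment": "production",
    # Precondition: staging must be stable. HOW stability is judged is
    # deliberately not defined here; it lives with the delivery team.
    "depends_on": [{"environment": "staging", "status": "stable"}],
    # Business rule, stated as intent rather than as pipeline steps.
    "constraints": [{"type": "allowed-times", "rule": "not-weekend"}],
    # The runtime is just a name; it could map to K8s, ECS, Lambda, etc.
    "runtime": "production-us-k8s",
}


def can_promote(env: dict, env_status: dict, is_weekend: bool) -> bool:
    """Evaluate the declared intent: dependencies stable, rules satisfied."""
    deps_ok = all(
        env_status.get(d["environment"]) == d["status"]
        for d in env["depends_on"]
    )
    rules_ok = not is_weekend  # the only constraint modeled in this sketch
    return deps_ok and rules_ok
```

For example, `can_promote(production, {"staging": "stable"}, is_weekend=False)` allows the rollout, while a weekend or an unstable staging blocks it — the intent stays readable either way.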
Production has to follow staging, and it can't be the weekend. And we can also extract the infrastructure. We're not talking about a cluster or anything like that. We're just saying there's a runtime called Production US on K8s. It could just as easily have been Production US on ECS, or Lambda, or whatever else, right? Now the infrastructure person can manage their portion of this, and we're bridging it together at delivery time. This is just infrastructure to delivery: business rules plus the baseline. And then they propose, let's take this further and separate infrastructure from service. Let's have the infrastructure people deal with the bottom layer, and the delivery teams write delivery rules, because that's what they're responsible for: the delivery of the thing. And then the service teams just need to worry about the basics of their application: how many replicas, what's the image, what code is going on. We remove the cognitive overhead, and we join this at deploy time, right? We can effectively put a compiler in place here to figure out what the dependency tree looks like. And so in this much more declarative model, we have what were three pieces, and what happens is we actually join these and build a desired state out of them, as opposed to having some imperative configuration that's like: this, then this, and all these different regions. We can write this once and then map it back into a single desired state. And what we learned, and I've learned this through watching the Netflix folks, and I think Dropbox towards the end of my tenure moved towards this, is that declarative scales way, way better from a developer standpoint. And Kube has a lot of these properties. The problem is it's not mapped universally through the organization, across all the various pieces of infrastructure you have.
So bridging this all together, it's not a desired state of everything, it's just little tiny slivers today. And so what we were striving for at Dropbox was to figure out a way to bring this all together. I think Netflix has actually done a very good job of bringing a bunch of these pieces together in one place. Okay, so now we've solved the baseline infrastructure problem, right? We've said infrastructure as code is good, so we can deploy that. We've said, okay, let's build declarative states. That's a step up from where we were: managing pipelines and trying to get everyone to figure out what's happening, where it's codified in a TPM's head, or at least a manager's head, who's trying to figure out who's the next person I talk to to make sure these rules happen. I mean, when I was at YouTube, every single person on the engineering team would get on IRC when we did a deployment. We had probably 200 to 300 engineers, and they would sit there, and a TPM would run through a 200-line spreadsheet. We'd each validate whatever we were gonna do, and that process could take days. And so as we added engineers, the deploy times just kept going up and up and up. We went from being able to deploy daily to weekly, and it was very, very painful. Had we had something like this, it would have drastically changed the game for us in terms of where we could have asked teams to interface. The next lesson, and this is probably the farthest from where I see the world today, is my personal view that we still live in a world where there's a lot of belief that production is a static thing: that if I set these rules, it'll never change. I have this data center, and I'm gonna put my things in this data center, and that's gonna be good. Let me ask you this. Who here manages more than one system? Okay, this is easy.
More than two systems? Okay, all the hands are gonna stay up. More than five systems? Okay, so everyone manages distributed systems at this point, right? That is a constant. Let's just assume the cloud is a distributed system at this point. Everything crosses boundaries now. There's no more, okay, I have my application, I'm gonna put it on a server, and this is gonna be my thing with my PHP app and my MySQL instance, and that's all it's gonna be. It is a distributed system, which means that everything crosses boundaries. And as much as I would love to live in a world like Amazon's, where everybody has to have well-defined interfaces and everything can break at any given point in time, I'm guessing no one here really lives in that world. Yeah, I see. Pragmatically, right, we don't know the dependencies we take in a lot of cases. So I would postulate that control is an illusion for most people. We want to have this control, but maybe another way of thinking about it is: what if we just gave up the control? What if we didn't actually try to exercise control across the whole system? And here's the reason why. For service dependencies, in your own infrastructure, does this group feel like they have a pretty good understanding of their service dependencies? Like 50-50? Okay, one yes. I'm guessing if I said, enumerate all your dependencies, people would have a problem doing that today. And that's just in your own infrastructure. Do you know your cloud dependencies? For example, do people know that EC2 and S3 are not connected by a non-blocking switching fabric? Meaning you can overload it, and I have been in an environment where we overloaded it. That's a dependency. We need to know that, because if we try to copy X bytes of data out through EC2 instances, it will break.
So there are underlying cloud assumptions that you fundamentally can't know at this point. In the hardware world, maybe you can go pretty low, but in the cloud you can't. There's an abstraction layer on an abstraction layer on an abstraction layer, and so you have dependencies you don't even know about. And the last one: how many people have application teams deploying apps that take SaaS dependencies? Anybody have this? Okay. Do you know when your SaaS providers are doing upgrades or changing things? Okay, see, yeah. So this is a complete illusion now, right? How many of you even know which new dependencies your app teams take on SaaS providers? I'm guessing that's also a very low number. Security may have a better view, but it's hard, right? So the best you can do is defend the thing you have. You can't actually enumerate the dependencies and manage it that way. We're just gonna assume that failure is gonna occur, and so now we have to adapt to when failure happens. So okay, we're all aligned: change is constant in the cloud. Now let's get to deployments. Pipelines are really bad with change. A pipeline goes in one direction; it's essentially a singly linked list from that perspective. Looking at an example of this: this is the deployment pipeline, and it's gonna go out like that, right? This is the steady-state best case that can possibly happen. How many of you have had this scenario: let's say Production US just decides to blow up. What do you do? Do you roll back? Do you fix Production US out of band to keep it going? Do you start a new pipeline just for Production US and cancel this one? Okay, this is now hard to reason about as an operator. Put yourself in the application developer's or the intern's shoes. What does the intern do? They have zero clue how to handle this problem, right?
So we're building systems that are not feasible for people at the higher altitudes here. This is painful and slow. We can take a lesson from the open source world, from Netflix, from a bunch of places. Convergence engines are a thing. People use these; they adapt things to real-time change. You can set desired states and then have an engine try to bring you to that desired state, exactly the way Kube does it. What if you could do this everywhere, instead of just inside the one place called Kubernetes? That's been the promise of what Netflix has built with Keel and managed delivery and Spinnaker. It's not a panacea, right? But it takes you way farther, because you can take that same system, the same failure mode here, and now you can resume, right? If something happens, you can resume there, you can pause, you can do all kinds of interesting things. And if you think back to, I should have put a slide here, if you think back to the declared states of how the environments work, parallelism now becomes easy also. Because if you understand the dependency tree, you can unwind the dependency tree, right? I didn't actually say at any point in this conversation that these things had to go in that order. The pipeline just made them go in that order, because that's how we wrote the thing. But in a lot of cases that may not actually be what I wanted; it just happened that way by coincidence. Let's make this a little more complicated, because cloud systems are way more complicated than our previous lives were. Let's consider this use case: we're gonna roll from version one to version three. This is what happens today, right? This is what your normal deployment pipelines are gonna look like. We're assuming one day per release; call it a storage system or something like that.
The timescale doesn't really matter; it's just to be helpful here. The first observation is that it's going to take one day per stage. The next observation is that staging is blocked from taking a new update from a new commit. You can't continually update staging in this model unless you break its dependency on production, and then it becomes very hard to reason about really fast. And it's not super fast. Now let's put this through a model where we understand the dependencies, and where, in a convergence model, you could make it look more like this. You're at 50% of your time to deploy. So that's the best case, right? Somewhere between seven and 15 is probably where you're gonna land, depending on your dependency tree. But this becomes way more powerful, right? You get a much simpler language, a much simpler configuration, by giving the engine the ability to reason about it. The engine can figure out what's gonna roll out in what order. And you're now in a place where you can adapt to real-time issues as well, because you have a desired state and you can work through that desired state. You're replacing the notion that it has to be linear with the notion that it's gonna happen in real time, all the time. So whatever circuit breakers you need, all of that can be put in line with it. Okay, so my takeaway here is that convergence, or some type of adaptable engine, is way, way better for production systems in the cloud. I'm running a little fast, so I'll make sure to save some time for Q and A. So let's put this together. I like to think of modern intent-based delivery systems. I'll summarize the argument as this: when we did lift and shift to the cloud, the tool chains did not evolve.
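The dependency-aware rollout just described can be sketched as a topological walk over the dependency tree: anything whose dependencies are already at the target version is free to roll in parallel. The environment names and the dependency map below are illustrative, not from any real system:

```python
# Sketch: a convergence-style rollout derives ordering (and parallelism)
# from a dependency tree instead of a hand-written linear pipeline.
# Environments and their dependencies are illustrative.

deps = {
    "staging": [],
    "production-us": ["staging"],
    "production-eu": ["staging"],  # no declared order vs. production-us
}


def rollout_waves(deps: dict[str, list[str]]) -> list[set[str]]:
    """Group environments into waves; each wave can deploy in parallel."""
    done: set[str] = set()
    waves: list[set[str]] = []
    while len(done) < len(deps):
        wave = {
            env for env, needs in deps.items()
            if env not in done and all(n in done for n in needs)
        }
        if not wave:
            raise ValueError("dependency cycle")
        waves.append(wave)
        done |= wave
    return waves
```

Here the two production regions were never declared to be ordered relative to each other, so the engine is free to run them in the same wave — which is where the roughly 50% deploy-time reduction mentioned above comes from.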
We lifted and shifted intentionally, because the variable we were trying to control was moving from physical infrastructure to cloud infrastructure. And so we effectively moved things from point A to point B with the exact same tool chain we had prior. But we've agreed that change is constant now in the cloud, and we can't control the dependency tree the way we used to think we could in physical environments. So now we actually have to go back, right? This is what we had to do when we built storage systems: we had to go back to first principles and say, what are we gonna do now that we have this thing that we moved here very quickly? How do we manage it in this new environment? We haven't actually gone through that journey yet; we're just starting it. I think infrastructure as code was the very beginning of this, and now we're in a world of asking what the next generation of deployment looks like. So the first piece is separation of responsibilities. We need to be in a world where service teams can define how apps are managed, infra teams can define how infra is managed, and we need a way to move that out to production environments without those two teams blocking each other, right? Infrastructure is a complicated problem now, way more complicated than it was 15 or 20 years ago, so those teams need to be able to handle that stuff independently. A good example: who had to deal with the Log4j security bug? Okay, one hand going up. Everyone knows Log4j. So what if you could upgrade everything independently, without the service teams even worrying about it? In most systems today that's very hard to do. I don't think anyone even gets to the level of asking how we solve that problem. Those are the class of problems that infra teams and security teams are faced with now, right?
That's the class of problem, and they don't have the ability to understand what's happening up the stack. So there is a separation of responsibilities and an interface that needs to be there. I would say declarative, this idea of desired states, is way more powerful than the if-elses that we have in current CI systems. I would also say that your CI system probably should not be your deployment system at this point; it's way more complicated than just point A to point B. We're not printing gold master CDs anymore. We're actually deploying a distributed system. I think the other one is adapting to real-time change. Again, if we were shipping gold master CDs there is no change, right? We print a physical thing, so a pipeline kind of makes sense: we get one artifact at the end, and that artifact is shipped out to everybody's computer. I worked at AOL, so I feel like I can say this: we had a CD and we shipped it. That artifact is very different from the artifact we're shipping today, and I think that necessitates a change in how we think about delivery. And I think you also have to abstract infrastructure. I don't think there's a world where infrastructure can keep leaking up the stack the way it has been, for a bunch of reasons. One, I don't think it lets people do what they do best; again, the intern problem. I think it's way more complicated than people realize. We've spent more time on the app side in the last five years than we have on the infra side in the general computing world, and that's causing friction. And then the other one is that if cost is a thing, you need the ability to control cost at the supply and demand level. This is a whole different talk, but I think of infra as the supply side and applications as the demand side, and you want to be able to control both of those cost levers. One of them lets you reduce the computing cost.
And one of them is about how many users are coming onto it. And if you don't have that abstraction, that also removes your ability to control costs, and so effectively margins, if you're at the enterprise level. This is roughly the architecture of both what we built and what Spinnaker has done, where you can combine these things. You compile down to a desired state. The desired state is handed to an engine. The engine then fetches and applies against things to get them to the desired state. It is very, very different from the current pipeline model, and it changes the reasoning for this room more than anyone else. Upstream is actually easier to reason about; in my experience, the downstream infrastructure teams have a way easier world in a lot of ways, because they don't have people coming and asking them questions anymore. So let's see: at the top, right, we get our requirements together. Then we take them, build a desired state, and calculate the changes. And then at the bottom layer, we fetch the current state of your environment. We say, okay, what's happening? And then we can compare that to the state that you want, and then we can apply changes, right? And you can make this a loop, which is a way more powerful construct than what we have with pipelines. So I'll finish with these. If they post the slides, I will send them out in some way, shape or form, or you can just go to the URLs. The first one is basically how we took a four-to-six-million-line Python monolith and broke it apart into a serverless framework with a lot of the same delivery principles. Netflix wrote a bunch on managed delivery, and we just wrote some blog posts on this as well.
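The fetch/compare/apply loop described above has the same shape as a Kubernetes-style reconciliation loop. A minimal sketch, where the `apply` callback is a stand-in for real infrastructure calls rather than any actual API:

```python
# Sketch of a convergence engine: compare desired state to current
# state and apply the difference, in a loop. The apply callback is a
# stand-in for real infrastructure calls.

def diff(desired: dict, current: dict) -> dict:
    """Changes needed to move the current state to the desired state."""
    return {k: v for k, v in desired.items() if current.get(k) != v}


def converge(desired: dict, current: dict, apply) -> dict:
    """One reconciliation pass: apply each needed change."""
    for key, value in diff(desired, current).items():
        apply(key, value)        # e.g. roll a service to a new version
        current[key] = value     # in reality, re-fetched on the next pass
    return current
```

In a real engine this pass runs continuously, so a mid-rollout failure or an out-of-band change is simply picked up on the next pass, instead of stranding a half-finished pipeline.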
I would say the managed delivery team has since scattered across the industry, but I've been spending a bunch of time with them recently, and they have a bunch of case studies around what they've seen in the wild with this. I think Samsung SmartThings uses this as their default delivery mechanism at this point. So I will pause there for questions. Also, we can have lunch. Yes. No, no, I was gonna say, or we can go have lunch. Yes, go ahead. My philosophy on this is that you probably want to handle it at the environment level, not necessarily just the service level. You want to have a building block you can reason about. Going down further depends on the layer of minutiae you're talking about. For an environment it's fairly easy: you can say the constraint is that this environment must get to this level. For storage systems, which is my example here, it's very easy. We wanted different versions everywhere for storage, because you never want the bug that deletes everything in all regions simultaneously. And you have data vintages, and a whole other set of problems. So in that case we actually wanted it to look like that; we wanted that problem. I think on the other axis, the deployment axis, and I'm increasing the complexity of reasoning here, the telemetry teams are getting way better, and those two things put you in a different quadrant now. If you assume old tech on all portions, yeah, I think it could be a problem. But I think you have to up-level these things, and I don't think you should up-level one without doing the others. Okay, so the question is: who and what process draws the lines?
For every team I've ever managed, I had a concept, at a very coarse level, of serving path versus non-serving path; that's super easy to draw, like batch versus not batch. Then you get into core libraries versus not core libraries. For my teams, and my philosophy, some of this doesn't translate well. I believe in one-of, not N-of: so one RPC system, one version of everything, effectively, and then rules around how fast you have to converge to said version. So the answer, and I'm sort of sidestepping your question, is that there's gonna be a layer where this stuff has to go out, there's a very small set of people that control that, and then there's everything else sprawled on top of it. [Audience question.] In this model, I would put this on the build side. You know what's going out, so you can keep a catalog of it, and you can apply back pressure to say, okay, these things need to get to a certain place. The way we handled this in my previous lives was we would put different gates on. They end up being human gates, human back-pressure gates: making it painful for a team to do something else until they update. Then you're talking about making it less painful to update. So I think of it as: the build system knows what I want the end state to be, and then we automate the individual pieces. I actually don't have strong opinions about what tech has to happen where. It's just about making the pain for the human very low. In my previous life, the build system team would have handled this. They would have made sure we had some sort of report. Deploys would have been blocked if you weren't at a certain level.
And then if your services were not deployed within 30 days, we started opening tickets and sending pages to you. It didn't matter what time of day it was. We'd open a SEV or something like that against the team, and use that organizational construct as back pressure. I don't think there's a clean way that I've seen yet to do this. [Audience question: how is convergence better than a pipeline?] A convergence engine is gonna handle real-time things, where a pipeline is gonna statically move in one direction. You can't really make a pipeline go backwards; it's another pipeline at that point. And convergence typically runs in a loop, think of it that way, it's polling constantly. So let's say you see a failure: you can respond to that failure in real time, without the pipeline just stopping. I think we're at time. All right, well, we are at time. Thank you so much, Andrew. All right, we're gonna get started in about 15 minutes with the unconference. In the meantime, lunch will be available in the back. We want to see you come out and hang around for the unconference piece. Muriel will be up here in just a few minutes to set that up. And then we will kick back off with talks at 1:30. So get yourselves some lunch. The unconference piece is an opportunity for you to define some topics, get groups together, and chat about things that you actually care about. So Muriel will be up to explain that in a few minutes, but we'll be back at 1:30 for talks. Thanks. Hello everybody, welcome to our open spaces lunch. My name is Muriel, and this is, I think, my third year helping out with DevOps Days LA, and this is our second year of doing the open spaces lunch. So thank you all for joining. A lot of you, I think, have gotten your food.
If you haven't had a chance to vote on topics, you can do that on the Menti here. This is basically your time to talk about the things that you care about. So find a friend. There are some open spaces tables in the back, but you can also join up here in the chairs. You can pick a table topic; there are some listed on the tables already. You can pick a topic from up here, or you can chat about something completely different. What are you interested in right now? What has been bugging you and giving you nightmares at work? Here's some time to talk with your fellow DevOps folks. So grab a sandwich, find a table, and make some new friends. Thank you all for being here. All right, thank you to everybody new that's come in. Grab your sandwiches. I know we've got kind of a limited table situation, but you can also group up in little circles at the back for your different tables. Tables already have table topics, but you can discuss whatever you'd like. If you want, there are a few people out there: if you want to find Alan, we're going to have a little security roundup, he's got his hand raised over there. Alex and Gareth have data in DevOps and a Scale retrospective somewhere towards the back by the curtains. And everybody else that's kind of floating around: find a friend, group up towards the back, pick something on the screen, meet a new friend. All right, thank you everybody. I'll leave this up here. [Off-mic setup with the next speaker: this will be your microphone, or we can put this lav on you; your slides are visible right there so you don't have to keep turning around; there are 30-second and two-minute warnings.]
Ladies and gentlemen, we are gonna get started in just about 10 minutes. Now's the time to make that final bio break and find a fantastic seat. We've got some great speakers lined up for after lunch, so we'll see you in about 10 minutes. All right, ladies and gentlemen, it is about five minutes before we begin. Now it's time to start taking those seats, and we'll get started shortly. All right, thank you all for coming back after lunch. Looking forward to some amazing talks. Our next talk is from Paul Tevis. We've got two lightning talks, and then we'll go into a full talk with Gareth. But for now, Paul, it's all yours. Back, can you hear? Oh, now they can hear me in the back. All right. Everyone's lunch sitting okay? A little sleepy? My slides will advance automatically every 20 seconds, so this should keep all of us awake, but it's also the way we're getting through this in 15 minutes. So once I hit this button, we'll go. Hi, this is three heuristics for fostering a high-trust, generative culture. My name is Paul Tevis. I'm the people and culture guy at the tech conference, as I usually am. Anyone? Anyone? Okay. I started my career as a software engineer, then engineering manager, spent some time as an agile coach, spent some time in learning and development. Now I run a small company with my two business partners, where we help organizations with their interpersonal and cultural practices in ways that produce better business results. And that's what I'm here to talk with you all about today. So let's get going. Raise your hand if you have read the 2023 Accelerate State of DevOps report. This is not a judgment; I'm just assessing the state of information in the room. Okay, okay, cool. I just wanna know where I can start, what I need to explain. I'm gonna ask two more questions that are also at that assessing-where-we're-at stage. Cool. Next question: raise your hand if you've heard of the Westrum typology of organizational cultures.
If you were in the room for Cote's talk this morning, you at least saw it on one slide. So hands low if you kinda know what it is, high if you're like, I can explain it, we use it all the time. Little bit, little bit. Okay, cool. Third question for you: I want you to raise your hand if your organization has made an explicit goal of creating a Westrum generative culture. That is about what I expected. The fact that your hands are not up is fine, because the things I'm gonna give you today are things you can do that help to foster this type of culture and also make you personally more effective. So, a little bit of overview. Ron Westrum was a safety researcher and a sociologist studying high-risk, high-complexity environments. And one of the things he found in his research is that information flow predicts organizational outcomes in safety contexts. And so he developed an instrument for assessing information flow and culture, which is the slide that, if you were here for Cote's talk, you saw, right? He said that we can categorize the way information flows through organizations into three different buckets. And these are organizational behaviors, things that we would see in each of these cultures that tell us how information flows. This shows up in the DevOps world in 2014. In the State of DevOps report, they're looking for a way to assess culture. Westrum has this instrument, so they start studying it. And one of the things they find is that these are not just predictive of safety outcomes, but of organizational performance as well. And that is a result that bears out every year in the State of DevOps report, up through this year, where they find technical practices are important, but they're enabled by this type of generative culture. And in fact, every good thing that they studied in the 2023 report is strongly predicted by the presence of a Westrum generative-style culture.
So every good thing that you want is informed by doing this type of work. So the question is, great, how do we get one of those? How do we foster that type of generative culture? And this is the question that my business partner, Allison Pollard, and I have been digging into for a number of years. We work, as I said, with individuals and leaders and organizations to help them show up more effectively, have more productive interactions. And so we've been looking at what are things that individuals can do that foster the types of organizational behaviors we need in order to get these better DevOps results. And that's what I wanna talk with you about here today: those things. We wanna focus on the individual actions because as John Shook reminds us, the way you create cultural change in an organization is by acting your way to a new way of thinking. You don't try to convince people that they should do a thing. You get them to start doing the thing and that then causes them to learn stuff and to realize, oh, there's actually a better way of doing this. So what are individual guidelines for action that we can use that help us get more of this stuff, that help us get higher degrees of cooperation, that help us to train our messengers, that help us to do these things? So that's the lens. Individual actions that predict organizational behaviors that get us better organizational results. So where we start with on all of that is conversations. Because it turns out that is where culture manifests and where culture is embodied and where, remember, Westrum's work is about information flow. Most of the information that is flowing in an organization is happening in one-on-one interactions and in group conversations. So we need to get those right in order to enable these behaviors. The problem, of course, is that conversations are hard. I have all three of these books on my bookshelf, I've read all of them. They're all wonderful, but they all have like 17-step processes.
And another thing that we know is that under stress and uncertainty, when the compliance team shows up and says, hey, we need to spin up another instance in order to deal with GDPR, you can't think through the 17 steps. What you need are heuristics, right? Heuristics, rules of thumb. These are mental shortcuts that allow us to reduce our cognitive load and produce good enough results. So what Allison and I have been doing is working on a set of heuristics that allow us to have better individual interactions that lead to better organizational results. And what we've identified are six different failure modes and three different heuristics that help us stay out of those failure states, that get us more towards that Westrum-style generative culture. With me so far, okay, cool. So let's go through these six failure modes. I'm gonna ask you to take a look at yourself as you're doing this because what these are about is the way people experience you in an interaction. You also experienced them but we're gonna focus on you for a minute. Step outside yourself and think about how have I been described? Do any of these sound familiar? Are these, how do I show up? Not how do I feel but how do other people experience me? So this first failure mode is what we call vague. So when you are vague, people experience you in a way where they don't know what you think, what you feel, what you want, what data you're working from, what your conclusions are, if you've even come to a conclusion, who knows, right? Vague is very squishy, right? It can sound like, look, the details don't matter. Just deal with it. You're like, what does dealing with it even mean? When we're vague, you can see how this impacts information flow. Information's not flowing in a way that's useful. Now on the other end of the spectrum, however, we have another failure mode that sometimes is the, it's really kind of the polar opposite of this and this is where we are rigid.
So rigid, when someone experiences me as rigid, it means that they know exactly what I think and what I want and that is how it's going to be. The picture I'm painting is so clear and so detailed, there is no room for them in it. It's inflexible. I'm not open to influence. I don't know if anybody else has ever been described this way, but I know that I have, right? Rigid is very much the "we're doing it my way." Both of those get in the way of useful information flow in an organization. They're both failure modes and the useful spot is to be somewhere between those two modes. And so this heuristic is what we call being clear. Being clear is when you avoid the traps of being vague and being rigid: you're sharing the information you have, you're sharing your reasoning, you're sharing how you got there and you're also sharing how strongly you believe those things, where you're open to influence, what you're willing to be flexible on. So people know where you stand, but they also have a sense of how they can participate with you. And so it can sound like things like this, including "what we know is" and "what we don't know is," because clarity is not the same as certainty. We can actually be very clear about what we do not know and that can be really powerful in a conversation and in an organization. So we can see that when we're clear that contributes to some of these organizational behaviors; we're not gonna get to high cooperation if we're not making clear asks from other people, if we're not training our messengers. So we can see how these individual behaviors, and by the way, those of you who are frantically typing, I'll make the slides available at the end. So for the second pair of failure modes, the first one starts with being uninterested. If the first pair is about you knowing what I think, this failure mode is about me not knowing what you think, what information you have, what data you're working with. So I'm not asking about your perspectives, your feelings.
That can sound like things like, well, that's just not how we do things around here. I'm not expressing an interest in what's going on with the person I'm having that interaction and that conversation with. Now, of course, there's another failure mode that you can probably see where this might be going on the other end of it that also has to do with asking questions. That's when I'm asking so many questions that they're laced in a lot of ways with judgment or blame. This, by the way, is one of my favorites. I get really curious and I ask a lot of questions and then people feel like they're being interrogated because I accidentally, when I'm trying to dig into stuff, figure out what's happening, I ask things like, so why would you do it that way? So I'm trying to find out what's going on but I'm doing it in a way that's not contributing to them actually sharing the useful information. Blameless postmortems are about avoiding this kind of interrogation. So again, the useful spot to be is somewhere along this spectrum in what we call curious. So clear was our first one, curious is our second one. Right, where we're asking about things, we're getting information from the other person but we're doing it in a way that feels like we're partnering with them, like we're there with them in the conversation and it's flowing well. And what do you notice? What do you recommend? How important is this to you? Where else might this happen? Because when we individually start to get curious, when there's a pattern of curiosity in our organization, then we can see how this maps to some of those other behaviors at the organizational level from that Westrum generative culture standpoint, right? Where we start to see things like failure occurs, we get interested, right? We engage in inquiry, right? Something new and unexpected happens. What can we, how can we capitalize on that? What can we do?
If we're not being curious at the individual level, we're not gonna see this at the organizational level. So that's the second pair, right? Where we think about being clear, being curious to avoid those four failure modes. The last pair starts with being distant, right? Distant is where I may know what you think and what your perspectives are and it really seems like I don't care, right? Distant is when you're in a conversation with me and it's like I'm not even there. I'm off on some other planet. I'm talking about things that aren't relevant to you. It's, we're disconnected in a lot of ways. Sure, sure, that doesn't matter right now or perhaps even worse. Don't bring me problems, bring me solutions, right? You're not interested, you don't express care about what I am experiencing. So that's distant. At the other end, we have glued. So glued is actually when you express too much care about the other person, right? Where you're over attentive to their needs and their feelings. Where you're not able to do something because you're stuck with them. You're stuck to them. You can't move over separately from them. Where you don't tell somebody something because you're afraid about hurting their feelings or how they're gonna take it. You're glued to that other person, you're over attentive. And so, or you hang off their every word and you do everything that they suggest. I hear managers talk about this a lot. They're like, my people take the one idea that I threw out there that I was just, mm, and then they run with it. And those are an example of being glued. So again, the third place that's useful to be is connected, right? We're between distant and glued. We recognize where we end and the other person begins, right, and we're still recognizing that we're interdependent with them. So it's recognizing separateness and interdependence. And so it can be things like expressing concern and care about what's going on with the other person and what's going on with that group. 
Letting them know proactively, hey, this might affect your group too. This is something that we need to pay attention to. Thanking people for bringing up problems even when it was hard for them to do that. Acknowledging that there is another person on the other end of the conversation. And so you can see how this contributes to these organizational behaviors as well. Particularly when we're talking about sharing risks, when we're bridging across to the rest of the organization and we're talking about cooperating. The more that those patterns of connection show up at the individual level in conversations, the more likely it is that these organizational-level behaviors are gonna show up. So how do you use this? Because remember, I said these are guidelines for action. So in practice, these are not independent, right? When you start to be a little less rigid, you're almost certainly gonna start to be a little more curious. But you can start to think about how do I usually show up in an interaction? And if I'm planning a conversation, I need to go talk to this person about this thing, you might say, you know, if I just YOLO this conversation, where am I gonna end up by default? Ah, I'm probably gonna be here. Hmm, is that gonna be useful for that conversation? So instead of doing that, which sliders might I wanna move? Okay, maybe I wanna try to be a little less vague. I want them to experience me as a little bit more clear or as a little bit less interrogating. And so you can start to think about what are adjustments that I need to make in order to help that conversation, that interaction go well. Now here's the important part. What happens in your head doesn't actually matter. Because this is about how the other person experiences you. They don't interact with what's up here. They interact with what's out here. So you need to start to think about behaviors that help you to move those sliders.
And one of the guidelines for behaviors that Allison and I have come up with is, what is something you can do, say, or ask? If it isn't something that you can do, say, or ask, it's probably not a behavior. So what might I say that's gonna allow me to be less vague and more clear? What might I do that's gonna make me seem less distant? How might I adjust my behavior? And then during a conversation, maybe you didn't get a chance to plan for it, the compliance team just showed up. You can ask yourself, you can assess, where am I at right now? How is this person experiencing me? Are they experiencing me in the way that I want them to? And if not, what can I do, say, or ask to adjust that? And finally, you can, after the conversation, say, hmm, where did I land with them? Did they experience me as rigid? I might even go ask them later. Did I seem a little inflexible on that point? Yes. Okay, great. What did I do, or say, or ask that caused me to land there? How can I learn from that for the future? So this is actually a bonus heuristic. This is a thing we call plan, dance, retro: recognizing we can use clear, curious, and connected before, during, and after our conversations so we can plan better ones, dance in the middle of the conversation and retrospect afterwards. That is a lot in a very short period of time. I'm very excited about this work. I love talking about this. I'd appreciate it if you come and find me. There's a QR code here if you want to go find the slides and a worksheet. The rest of the material is all up there. Thank you for your time today. Thank you very much. And our next speaker is gonna make his way on up. And while he's getting set up, we are going to get a word from one of our sponsors. Come on up, xMatters. Can I get a round of applause for our sponsors? Hi everyone, I'm from xMatters. The way our product works is, every time things break or things go wrong, we make sure that your business gets back on track before it affects your customer.
And we do that through our drag-and-drop no-code/low-code feature that lets you auto-remediate, consolidate logs, as well as leverage AI to identify the issue and suggest how to fix it. So come check us out. We're giving out a Lego set here for Super Nintendo and a bunch of other stuff. So thank you. All right, huge thank you to xMatters and to all of our sponsors. And now our next speaker, John. Hi, my name is John Engelke. I'm an engineer at Jet Propulsion Laboratory. Worked on the Mars 2020 project and I'm involved in an open source project right now to create tools for the broader community, including one that is an application that I hope can make life a lot easier for people. Have to put this here because Raytheon has helped us out and I actually am assigned to the lab through Raytheon. What is DevSecOps? What is Shift Left? Basically it's the concept that you're gonna take some of the operations that happened after the development phase, that went farther out in time, and kind of bring them back into the development phase. So infrastructure as code, testing as code. You're gonna bring everything back toward the actual developer so everything can be merged together and the development process itself can have insight and control into the actual testing, code validation, secrets testing, everything. The developer can see what's going on so they can manage it. What is SLIM? SLIM is a project run at the Jet Propulsion Laboratory to create open source tools for the broader community. It's a standards project, really, called Software Lifecycle Improvement and Modernization, and we're creating CI/CD pipelines, we're creating application templates, we're working on information sharing standards and governance standards. We're trying to make it easy to set up a full-featured application really quickly and that's what I'm gonna be demonstrating today.
So we have a Python starter kit, it's DevSecOps ready, it provides infrastructure as code, you can see it, all the deployment stuff's in there, uses GitHub Actions, it's working on GitHub. I call it cheap, fast, good; can we possibly hit all three? Maybe yes, good's the implementation part, right? So you take this template, you clone it, you make it work for you and then you have to walk through it a little bit and do a little bit of tweaking and updating. And here's a URL for it, you can take a look at it after this presentation. Here's what we're gonna do: we're gonna clone the template, set application information, hopefully have time to add publishing credentials, test and deploy it and hopefully get this thing published to TestPyPI during the course of the next five minutes. Okay, so now here comes the demonstration part, so I'm gonna come out of this and let's see what we can do. So I'm browsing here to the SLIM Starter Kit Python and I'm logged in as a user called Ingy here. I'm gonna hit, let's see, where are we here? Gonna hit use this template and then create a new repository. And I'm gonna just call it something, let's see, Python Starter Kit scale 21x. Okay, so it's available in my account, I'm not gonna bother with that, I'm gonna leave it public, I'm gonna hit create repository. Okay, so now it's taking me straight in here, it's generating the repository, let's see what happens. This'll take a few moments I think, I hope. Okay, there we go. Let's take a look at what we have inside of here. We have a GitHub directory which has a number of workflows in it. We have some issue templates, we have pull request templates, a Dependabot check. These are already scripted out, they're drop-in.
You can come here and use it for any of your applications, but in the workflows directory, you can see that we have what's called a CodeQL static code analysis that checks for both code quality and security, and it's using a tool built by NASA called SCRUB which is sort of managing it. We have a pylint analysis to check for PEP 8 compliance. We have a Python publish GitHub action and a secrets detection. Well, let's see if any of those things worked. Well look at that, they're running already. So I just did a quick clone and already we have a secrets detection running. We have a pylint run which completes successfully and checks the application. Let's take a look in here and see what it said. It actually created a report for us. Let's take a quick look at it. Look at that, I just downloaded it and I'm gonna unzip it here and let's see what it says. Okay, a couple of things here. You can customize it of course to get rid of the docstring checks. Mostly this is talking about docstrings right now. Minor things but the code looks good. So we just did the clone. Let's see if we can make some changes to it to get it to publish. So the first thing I'm gonna do is I'm going to try to set up my TestPyPI credentials. So I've logged into TestPyPI. Now, we typically have you test on TestPyPI, which is actually a separate PyPI instance, so you can run your application deployment against it and make sure it works. I logged in here already. I'm gonna come down here and I'm going to create a token. Forgive me if this displays a token publicly but it's only temporary for the purpose of this demo. So let's see, let me set this down because I have to pop into the other screen real quick for instructions. So we have to create a PyPI API token. Okay. All right, so let's create it. Okay, there's our token. I'm gonna copy it. I'm gonna come back to my repo here.
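For orientation, a publish workflow along the lines of the one he's describing would look roughly like this. This is a hypothetical sketch, not the starter kit's actual file; the workflow name, secret name, and action versions are assumptions:

```yaml
# Hypothetical sketch of a TestPyPI publish workflow (not the kit's exact file).
name: publish
on:
  release:
    types: [published]
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - run: pip install build && python -m build
      - uses: pypa/gh-action-pypi-publish@release/v1
        with:
          password: ${{ secrets.PYPI_API_TOKEN }}       # the repo secret he creates
          repository-url: https://test.pypi.org/legacy/  # TestPyPI, not production PyPI
```

The token created on TestPyPI would be pasted into the repository's secrets under the name the workflow expects.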
Gonna dig into the settings but first what I'm gonna do is take a look at my GitHub publish action. And you can see in here that actually, if I can get it to scroll, it's looking for PYPI_API_TOKEN. So it's looking in there for that token. Okay. So let's go back to make sure. Okay. Come down to secrets. Make sure I can get in here and get to this. I'm gonna show you what I have here for a different repo real quick. Then I'm gonna run through, let me run through the updating of the application. I think we're a little short on time. I knew this would be a little ambitious. But we did see our GitHub actions run and actually do the code analysis. GitHub, by the way, has started to require everybody to use two-factor authentication and so has PyPI. These are obfuscated here. Thank you. Okay. Well, let me walk you through modifying the application real quick so that we can see what has to change in order to make this publishable. Although I think it would be publishable without it. Well, what we've done is we've made a, it's using modern Python tooling, so there's a pyproject.toml file. What we've done is we've come in here and we've called out the places that actually have to be changed in order to make it publishable. So for instance, this one is gonna change to psk_scale21x. The reason I used an underscore was on purpose. Okay, second thing we have to change is we have to change the setup.cfg file. Okay, and again, everything is called out that needs to be changed here. So we would edit this project here. Let's see what has to be changed. And we're gonna call this, it was a little bit aspirational to do this in the few minutes we had to actually get it to publish to PyPI. No, it should be an underscore and a dash, that's right. Because Python translates the underscores to the dashes when you do deployments. Some of them need to be dashes and some of them don't. Okay, and modify for your project. Okay, I think that's what we want.
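The renaming step he's doing can be pictured with a minimal pyproject.toml fragment. This is an invented illustration, not the template's actual contents; every value here is a placeholder:

```toml
# Hypothetical [project] table showing the fields the starter kit asks you to edit.
[project]
name = "psk-scale21x"   # distribution name; PyPI normalizes underscores to dashes
version = "0.1.0"
description = "Demo package cloned from the SLIM Python starter kit"
```

The distribution name on (Test)PyPI can use dashes, while the importable package directory itself must use underscores, which is the dash-versus-underscore distinction he mentions.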
Okay, we're gonna edit one more thing. Actually, I have to clone it to edit this. Okay, I'm sure somebody will let me know when I'm running out of time. You're just about there. Okay, I think that killed it. I will summarize what I've done so far and how close I am to actually publishing it. Basically, I have to rename this directory right here and I have to insert some keys in here. And then what I will do is I will come here and I will actually create a release to release that exact version of the project. I will finish this right now off to the side because I don't want to eat into everybody else's time; I think I was a little bit limited here myself. But the point is that we did create an application that is basically publishing-ready, and we have GitHub Actions. We have several actions already running straight out of the box when we did the clone. So thank you very much for your time. Thank you very much, John. And thank you for battling the demo gods because they never give us the easy path. So thank you. Gareth, if you wanna come on up and get started. And I've got Rin from Honeycomb as our next sponsor. Give it up for Honeycomb. Thank you. I'm Rin, I'm with honeycomb.io and we are an observability tool for the present and future of distributed systems. Meaning that, sure, you have something that you maybe solve your problems with. Maybe you have logs through your cloud provider or maybe you're dealing with application performance monitoring and you have a million dashboards. But what if I told you there was a better way? What if I told you you could look at a lot of data about your systems and be able to look at problems that you can't predict are problems. The kind of problems that you're like, well, the database key starts with an MC and it's somebody from Scotland. What if I told you you could diagnose those problems quickly with lots more data? Come see us, we'll tell you about it.
We're right outside and we'll be in the scale expo hall the next two days, thank you. Thank you to Honeycomb and all of our sponsors. And Gareth is gonna be set up in just a second here. It looks like he's about there. We'll hand off the black mic to him and we will go. No introduction. You're Gareth, you need no introduction. I feel robbed. All right, so welcome to How I Met Your Deployment Plan. How's everyone's afternoon going? Yeah? Okay, I don't have a demo so no worries there. I won't keep you here any longer than I have to. All right, so a bit about me, as Chris said, my name is Gareth. My social media information for Bluesky, Mastodon and the other sites can be found at the bottom of every slide. If anyone's interested in tweeting or, sorry, don't use that word, interested in posting at or about me during or after the talk. So until recently I was a software engineer working on the Salt project. It's a story based on the classic tale: company V buys company S, company B buys company V, and hilarity ensues. You may have seen it in the news recently. If you don't get that joke, sorry. It's kind of a broad joke. It's complicated. Thank you, Jeremy. I was fortunate enough to spend my days working on open-source software. If you have the opportunity, I highly recommend it. I'm also a former DevOps engineer. So if anyone is interested in doing a group therapy session outside on being on call, I will lead that. I'm also a vegetarian, but I'm not a militant vegetarian. So if I see you eating meat, I will only silently judge you in my mind. I'm also owned by two cats and a dog. So if anyone has any tips on working from home with pets, please come find me. In another life, I was part of a team that organized and started a free and open source conference here in Southern California. Some of you may be there now. Because of my involvement in the conference, I have a unique understanding of all the work that goes into hosting a conference.
And I know all the craziness that's going on behind the scenes. For a fun story, ask me about the time a very large, expensive IBM server got lost at the LA Convention Center and I almost ended up going to the Grammys. If you haven't thanked the organizers of both Scale and DevOps Days LA, please do so. Just a quick warning, so no one gets upset during the talk. This is a talk on deployments. So I'm gonna talk about deployments and we're gonna get to scenes and characters and whatnot of How I Met Your Mother. But that's later. This is a mullet talk. So business up front, party in the back. And a quick spoiler alert. This presentation contains information on the characters and the storyline as well as scenes from the TV series which aired from September 19th, 2005 to March 31st, 2014 for a total of nine seasons. So if anyone has not seen the show and you're concerned about spoilers, you're welcome to leave the presentation now. Since we're gonna talk about deployment plans, we should probably define what we mean by a deployment, or what I mean by a deployment. For the context of this presentation, deployment refers to the process of making a software application or system available for use. It involves taking the code, configuration, and any necessary resources and setting them up in a live environment where users can access the application or system. Deployments can vary widely depending on the context, such as deploying a website to a web server, deploying a mobile app to an app store or deploying a machine learning model to a production environment. The goal of a deployment is to ensure that the plan, sorry, that the software is available and reliable and performing as expected for users. So now that we've defined what a deployment is, we can define what a deployment plan is. And a deployment plan is just that, a plan, a document or documents that includes all the details required for a deployment or deployments to happen.
And it should be a living document, one that is updated following each deployment or copied from a template and updated with relevant information for that particular deployment. If you haven't documented how your deployments happen, then much like Indiana Jones four and five, or a fourth Superman movie starring Christopher Reeve, your deployment plan doesn't exist. When someone new is going through the steps of your deployment plan for the first time, they shouldn't feel like they have to retrace someone's drunken escapades involving tropical fruit because it contains a valuable clue to the next step in the plan. Your deployment plan should include easy to follow steps that will guide someone along the process for performing the deployment. A new hire should be able to deploy into any environment with minimal assistance, even if something goes wrong. Common problems should be documented in the deployment plan, along with the action to take if those problems are encountered. If those problems are related to the software being deployed, perhaps include links to the issues as reminders that those problems should be fixed before the next deployment happens. If the new hire runs into a problem that is not documented, that is a perfect opportunity to update the deployment plan for the future. And your deployment plan should be easy to find, stored in whatever documentation system your organization is using, whatever that might be: wiki software, Google Docs, or stored in a Git repo alongside the application source code. Anyone in the organization should be able to easily find and review the details of your deployments, both past and future. Throughout the series, there is a long running joke about the details of the job of one of the main characters, Barney Stinson. When asked by his friends what exactly he does, his standard response is always a single word, please.
We later learn in one of the final episodes that please is actually an acronym, an acronym whose meaning and purpose is only known to Barney and a few others. When thinking about the contents of your deployment plans, avoid using acronyms and terms only a few select individuals know. Ensure all terms are clear and obvious in their meaning. And when we're thinking about the contents of our deployment plans, we wanna think of the five Ws and the H. The what, the who, the why, the where, the how, and of course, the when. The what is what is being deployed. This could be a single change to a website, making a new version of an application available to be downloaded, or a large deployment involving multiple pieces being deployed to multiple locations, along with database updates, firewall changes, and load balancer updates. The details of the what should be easily discoverable. If there are version numbers, make sure those are easy to find and include them in the deployment plan so that everyone knows what version is being deployed. Details about the what are useful to include when a summary of the deployment is sent out, commonly called release notes, or a changelog. Next up we have the who. When thinking about the deployment plan, it's important to both identify who is involved as well as who will be affected. You'll need to identify any and all collaborators. Depending on the scope of the deployment, this could include members of other teams that manage things like databases, networking devices, or security. Their involvement and expected tasks should be well documented. You'll also wanna think about the users that will be affected by the deployment. You wanna make sure that they are aware that the deployment is happening as well as kept informed of any changes and new features that are rolling out. Keeping the users happy keeps them excited about using your software.
Part of the deployment plan should also include the reasoning of why this deployment is happening, partly to ensure that everyone is on the same page about why the change is happening, but also because the information will be useful when informing users about the deployment, explaining what was deployed and why. The reason could be because of a bug fix, a security fix, or a new feature. The why should also be included in the post-deployment summary. And a very important detail to include in the deployment plan is where the deployment is happening. Is this a production deployment or is the change being deployed to a staging or QA environment? If you're lucky, those are completely separate environments. Is your software being deployed to a cloud provider? If your organization uses multiple cloud providers, which one is the deployment going to? Is this going to a particular region? Or is it going to a local data center? If your organization has multiple data centers, is this deployment going to all of them or are you doing an initial deployment to one location first? The where could also determine additional involvement from other teams. If this is a staging or QA environment, your team may have access to perform certain tasks that they otherwise would not if this deployment was going to a production environment. Ideally, you're using the same deployment plan and methods in all environments to stay consistent but also to continue to test the deployment before it's time to deploy to production. Probably the most important detail to include in the deployment is how the deployment is going to happen. This would mention any tools, scripts, and commands as well as how they are used for the deployment. We should also include the name of the team or team members next to each step. This detail is useful for everyone involved so they know who is responsible for specific parts of the deployment.
For the purpose of this talk, including the how in the deployment plan is important, but the details of the how are not. I'm not here to tell you how to deploy your software. I don't know what your software is or what it does or what your environments look like. There are many, many solutions available. Solutions like Pulumi or Dagger exist, and using continuous integration and continuous delivery solutions like GitHub Actions, GitLab Runners, or CircleCI is an option. Your team should pick the option that works best for your organization's needs. And I'm sure there are many, many vendors here at SCaLE and DevOps Days LA that will tell you that their solution is the best one to use. Even if that solution is a handful of simple shell scripts that deploy into the appropriate environments, details of those shell scripts should be included in the deployment plan, and those shell scripts should be stored in some sort of source control system and not on an engineer's laptop. Along with the steps and commands of how the deployment will happen, it is important to include information on how to test the deployment once it is complete. This will ensure that everything is working as expected. As new features and changes are made, the documented tests should be updated to ensure that all features are covered by the deployment test cases. The final detail from the five Ws and one H that we need to include is when. Specifically, when the deployment will happen. The details of when will vary from plan to plan, but most often the when will include a date when the deployment is expected to happen and a time when the deployment will begin. And, if you're feeling brave, an estimate of the time when the deployment should finish. Some common questions come up when talking about when to deploy. Is this a peak usage time for users? Is this going to cause downtime? How much downtime right now is acceptable?
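The post-deployment testing step described here could be as simple as a scripted set of smoke checks. A minimal sketch, with placeholder checks standing in for the real HTTP or UI tests your environment would need:

```python
# Hypothetical post-deployment smoke tests. The checks are stand-ins;
# real ones might hit health endpoints or query version strings.
def run_smoke_tests(checks):
    """Run each named check and report pass/fail for the summary."""
    results = {}
    for name, check in checks:
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a crashing check counts as a failure
    return results

deployed_version = "2.4.1"  # assumed: read back from the deployed service

checks = [
    ("correct version deployed", lambda: deployed_version == "2.4.1"),
    ("homepage responds", lambda: True),  # placeholder for an HTTP check
    ("login flow works", lambda: True),   # placeholder for a UI test
]

results = run_smoke_tests(checks)
print(all(results.values()))  # True means the deployment looks healthy
```

Keeping these checks next to the deployment scripts in source control makes it natural to update them as features change.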
How many of those imaginary nines did somebody promise? Should we deploy on Friday at 5 p.m.? Like most things in tech, there are a lot of opinions. This is one of the most contentious topics that is often debated when talking about deployments of software. Fights have broken out; friendships and marriages have ended. If you've never participated in an argument on social media about the best day and time for deployment, or even observed one in passing, consider yourself lucky. If you ever see one starting up, perhaps just close that social media app for the day and go in search of pictures of kittens and puppies using your search engine of choice. Similar to the details of how a deployment is performed, for the purpose of this presentation, including the when in the deployment plan is important, but the details are not. The time when you're doing your deployment should be when it makes sense for your organization, your team, and your users. When thinking about the when for your deployment, it's important to consider the start time as well as the expected end time, taking past deployments into account to estimate the timeframe. Drawing advice from this quote from one of the main characters, Ted Mosby: "Nothing good happens after 2 a.m. When 2 a.m. rolls around, just go home and go to sleep, because the decisions you make after 2 a.m. are the wrong decisions." Once the deployment has concluded, hopefully at a reasonable time, and any testing indicates that the deployment was successful, some next steps include alerting your users that the system or software is available and publishing the release notes or change logs to the appropriate channels so everyone is aware of the changes and new features. And then celebrating a successful release, with or without a goat, your choice. Following the celebration, it's important to schedule a post-mortem to discuss what went well with the deployment and what areas could use some improvement.
This is also a good time to update the deployment plan with any necessary changes from the last release. While not directly related to How I Met Your Mother, on the topic of post-mortems, I gave a talk at DevOps Days LA a few years ago on post-mortems in general and what a post-mortem on the Death Star might look like. It seemed like it was well received. DevRel luminary Mary Thengvall, author of The Business Value of Developer Relations and amateur unboxing photographer, referred to the presentation as critically acclaimed, and open-source community expert Jono Bacon, author of People Powered and The Art of Community and the creator of the heavy metal polka fusion music genre, referred to the talk as an arresting performance. If anyone is interested in hearing more about my thoughts on post-mortems, come find me on the hallway track. So the TV series How I Met Your Mother has five main characters: Barney Stinson, Robin Scherbatsky, Ted Mosby, Lily Aldrin, and Marshall Eriksen. We're gonna talk about these characters and look at what kind of deployments they might use. The first character we'll talk about is Marshall Eriksen. Marshall is known for his kind and optimistic nature, often serving as the moral compass of the group. He's fiercely loyal to his friends, particularly his wife Lily and his best friend Ted. He's known for his quirky sense of humor and his love for puns. Marshall is also a romantic at heart, often expressing his love for Lily in grand and heartfelt gestures. Marshall is a lawyer with a strong sense of justice, passionate about environmental issues and often taking a stand for causes he believes in. Despite his career ambitions, Marshall's true happiness comes from his relationships with his friends and family. Throughout the series, Marshall's personality is seen as steady and reliable, even when his relationship with Lily takes a brief pause while she moves to San Francisco to pursue her art career.
Marshall is seen as supportive and waits for her to return. Considering this, a blue-green deployment would likely be the deployment that Marshall would pick. A blue-green deployment is a strategy for deploying applications with minimal downtime and risk. In this approach, you have two identical environments, typically called blue and green. At any given time, only one of these environments is live and serving production traffic, while the other is inactive. When you need to deploy a new version of your application, you deploy it into the inactive environment. In this case, it's green. Once the deployment is complete and the green environment is tested and ready, you can switch the router or load balancer to direct traffic to the green environment instead of the blue one. Using this approach, you can ensure that your application is always available, since one environment is always active. You can easily roll back to the previous version by switching your router or load balancer back to the blue environment if any issues are detected in the green environment. Marshall is known for his consistent and dependable nature. Likewise, the blue-green deployment strategy aims to provide a consistent and reliable way to deploy applications while minimizing downtime. Marshall is someone you can always count on, much like how the blue-green deployment strategy is designed to ensure that your application remains available and reliable during deployments. Marshall's presence often serves as a safety net for his friends, just as the blue-green deployment strategy provides a safety net for applications by allowing quick rollback to a stable version if issues arise. Marshall's careful and thoughtful approach to decisions mirrors the low-risk nature of blue-green deployments, which minimize the risk of downtime or errors during the deployment process.
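The blue-green switch described here can be sketched in a few lines. This is a toy model, assuming a router object we control; a real setup would flip an actual load balancer or DNS entry:

```python
# Toy blue-green deployment. Environment names, versions, and the
# health signal are all hypothetical.
class Router:
    def __init__(self):
        self.live = "blue"  # blue is serving production traffic
        self.versions = {"blue": "1.0", "green": None}

    def inactive(self):
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version):
        """Deploy the new version to the inactive environment only."""
        self.versions[self.inactive()] = version

    def switch(self, healthy):
        """Cut traffic over only if the inactive environment checks out."""
        if healthy:
            self.live = self.inactive()
        return self.live

router = Router()
router.deploy("2.0")         # green now runs 2.0; blue still serves users
router.switch(healthy=True)  # traffic moves to green
print(router.live, router.versions[router.live])  # green 2.0
```

Rollback is just another `switch` call back to blue, which is exactly the safety net the strategy promises.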
Both Marshall Eriksen's personality and the blue-green deployment strategy emphasize stability, reliability, and minimizing risk, making them comparable in their respective domains. The next character is Robin Scherbatsky. Robin is known for her love of guns, scotch whiskey, and cigars, as well as her affinity for playing ice hockey and her passion for journalism. Depicted as having a somewhat guarded and reserved demeanor, Robin often struggles to open up emotionally to others. Despite this, she forms close relationships with the main characters in the show and is seen to be fiercely loyal to her friends. Throughout the series, Robin navigates various romantic relationships, including an on-again, off-again relationship with fellow character Ted. One of Robin's defining characteristics is her ambition and determination to succeed in her career. She's shown to be career-focused, often prioritizing her work over her personal life. Despite facing challenges and setbacks in her career, Robin remains resilient and determined to achieve her goals. Looking at Robin's career journey across the nine seasons of the series, the type of deployment that she would most likely go with would be a canary deployment. A canary deployment is a technique used in software development and release processes to reduce the risk associated with deploying new versions of an application. In a canary deployment, the new version of the software is gradually rolled out to a small set of users or servers before being deployed to the entire infrastructure. This subset of users or servers is often referred to as the canary group. By monitoring the performance and behavior of the canary group, developers can quickly identify any issues or bugs in the new version of the software before it is deployed to the entire user base. If any problems are detected, the deployment can be halted and the necessary fixes can be made.
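The promote-or-halt decision at the heart of a canary rollout can be sketched very simply. The traffic fraction and error-rate threshold below are made-up numbers for illustration:

```python
import random

# Toy canary rollout: send a small fraction of traffic to the new
# version, watch its error rate, then promote or halt. All thresholds
# here are invented for the sketch.
def choose_version(canary_fraction, rng=random.random):
    """Route this request to the canary with the given probability."""
    return "canary" if rng() < canary_fraction else "stable"

def evaluate_canary(errors, requests, max_error_rate=0.01):
    """Promote only if the canary group's error rate stays acceptable."""
    if requests == 0:
        return "halt"  # no data yet, don't promote blindly
    rate = errors / requests
    return "promote" if rate <= max_error_rate else "halt"

print(evaluate_canary(errors=2, requests=1000))   # promote (0.2% errors)
print(evaluate_canary(errors=50, requests=1000))  # halt (5% errors)
```

A real system would also watch latency and other signals, but the shape is the same: small exposure, measure, then decide.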
This approach helps to minimize the impact of issues and ensures a smoother rollout of the new version of the software. Canary deployments are often used in conjunction with other deployment techniques, such as blue-green deployments, to further reduce risk and ensure a high level of reliability and availability for the application. Robin's career is characterized by her ambitious and determined nature. She is constantly seeking new challenges and opportunities to advance in her field, much like how canary deployments aim to push the boundaries of software development by introducing new features or improvements. Robin's career often experiences various ups and downs, similar to the incremental rollout of canary deployments. Just as canary deployments allow developers to test new features with a small set of users before a full rollout, Robin often faces small setbacks or challenges in her career that allow her to learn and grow before moving on to bigger opportunities. Robin's career journey reflects the iterative nature of canary deployments. As she progresses in her career, she gains new skills and experiences that allow her to take on more significant roles and responsibilities, much like how canary deployments gradually improve the overall performance and reliability of software over time. The next character is Lily Aldrin. Lily is incredibly caring and often takes on the maternal role within her friend group. She is always there to offer support, advice, and comfort to her friends, especially when they're going through tough times. She's an artist and has a passion for painting. Her artistic side is a significant part of her career, and she often expresses herself through her artwork. She has a playful and sometimes quirky personality, enjoys having fun with her friends, and is known for her sense of humor and adventurous spirit. Despite her nurturing nature, Lily is also strong-willed and can be quite opinionated.
She is not afraid to speak her mind and stand up for what she believes in. Lily's character undergoes significant growth throughout the series, particularly in terms of her career. She starts off as a kindergarten teacher but later pursues other career opportunities, including work in the art world. A deployment plan that would likely align with Lily would be the incremental deployment. An incremental deployment, sometimes known as a rolling deployment, is a strategy for releasing software updates in stages rather than all at once. This approach allows developers to mitigate risks, monitor the impact of changes, and gather feedback before a full deployment. When performing an incremental deployment, updates and changes can be broken down into smaller, manageable pieces. During the incremental deployment, developers can closely monitor the status and performance of the update. They can also gather feedback from users to identify any issues or improvements. Assuming the initial deployment is successful, additional updates and changes can be gradually rolled out. The incremental deployment strategy helps to reduce the risk of widespread issues by catching problems early on and allows for iterative improvements based on feedback. Lily's personality is often portrayed as nurturing and is characterized by her caring, supportive, and sometimes overprotective nature towards her family and friends. She often goes out of her way to ensure their well-being and happiness, sometimes taking on a motherly role within the group. There are parallels that can be drawn between Lily's personality and the concept of incremental deployments. Just as Lily cares for the well-being of her friends, incremental deployment strategies prioritize the well-being of the software by minimizing the risk of potential issues and ensuring a smoother rollout.
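The staged rollout just described can be sketched as a batch-by-batch update with a health check between batches. The server names and the health signal here are hypothetical:

```python
# Toy rolling (incremental) deployment: update servers in small batches,
# checking health between batches and stopping early on trouble.
def rolling_deploy(servers, version, batch_size, healthy):
    """Update `servers` in batches; halt if a batch looks unhealthy."""
    updated = []
    for i in range(0, len(servers), batch_size):
        batch = servers[i:i + batch_size]
        for server in batch:
            server["version"] = version
        updated.extend(s["name"] for s in batch)
        if not healthy(batch):  # gather feedback before continuing
            return updated, "halted"
    return updated, "complete"

servers = [{"name": f"web-{n}", "version": "1.0"} for n in range(4)]
updated, status = rolling_deploy(servers, "2.0", batch_size=2,
                                 healthy=lambda batch: True)
print(status, updated)  # complete ['web-0', 'web-1', 'web-2', 'web-3']
```

If the health check fails after the first batch, only that batch is on the new version, which is the whole point of rolling out in stages.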
Similar to how Lily's nurturing nature evolves over time, incremental deployment strategies involve a gradual progression from a small-scale release to a full deployment, allowing for adjustments and improvements along the way. Lily often adjusts her approach based on feedback from her friends. Likewise, incremental deployment strategies rely on feedback from users and monitoring performance metrics to make adjustments and improvements to the software update. Lily's protective nature can be seen as a way to mitigate risks and prevent harm to her friends. In a similar way, incremental deployment strategies aim to mitigate risks associated with software updates by catching issues early on and minimizing their impact. The next character is Ted Mosby. Ted is an architect in New York City with a romantic and often idealistic outlook on life. He's portrayed as a kind, thoughtful, and sensitive individual who is deeply committed to finding true love and settling down. Ted is depicted as a hopeless romantic, constantly searching for the one and often jumping into a relationship with high hopes only to be disappointed. When we first meet Ted during the first episode, he professes his love to Robin on their first date. He's known for his long, elaborate speeches about love and relationships, as well as his tendency to overthink and analyze his romantic endeavors. Ted is also shown to be a loyal friend, always there for his close-knit group of friends despite their quirks and flaws. He's particularly close to his best friend, Marshall, and Marshall's wife, Lily, whom he has known since college. Ted's search for love is a central theme of the show, as he narrates the story of how he met his children's mother to them in the year 2030. If Ted were to pick a deployment plan, he would likely start with a more monolithic approach.
A traditional monolithic deployment approach refers to the method of deploying software applications where the entire application is developed, built, and deployed as a single cohesive unit. In a monolithic architecture, all components of the application, such as the user interface, business logic, and data access layers, are tightly coupled and packaged together. The entire application is then deployed on a single server or set of servers. Updates and changes to the application can require deploying the entire application, which can lead to longer deployment times and an increased risk of error. Scaling a monolithic application can also be challenging, as the entire application needs to be replicated to handle increased load. Just like Ted's journey to find the one spans several seasons, a monolithic deployment approach often involves a long and complex process of developing, testing, and deploying a large application. In both cases, there's a focus on a single large entity. For Ted, it's finding a lifelong partner; for monolithic deployments, it's building and deploying a single unified application package. Ted often becomes fixated on finding a specific type of person or relationship, which can limit his options. A monolithic approach can limit flexibility in deployments, as all components are tightly coupled and must be deployed together. As Ted's preferences and circumstances change, his search for love becomes more challenging. In a monolithic deployment, making changes to one part of the application can be difficult without affecting other parts. Ted's relationships often face challenges and sometimes end in failure. A monolithic deployment approach can be risky, as a failure in one part of the application can impact the entire deployment. As Ted's search for love evolves, he matures and learns more about himself.
Likewise, a monolithic deployment approach can face scalability challenges as the application grows and evolves, requiring more resources and effort to maintain. Just as Ted's search for love eventually leads him to finding the right partner, software development often evolves from monolithic deployments to a more modern approach for greater flexibility and scalability. The final character is Barney Stinson. Barney is incredibly charming and confident, often using his charisma to win over others and get what he wants. He's known for his smooth-talking ways and his ability to manipulate situations to his advantage. He's always impeccably dressed in a suit and tie, which has become his trademark, believing that wearing a suit makes him more attractive and successful. Despite his many flaws, Barney is a loyal friend, often providing comedic relief and moral support, showing that beneath his suave exterior, he cares deeply for his friends. Barney Stinson's adventurous lifestyle would likely lead him to favor more of a big bang deployment strategy. We may not know all the details of Barney's big bang deployment, but one thing we do know for certain: it's going to be legendary. A big bang deployment strategy is a software implementation approach where a new system or software version is rolled out to all users at once, replacing the current deployment entirely. In this approach, there is no gradual phasing in of new versions or coexistence with an old version. Instead, the switch to the new version is instantaneous, often occurring over a short period of time, such as a single day or weekend. Big bang deployments are typically used for smaller projects or when the risks of the deployment are considered low. I can't read what that says. 10 minutes, okay. Thank you, Chris.
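The all-at-once cutover just described can be sketched as a one-step switch; the user names are made up, and the point of the sketch is the blast radius:

```python
# Toy big bang cutover: every user moves to the new version at once,
# so a bad release affects everyone simultaneously.
def big_bang_deploy(users, version):
    """Switch all users to the new version in one step; no coexistence."""
    return {user: version for user in users}

users = ["alice", "bob", "carol"]
assignments = big_bang_deploy(users, "2.0")
print(assignments)  # {'alice': '2.0', 'bob': '2.0', 'carol': '2.0'}

# If 2.0 turns out to be broken, the blast radius is everyone:
broken = "2.0"
affected = [u for u, v in assignments.items() if v == broken]
print(len(affected) == len(users))  # True: all users hit at once
```

Contrast this with the canary and rolling sketches, where only a subset is ever on the new version while it proves itself.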
This strategy can be more straightforward and faster to execute compared to phased or incremental deployments, but it also carries higher risks. If something goes wrong during the deployment, it can impact all users simultaneously, potentially causing widespread disruption. As a result, thorough testing and contingency plans are crucial when using a big bang deployment strategy. Both Barney and the big bang deployment strategy require a bold approach. Barney often takes risks and embraces new challenges. Likewise, a big bang deployment strategy involves a bold decision to switch to a new system or software version without a gradual transition. Barney's lifestyle is characterized by quick decisions and rapid changes, while the big bang deployment strategy aims to implement the new system or software version quickly over a short period of time. Barney's adventurous lifestyle can lead to memorable experiences and changes in his life, while the big bang deployment strategy can lead to major changes in an organization's systems and processes. While both approaches involve risk, the consequences of failure can be more severe in a big bang deployment strategy. Failed deployments can lead to widespread disruption and impact all users simultaneously, whereas Barney's adventures typically have more personal consequences. So some takeaways. This is the portion of the talk that is most important if you just walked in and need some talking points for your boss to justify sending you to DevOps Days LA. Also known as the too-long-didn't-watch portion of the talk. Some of these items were mentioned during the talk and others were not, but all of them will be on your final exam, which is worth 90% of your grade. At a bare minimum, document the how, the what, and the where details to ensure that anyone is able to easily perform a deployment. Include the why, who, and when details where it's helpful.
Consider deployment plans as living documents, updating them where necessary after each deployment. And make sure that the deployment plan is easy to find and easy to follow. The best way to keep your deployment plans up to date is to use them across all environments. Assuming your organization has staging or pre-production environments, using the same deployment plan in those environments ensures that any issues can be caught early, before attempting to deploy to a production environment. Over the show's nine seasons, the characters of How I Met Your Mother evolved and changed. Likewise, your deployment plans are going to evolve and change. Each time a deployment happens, it's an opportunity to update the process, be more efficient, and fix problems that might arise. You may initially begin with a monolithic deployment but move to a blue-green or canary deployment plan, or a combination of the two. Use a deployment plan that works for your organization and your environments, not someone else's. If doing your deployment at 5 p.m. on a Friday doesn't work for your organization or your team, don't deploy at 5 p.m. on a Friday. When attending talks and conferences, it can be easy to get drawn into the excitement when watching presentations on new tools and methods, instilling a desire to rip everything out Monday morning and replace it with a new shiny solution. Take the messages in these presentations as inspiration for small changes you can make, not mandates that you need to duplicate the presented solution entirely. The final takeaway is a tip for your time at SCaLE, assuming you're here for the whole weekend. A three- or four-day conference, if you were here yesterday, is exhausting. My advice is to stay hydrated with your beverage of choice. But please heed Chloe's warning here and learn from my mistake. And if someone offers you a key lime LaCroix, avoid it at all costs.
A big thank you to the DevOps Days LA organizers for allowing me to speak this afternoon. Thank you. All right, thank you very much, Gareth, for coming and joining us. We've got a short break and then we will be back with some lightning talks at three o'clock. So please make your way back at three o'clock. Also, as a reminder, that wonderful piano you are seeing up on stage is part of Upscale tonight. You're gonna wanna make your way out for some live entertainment and some even funnier antics up here on stage with folks. So come on out for Upscale later on today. Go visit our sponsors and we will see you back at three p.m. All right, ladies and gentlemen, we're gonna get started in about three minutes or so. So in about three minutes we'll get going. Is Tony in the room by chance? Well, Steve? Yes, please. Also, I have not seen Tony yet, so. All right, we'll give it another minute or so and then we will get started. All right, ladies and gentlemen, I'd like to present our next sponsor, Sauce Labs. All right, so Sauce Labs. My name's Titus Fortner. I've worked at Sauce Labs seven and a half years. We are a platform for testing. How many people here are familiar with Selenium? We were started by the founder of Selenium. I am one of the primary authors of Selenium at this point. If you have any Selenium questions, feel free to find me. We started with Selenium; you can also do mobile testing with Appium. We support all frameworks now: Cypress and Playwright, XCUITest, Espresso. Essentially, you write your code, you can send it to us, we'll execute it on our platform, or you can execute it in your CI. We'll run the test, give you insights into failures, show you logs, give you information about failure analysis, screenshots, all of the fun stuff to help you better understand what's going on with your application. How many people have been frustrated by an application this week, a mobile phone app?
It's so easy to be frustrated by them, whether it's bugs or things just not working the right way. Making sure that your users can do the things that you want them to do with your application is crucial, and we provide the mobile devices, both emulators, simulators, and real devices, across all platforms, manufacturers, and versions, for you to be able to see results. You can also do live testing, manual testing, on any browser, operating system, or mobile device to get the information that you need to make sure that your users have the experience that you want them to. So thank you. All right, and now, hey guys, in the back, can we mute the lectern mic please? We're gonna switch to the handheld. And with that, David, with the case to not outsource your metrics. Thanks everybody, hi, my name's Dave Southwell. I'm really excited to be here at SCaLE and DevOps Days LA. I've been coming to this conference for a while now, and it's really exciting to get a chance to share this story with y'all, so I hope you enjoy it. And of course, thank you to DevOps Days LA for having me. A little background about me: I was born here in Pasadena, I've been playing with computers since the 80s when I was a kid, and in my professional career, I've held just about every role there is, working at various different tech companies, Sun Microsystems, Oracle, and some other small companies you've probably never heard of because they never went anywhere. I'm passionate about solving problems and helping people bring their visions to life using technology, so with that all out of the way, let's get to the talk. So a while back, I had an interesting conversation with a colleague. The discussion was around metrics and observability, a subject that I'm really passionate about, at least as of recently. And over the course of the conversation, it became clear that they definitely had a preference, a preference for buy, not build.
And I prefer to build, but I'm open-minded, so I enjoy a good intellectual discussion. But over time, it was clear that they weren't gonna change their mind and I didn't have any reason to change my mind either, so I kind of just let the conversation wane. A few months later, this colleague approached me with the proverbial olive branch and said, hey, I got this notice about a meetup for an observability company, I want you to come and I want you to give me your thoughts afterwards. I said, okay, sure, why not? I've been involved in doing a lot of vendor procurement and things of that nature for a while, so I'm familiar with a sales pitch. And this was a sales pitch, but not a very good one. The audience was mostly developers who didn't have much knowledge at all about building an observability stack in-house, and they were captivated, and I couldn't blame them. Before I really got into metrics and observability myself, I was in the same boat. Look, I'm building an application, I just want a place to get my metrics out there and see what the heck is going on. Lowest-frustration method possible, please. But they were captivated by it, as I think a lot of us are. Some of these observability platforms are amazing. They have amazing technology and capabilities; they're fantastic. But one of the things that really stuck with me and bothered me was that they were playing off fear a lot in their sales pitch. Fear of, oh, if you run that yourself, it might go down in the middle of the night. Those things, it's open source, you never know. I mean, isn't that ironic? You, of course, absolutely do know, because it is open source. But it didn't stop people from nodding along and saying, ooh, yeah, yeah. So I could feel a sense of dread coming on that I was gonna have more of these conversations with some other development peers who would say, ooh, we should buy this, why don't we buy that? Why do we do all these things this other way? And that was unfortunate.
So let's talk about it. Let's talk about the case to not outsource your metrics. So just a quick little background for kind of who the target audience is on this. It's all creatures great and small. Passion projects, startups, even enterprises. And what do I mean when I'm talking about scope and terms? Metrics, counters, histograms, gauges, logs: anything that you use to construct an SLI, an SLO, or an SLA with. I assume we're all familiar with these TLAs. Yes, lovely. There's at least one person who likes the TLA joke. And what do we mean when we talk about outsourcing? Pretty obvious, right? You're not deploying your own time series database, your visualizer, whether it's Grafana or something else like that, or your alerting system. You're not deploying things like VictoriaMetrics or Prometheus, et cetera. All of your metrics and logs are shipped off to one of the many different vendors in this space. Let's lay out the case and see what you think. Number one reason to not outsource your metrics, and this is actually in no particular order, it's just the order that it came out: dog food. I'm actually not a fan of verbizing words, although I know that's very vogue for our industry. A lot of words tend to get verbized, like Googling, which I had a problem with for a long time. Yep, I'm that guy. It's great to be able to acquire and build the skills yourself to run your own metric stack. And that, I've found, really empowers a strong DevOps practice within an organization too. You don't have to rely on somebody who's outside of the company to talk to you about metrics and how to instrument code. You can talk to your colleague who's right down the hall, who's probably in the same meetings that you're going to when you're developing your code, and get real-time feedback. They know the application as well as you do when you're building it, at least close to as well. So I think it's a strong way to encourage a DevOps practice.
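Instrumenting code, as mentioned here, is mostly about putting counters and gauges at the right spots. A toy sketch with a made-up in-process registry, deliberately not any real client library:

```python
import time
from collections import defaultdict

# Hypothetical in-process metrics registry; metric names are invented.
# A real setup would use a proper client library and exporter instead.
class Metrics:
    def __init__(self):
        self.counters = defaultdict(int)
        self.gauges = {}

    def inc(self, name, by=1):
        self.counters[name] += by

    def set_gauge(self, name, value):
        self.gauges[name] = value

metrics = Metrics()

def handle_request(ok=True):
    """A stand-in request handler that records basic metrics."""
    start = time.monotonic()
    metrics.inc("requests_total")
    if not ok:
        metrics.inc("requests_failed_total")
    metrics.set_gauge("last_request_seconds", time.monotonic() - start)

for outcome in [True, True, False]:
    handle_request(ok=outcome)

print(metrics.counters["requests_total"])         # 3
print(metrics.counters["requests_failed_total"])  # 1
```

The point of the sketch is that the instrumentation lives next to the application code, which is exactly what makes the down-the-hall conversation possible.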
Availability. This was the fear, uncertainty, and doubt that I was talking about earlier that was being pitched by this one particular platform. Don't fall for it. I have real-world experience, myself and lots of other colleagues: you can run your own metric stack at massive scale, and these things are very reliable. It may have been the case at one point in time that that wasn't true, but in my experience, it's been dead reliable. Sure, there are failures, but have you looked at an SLA for some of the observability platforms? They're actually not that high in terms of nines, if you can even find the definition, which you probably won't unless you enter a contract with them. By that time, it's too late. High fidelity. You're probably familiar with this too. Usually you're forced to make a choice. Oh, do I instrument this? Do I not instrument that? Do I use a custom metric, or do I just take one that's there? You don't have to make that decision if you've got your own stack that you're running. You don't have to downsample ahead of time. You can downsample later. There's no excuse to not instrument every part of your stack. Lastly, cost. I'm sure you saw this one coming. We've all seen those stories. Oh, I got a really big bill at the end of the month that I didn't expect. If you run your own stack, it's a lot more predictable, and you can really optimize and tune it to your exact specific use case. Okay, so I don't want anybody to get the impression that I'm trying to make an argument of this or that. It's not this or that. We're spoiled for choice. We have the choice to use both if we want to. There's nothing that says that you have to use entirely one or entirely another. It's a false choice. Don't fall for the false choice. And if you're not familiar with how to run your own stack or build your own metric stack, it's easy to learn.
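Those nines translate directly into a yearly downtime budget, which is worth computing before trusting any availability claim. The arithmetic is simple:

```python
# Convert an availability figure ("nines") into an allowed downtime
# budget per year. Plain arithmetic, no assumptions beyond a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_budget_minutes(availability):
    """Allowed downtime per year, in minutes, for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability)

for label, availability in [("two nines", 0.99),
                            ("three nines", 0.999),
                            ("four nines", 0.9999)]:
    print(f"{label}: {downtime_budget_minutes(availability):.0f} min/year")
```

Two nines allow roughly 5,256 minutes (about 3.65 days) of downtime a year, three nines about 526 minutes; comparing that against a vendor's actual SLA definition is exactly the homework being suggested here.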
There's a ton of resources out there to bring yourself up to speed, and it really doesn't take a lot of time or effort. And some of the solutions that come from observability platforms are amazing. They can do the kinds of things that, well, yeah, if you've used them, I'm sure you already know. And that's it. That's my case for not outsourcing your metrics. Thanks for listening. Thank you very much. That was an awesome talk. All right, and while our next presenter gets set up, we actually had someone drop out. We haven't seen them today. So we're gonna pull someone forward. It's gonna be Michael Stahnke, but in the meantime, while he gets set up, I'd like to present one of our wonderful sponsors, SigLens. Hi, everyone. How are you doing? Thank you, thank you. Are you tired of your expensive observability solutions? Is your observability infrastructure getting too large? We have a solution for that. We have built SigLens specifically to solve that problem. We have built this database, an observability, log management, and tracing database, from scratch. It's 100x more efficient than Splunk. It's 1,000x faster than Elasticsearch. 54x faster than ClickHouse for the logs use case. And we also did a comparison against Grafana Loki, and we are 90,000x faster than Loki, specifically because Loki doesn't support aggregations. Give us a try. Your laptop can do pretty much eight terabytes per day of volume, even the smallest laptop. For that same volume, you need at least 50 such machines for Splunk and Elasticsearch. We can scale from one terabyte on one virtual CPU all the way up to one petabyte on 800 virtual CPUs. If you go to our blog at siglens.com, you will see all these performance benchmarks, public benchmarks that we have done, and the one petabyte test that we have done, to compare for yourself. And we are open source, Apache 2.0 license, so feel free to give us feedback or try us out. 
We came out of stealth three weeks ago, so we are very new, and we'd love your feedback on whether this solves your problem or not. Thank you so much. Thank you very much, SigLens. And now, Michael Stahnke. All right. Hello, everyone. It's dark in here. So sorry about that, but I have no control over it. So, I'm Michael Stahnke. I'm the vice president of engineering at Flox. We are a startup that is trying to solve portable developer environment problems. I'm not gonna talk about much of that in this talk, but if you wanna check us out, flox.dev, we launched our product on Wednesday, so you could be like in the first 48 hours, if you really wanna be. So, in my career, I worked in big enterprise, and then I went to startups that primarily did developer and operational tooling, so I've been in this space for, we'll call it 15 years at least, and I was definitely doing some of that stuff at Caterpillar as well. In the last year, I went to Flox after I was at Puppet and CircleCI, so I'm gonna talk about experiences from all of these places and how I've worked with engineers and executive teams to justify investments that people are making. One of the things that I did when I was at Puppet was I worked on the State of DevOps reports. Who here has read a State of DevOps report before? All right, who here has just heard the metrics? All right, okay, no one even wants to raise hands, I'm just gonna be done with that, it's fine. No, so the State of DevOps reports started at Puppet; DORA started at Puppet. I worked with some of this stuff from 2018 on as an author of these reports, talking about engineering efficiency, technology delivery, things like that. So that's where a lot of this experience is coming from. When I left Puppet, I went to CircleCI, where I had this giant data set of what people actually do versus what they wrote down that they did in a survey. 
So I was able to look and see, were they telling the truth on those surveys and things like that, because I could actually look at real behavior, and I can tell you that there are some sizable discrepancies. If you have ever read one of these reports, these are like the four main metrics that everybody says you should be measuring, and this is how you know if your investment in DevOps is working. And today they might say if your investment in platform engineering is working, or developer experience tooling, or whatever it is that you've named it; it's still basically the same set of problems. So this is a page directly out of one of those surveys. And then you have a page that kind of describes how they cluster everybody and tells you where people are and who's elite and who's not and who's in what tier and all of this. And so you start zooming in and you're like, okay, I was running an engineering team in 2022. And we filled out our own thing and we ended up here in this elite 26%, best of the best. We're badass, let's close up shop, let's go home. Awesome. But I said I was gonna talk about how do you know your efforts are working. So when you have engineering, there are basically two types of things that everybody cares about. One is metrics of how the product is performing and what's going on. Those are usually more owned by product management, but product engineering sometimes owns them. That's great, nothing wrong with that. And then you have the internal side of things that are going on, whether it's platform engineering again or dev tooling or DevX or whatever; it's gonna be a lot of things. But the things that you're measured on are basically availability, cost, security, and throughput. Those are basically the things you're measured on as an engineering leader. Then you have sentiment and adoption, which are more product management for the platform. Like, do other developers like to use your platform? Do they enjoy it? 
Do they actually use it? Things like that. If this was a much longer talk, I would go in depth on every single one of these. But because it's not, I'm gonna pick the one that I think is actually the weirdest, which is developer productivity. And as an executive, I have to work with developer productivity a lot and I think about it a lot. And everybody that's ever been a developer says please don't measure me, I hate it. It sucks. And in some ways it does. But as a leader, I have to say, you're spending how much on payroll, and what are we getting for it? So that's why we have these conversations. So like I said, I was running this team in 2022 and we did our own assessment and we ended up in this huge elite tier because, again, we're badass. So do I believe that I was better than 74% of all organizations, at least, and that I was in a cluster with the top 26%? Not sure I do. Okay, so why is this telling me this? How is this information coming up this way? Oh, I see. For the primary application or service you work on, blah, blah, blah, blah, fill out a whole bunch of other information. Cool. How many people work on more than one thing? Everybody. So when it says for the primary application or service, which one did you think of? The good one or the bad one? The best one or the average one? The one you were happy about today or the one you were mad about today? It doesn't, I don't know. I have no idea. I have no way of knowing this. So this is completely based on the person filling out the survey. It's based on their opinion, sentiment, mood, everything, all the time. That's what a survey is. That's why they're worded this way. Writing these questions is hard. I'm not trying to invalidate the survey. I have written these surveys, but they are incomplete measurements. They should only help you as a guidepost. So like I said, we were in this elite tier. I was pretty proud of it. I sat down with my CTO and he's like, do you feel like we're really this good? 
No, no I don't. I don't feel elite. Elite's such a cool word. It's like you're the best of the best of the best. You're top gun, you're doing all these things. I'm not elite. Why? Why don't I feel elite? What even is elite? Like let's just back up for a second. If I'm gonna be elite at technology delivery, what does that mean I'm gonna do? So I went and had a long think about this. And I think if I was elite, if my engineering team was elite, it would be all about moving fast with confidence. I can deliver what I want at a very rapid clip and it's not gonna break things. That sounds elite. And also sounds pretty simple, but simple, not straightforward. So the other thing that I always have to do with these metrics was talk to the rest of the executive team. The CFO, the COO, CEO, all those things about what's going on in engineering. The payroll is lopsided in that you're spending usually way more on developers than you are anywhere else in the business. And so you better be able to justify this stuff. So I started thinking about this. Okay, if I'm gonna have an audience of executives and I have to have engineers that feel like they can move fast with confidence, how can I solve both these problems at once because I don't like doing more work than I need to? So what are the categories of things I care about? Is the engineering throughput good? Engineering is a lot of times like raw horsepower. Product management's the steering. Engineering is the engine, just the accelerator. So that's the way I look at it a lot of times. Is the throughput good? Is the throughput meaningful? This is actually the thing that the executive team cares about. They don't care how fast you're going. They care about, are they getting the thing they asked about? The salesperson cares about the next feature they're trying to close a business on. The CFO might care about the thing that they think is gonna tip over the renewal, whatever. 
So if I add more engineers, will I go faster, and can I prove that? The CFO needs this. So I have my metrics in a layer cake: quality, people, throughput. These are the three themes. So let's get into throughput measurements. Deployments per week. This is deployment frequency. It comes from the DORA metrics. People are like, hey, how many times do you deploy? And the answer is, if you can deploy when your business wants you to, you have achieved this metric; move on, it is unimportant. Does adding more developers mean that you can go faster? Hmm, so if my deploys per week were, let's say, 12 before, I had a new developer and now I have 13, that's more. That's good, right? Well, no, because I have more people and I've only got one more deploy. And that's why you have to normalize this in some way. And so when you hire groups of people or you have attrition or whatever, this number should stay relatively the same, or improve, or show off your efforts of what you're doing to improve throughput. This is my single favorite metric in all of engineering, in case you're curious. So you can see I've mapped it here. Christmastime, no one works. Every other time people are doing awesome, it's fun. There we are. So I have PRs per engineer and deployments per engineer. They correlate closely. Narrow versus wide work. Narrow is work on the product. Wide is stuff around the product. Wide might be Terraform, might be documentation, might be tests, might be things that don't ship out to the customer. You update monitoring configurations, things like that. What percent of the work do you think is wide versus narrow? In my experience, about 30% of work is narrow. Most people want way more than that if you talk to an executive team. They get very upset when they think that only 30% of their payroll is working on the outbound product. Then you get to people. Are the systems easy to work on? If you go to one service, work on the next service, does the dashboard look the same? 
Does it deploy the same? Does it install the same? Does it set up the same? Is it written in the same language? With the same toolchain? Is it anything the same? Oh my God, no, it's not. So, we want fungible engineers. I'm an engineering leader; all I've ever wanted was fungible engineering. As an engineer, that is bull crap. There's no such thing as fungible engineers. But I can make things more fungible than they currently are. So, The Mythical Man-Month says I can't just add people and things are gonna get better. I read it. It's a decent book, it's fine. So, how can I make the systems more able to absorb those changes in people? I can make them easier to work on. More people can work with them. I normalize things. This is very important. This is also the first step you need to take to improve any outcome in the DevOps space of metrics. Like any of those DORA metrics, the fewer variables you have, the better all of them will go. So, this is good stuff. So, what percentage of your code is provided by your platform? That's a great metric if you can measure it. How many libraries are being reused? What stuff is not unique in each service? That is so awesome. Not easy to measure. I did it anyway. So, we wrote some code to do this and we started to figure it out. Hey, 26% of our stuff is covered here, and then later it was 78% because they adopted a new set of libraries that our backplane team had written, which was developer enablement. We wrote a whole bunch of stuff and it went through every repo and it gave you a score of how normalized it was, how in line with our standards it was. And when they were out of line, you had a dashboard, and people would be like, well, I don't want to be at the bottom, so then they start fixing things. Awesome, pressure works. So, how quickly are these people productive? We're on people metrics now. I measure the time to their fifth pull request, because the first one, I added an SSH key to Terraform. 
The second one, I updated some documentation when I was onboarding. The third one, probably that as well. The fourth one might be okay, but I'm gonna go to the fifth one just because I think it's probably the right number. It's probably meaningful at that point. And then I come back to deploys per developer per week. If I hired a big group of people, I will actually see the deploys per developer go down for a little bit because they're onboarding new folks, but then I should see it come back up, and I should actually have more throughput because all of those people are now productive. And then you get to quality: successful deployments per week. I hope you're measuring this already, because if it doesn't work, it's broken. Rollbacks, roll forwards. And then understanding all these changes that are happening: something breaks upstream, downstream, what's going on for you. So, we have release frequency, releases, rollback rate, and success rate. We have a metric layer cake of quality, people, and throughput. So then I sorted these in my brain because I was trying to make a chart and I wanted to actually be able to explain this to anybody who wasn't me. So, I have baseline DevOps metrics. I don't feel elite. I wanna move quickly with confidence. I have three themes of metrics. I've gone over nine different metrics of things that you can measure that are beyond the original DORA sets of metrics. If you don't have the original set of DORA metrics, I do recommend collecting them. I do think they're a good starting point. I just think there's so much more beyond them. I think fungibility and understanding changes going on around you, I don't have good metrics for yet. I keep working on it, I keep trying, and I don't have the ones I want yet. I also think that I have categorized some of these in ways that may not make sense to you. You might think normalization is a quality metric and not a throughput metric. Fair enough, we're just labeling. So, back to this. 
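The measurements the speaker walks through here are all simple to compute once you have the raw events. A hypothetical sketch in Python; the example numbers (team sizes, PR day offsets) are invented for illustration, not from the talk:

```python
def deploys_per_dev_per_week(weekly):
    """weekly: list of (deploys, engineers) pairs, one per week.

    Normalizing by headcount keeps hiring or attrition from
    masquerading as a throughput change.
    """
    return [round(deploys / engineers, 2) for deploys, engineers in weekly]

def deploy_quality(outcomes):
    """outcomes: list of 'success' or 'rollback', one per deploy.

    Returns (success_rate, rollback_rate) as fractions of all deploys.
    """
    total = len(outcomes)
    return (outcomes.count("success") / total,
            outcomes.count("rollback") / total)

def time_to_fifth_pr(merge_day_offsets):
    """Days from an engineer's start date to their fifth merged PR,
    or None if they haven't gotten there yet.
    """
    return merge_day_offsets[4] if len(merge_day_offsets) >= 5 else None
```

Using the talk's 12-versus-13-deploys example with a hypothetical team of 6 growing to 7: raw deploys went up, but normalized throughput went from 2.0 to about 1.86 per developer per week, which is the whole point of normalizing.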
We have what you're measured on in engineering, which is the four things on the left side, and you have two product things, which are developer sentiment and developer adoption. I want developers to adopt a platform. I want them to enjoy it, but I also need to show that there's an ROI for it to my executive team. That was not supposed to do that. I'm gonna pretend you didn't see it, it's gonna go fine, there. So, in summary: what do you want your metrics to tell you? You better figure that out before you start collecting them. What decisions will you make with those metrics? Also important. And can you correlate? Can you do things in your department and see that it correlates to a metric moving? If you intentionally try to increase throughput, do you see that? Do you measure it? Do you know it? Have you joined with product management to talk about it? Are these metrics right or wrong for your company or your department? And then, deploys per dev per week, like I said, is the first metric I would add after you have a baseline of DORA metrics. If you wanna learn more, I've written lots about this. I've talked about this at length. I have a lot of ROI stuff. I love talking engineering business. I'll be out in the hall or whatever. This Twitter article from 2015 is my favorite developer productivity article I've ever read. So, if you haven't ever read Let a Thousand Flowers Bloom, I really recommend it. Anyway, I'm Mike. I work at Flox. Try us out at flox.dev. And thank you for having me at DevOps Day LA. Thanks, Chris. Thank you very much. And with that, we've got a little bit of a break. We will be back here at 4 p.m. with Mandy Walls talking about awesomeness. I'm not sure what exactly it is yet, but it's gonna be awesome. Oh, chaos engineering. You gotta come back for it. So, all right, we'll see you all at four o'clock. Thank you very much and see you in a few. All right, ladies and gentlemen, we are gonna get started in 10 minutes. 
Just the 10-minute warning, we'll be getting started in about 10 minutes. All right, folks, we're gonna get started in about five minutes. Just a heads up, it'll be another five minutes or so. All right, this is your one-minute warning. Once again, your one-minute warning, we will get started in about one minute. All right, and SolarWinds is our next sponsor on the sparkly DevOps microphone. Thank you. I'm honored to have the sparkly microphone. Hi, everybody, I'm Bryce Mott. I'm a solutions engineer over at SolarWinds. This is my first time at Scale and DevOps Days. It's been really cool; a round of applause for everybody putting this on. It's awesome. I hope I get to come back. We're here showing off several of our products, but mainly our new observability platform, which is a SaaS-based product. SolarWinds has been around since, like, '99. We have tons of on-prem products that people know us for, networking and infrastructure monitoring, and now we've combined that with a lot of our other SaaS products into one platform, so that people don't have to manage the infrastructure but still gain visibility into their whole stack. So if you have some time to come by the booth and check it out, please; we're right outside the doors here on the right side, and we'll be raffling off a Yeti cooler at 5:45, so you can enter your name to win the Yeti cooler, and I'd be happy to answer any questions you might have about the type of things we can support. Thanks. All right, and next up is Mandy Walls with Plan for Unplanned Work. Let's start my timer, so. All right, yes. Hello. It's nice to see all of you. I don't always get to this conference. I've been here a couple of times, but it's always fun and I recognize a bunch of faces, and it's super good. So I'm glad you're all here. I hope you have a great weekend. 
I have to leave tonight because I'm going to Paris next week for KubeCon, because there are a bunch of jerks that put their thing back to back with this one, and that's a lot of flying. Right? So I am Mandy Walls. I'm a DevOps advocate at PagerDuty, which basically is a fancy way of saying dev rel for a SaaS, right? Because it is what it is. I'm always up to chat about all kinds of stuff. If you wanna hit me up, I'm LNXCHK on most of the socials. I'm mostly on Bluesky right now, but I'm also around on LinkedIn. And if you're old school and you wanna email me, mwalls at pagerduty.com, it's totally cool. So, what am I gonna talk about today? PagerDuty. Who knows PagerDuty? Who's currently on call? Bless you all. Thank you, thank you. So PagerDuty sort of started out as alert management kind of stuff, right? We page you when something goes wrong, and part of that is how you then handle those incidents and how you get better at that, right? Because it doesn't just happen organically that your team, coming in from all kinds of other experiences, knows how to respond to an incident. So how do you do that? How do you get better at it, right? We want to improve incident response. So we need to practice. But to practice incident response, we gotta have more incidents. Who wants more incidents, right? No one wants to have more incidents. So you have to figure out, how do we go about building this muscle, building this practice, so folks don't freak out when it's time to respond to an incident? Because things will break, right? Who's never broken something? Like, you know, I don't recommend doing it on purpose unless you're gonna do it this way. So, PagerDuty has a long history of this practice. I've only been at PagerDuty for four years. The company's been around about 14 years. And the engineering team's been practicing this in various ways for most of the life cycle of the company. So we call them Failure Fridays, or Failure Any Day, and we'll talk about that in a minute. 
But it's a very explicit practice that the engineering teams take on as they're pushing new features into production. They wanna know what's gonna happen if something bad happens in the environment. So who's run game days? You have a practice like this? A couple of folks. Yeah, not too many right now. And like, folks are kinda like, you wanna do what now? In production? You wanna break things? No, don't. So, we want to, right? But we wanna do it with a purpose. We wanna be intentional about what we're gonna do. Now, some of this chaos engineering stuff started back in the old days, right? Some of you were still in middle school. Some of us did not have gray hair at the time. Netflix was kind of notorious for this. Somebody would walk through their data center and just pull cables, right? Amazon too, right? They'd walk through the data center, pull a cable, and somebody has to figure out what happened. Hopefully it's not payroll, right, that they're pulling, but something went down. And then the rest of the team would have to figure out and coordinate and work through fixing it and dealing with that, right? Most folks are not doing it that way anymore. We've now kind of distilled the practice down. It's more intentional now, something we can be very specific about and take more of a sort of scientific approach to. So, we borrowed this definition of chaos engineering from our friends at Gremlin. And their definition is: chaos engineering lets you compare what you think will happen to what actually happens in your systems. You break things on purpose to learn how to build more resilient systems, right? Amen. So, we're out there trying to figure out, when I do change something or something changes around me, what happens to the things that I've created, that I've deployed, and how to work with that. But it's very intentional. And it becomes a way of validating all the assumptions that you have about your ecosystem. You have an engineering cycle. 
You're writing code, it's going into production. When it gets into production, how's it going to behave? You've got, like, you put it out there, it runs in a steady state for a while, and then we're gonna say, okay, this new feature is running pretty well, but what happens if the dependency it has goes down? We think we have done some defensive coding. We hope we have, but we're not 100% sure that what we've got here is going to work. So we hypothesize about what might happen. Then we can run the experiment, we can verify that, and then hopefully we can put those production improvements back into the life cycle. We'll talk about that as well. So we're hoping to make the practice part of our regular distribution of the code that we write. So we wanna be intentional. Our goal is really to learn. We wanna know exactly how things are hopefully gonna work when they get into production. It's hard to know exactly, though, in a microservices environment. So there's always gonna be things that are gonna be in flux. Microservices environments, I feel like, are kind of like stepping into a river. Philosophically, you're not stepping into the same river twice, right? The water has rushed by you. And that's what's going on in microservices environments as well; all those other hits and customers and requests are all long gone. So we wanna learn about what's gonna happen when things go wrong in our environments. Our goal is to overall improve our reliability. We wanna improve the resilience of the experience for our customers, right? It's all in service of making sure that our customer experience is maintained even when things kind of go wrong on the back end. There's lots of ways to think about that and to do that. But we're not doing this because it's fun, even though for some folks it is totally fun. Like I think that dude over there, he's like totally into it, right? It's a really good time. But for other folks, it can be pretty stressful. 
And they're like, okay, we need to have an end goal, right? And it is making sure the customers are happy. Even if you've got internal customers, like if you're working in IT and you wanna do these kinds of things, it's totally great. Now, when we talked about game days and some of this stuff originally, they were very large kind of events. It's kind of like a carnival, right? When folks were still in the office, there'd be like a war room and everybody'd be in there and then there'd be pizza and it'd be very exciting. You don't have to do that. You don't have to schedule a big all-hands kind of outage to work with this kind of practice and use it to improve your systems. So our teams are in the process of decomposing a monolith, right? Like I said, PagerDuty's like 14 years old, so you kind of know the vintage, right? And as things get moved out of the monolith, all of those engineering teams have the opportunity then to test their own little happy place, right? And do all their tests as well. And you can do these very, very niche, very, very short, very, very focused, because we're after this sort of scientific hypothesis and we're being very specific about it. So this is an example of one. We have a bot that helps us out with these. And we have a channel in our Slack where anybody can follow along with a Failure Friday if you're curious about what's going on, right? You might wanna see how it goes. This is a short one. It started at 1:07 p.m. It ended at 1:31 p.m. The team went in, they started their test, they learned the thing that they wanted to learn, and then they shut it off. So it didn't have to be a big carnival experience. It just had to be what that team needed to get out of whatever they were testing: how does this new feature react to this particular stimulus? So they can do that in a very focused way. So what do you need to have sort of in advance, right? So I mentioned we've got a bot that helps us. 
There used to be a tool at PagerDuty called ChaosCat, which has been gone for a while, and I was like, well, does it just, like, knock everything off the table for you? Like, what's it doing? But it's been replaced. And there's plenty of commercial tools as well that can help you out with some of your chaos engineering practices. But you need to have some things in place to get the most out of it, right? To help you actually take what you're gonna learn and then improve upon it. The first thing is monitoring and telemetry. And hopefully, for the folks in here, you have this like nailed down. You have the best fucking monitoring and telemetry for your systems. But I will tell you, as a vendor, that is not a universal experience, right? So there are some of our colleagues at other companies, they're still at this part. Hopefully you're not. So you've got monitoring, telemetry, maybe you're doing tracing, you've got your observability tools. Because it's not gonna help you if you can't figure out, hey, this is what's happening when the database is slow, and I'm not getting any signal out of any of the systems. So I have to have at least something to validate that, oh, you have a database, we made it slow, and somebody's gonna get told, right? You wanna have some kind of response process. What kind of alerts are coming out of your monitoring and your metrics? Where is that going? Is someone getting alerted? Does anybody know? It's a tree falling in the forest, but it doesn't come to PagerDuty, so you don't know, right? And then you wanna talk about your workflows. And this can be different for every team. And you can also have sort of more engineering-wide practices for, like, major incident response and things like that. But you wanna sit down with your team and talk about: this is a test we're gonna run, here's how long we're gonna run it, here's what we're gonna do. 
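The "we made the database slow and somebody's gonna get told" check can be made concrete. Below is a toy "N consecutive samples over threshold" alert rule, the kind of thing a game day validates by injecting slowness and confirming a page would actually go out. This is purely a hypothetical sketch, not PagerDuty's or any monitoring system's actual implementation:

```python
def alert_fires(metric_series, threshold, for_samples):
    """Return True if `for_samples` consecutive points in the series
    exceed `threshold` — a minimal 'latency high for N minutes' rule.

    During a chaos test you inject the slowness, then assert this
    returns True for the series your monitoring actually collected;
    if it doesn't, your alerting has a gap.
    """
    run = 0
    for value in metric_series:
        run = run + 1 if value > threshold else 0
        if run >= for_samples:
            return True
    return False
```

The useful part of the exercise is the negative case: a series that briefly spikes but never sustains should not page anyone, and the injected fault should.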
And then you'll have to get to a point where you acknowledge that you're maybe gonna learn something that you're gonna have to evaluate. Maybe you need to make some improvements. Is your product team even on board with that? Are they gonna take your recommendations? And that's a whole other discussion to have. We'll talk more about that in a minute. So part of this can be incident response, right? There's plenty of other vendors who can talk to you about monitoring and telemetry. We do incident response, so as we're practicing with our Failure Fridays, our failure any days, part of that is bringing in some of our incident response practice. Does anybody have like an explicit major incident response practice or anything like that? All right, you guys are my people. Okay, so for the rest of you out there: get an incident response practice. But what you want to have is a way for folks to know what to do when something goes wrong. This doesn't happen organically. And for some folks, I am a recovering sysadmin, like, I totally get it. I'm your second AOL alumni talk today. Andrew and I used to sit two rows away from each other at AOL years ago. And so for folks who have not been in one of those kinds of environments, in their first outage, their brain might just completely go blank. And they have no idea. They don't know their name. They don't know where they are. They don't know what call to join. They don't know what Slack channel to join. They don't know where the dashboards are. Blank, right? So some people have that kind of stress response. So you want to give them places to practice so that they know: here's how you get the dashboards, here's the channel to log into, here's how the incident commander process works. They're very friendly. No one's going to bite you. It's going to be okay. But a lot of folks don't get into that in a natural way. So you want to get to a place where you're practicing that. 
And these kinds of chaos engineering practices are really good for that. You have simulated an outage. You want to simulate what happens downstream of that outage, what your organization expects you to do. They expect you to respond to the alerts, come to the channel, log on to the Zoom call, and participate, right? It helps you get organized. It helps you be explicit about all of those things so that folks aren't second-guessing themselves, so that they aren't, in the moment when it's a real incident, freaking out and not knowing where things are. So when we test IR during game days, there's a bunch of different things we can make sure are in place, because these are things that we don't want to have falling on the floor when there's a real incident or real outage. Our first is our alerts. Are we alerting on the right stuff? If I'm putting a slow query thing onto the database and no one's telling me that, there's no alerts coming out of that system, I'm not gonna know if it happens in real life. If I take that back end dependency offline and none of the dependent systems tell me, how am I gonna know, right? So making sure all those alerts are configured, making sure all of that stuff is coming out of your systems and getting into a place where your humans know about it, is super important. And if you're not thinking about it all the time, it's super easy to forget, or if it's not part of your platform engineering or whatever you're doing yet, you drop that stuff on the floor. But you don't want that when there's a real incident. The next part's notifications. Love this one, because you get on a call and everybody's phone is blowing up and it's amazing. But people get new phones all the time. Making sure they've put their new phone into whatever system you're using. Making sure that they've got it set so that if they don't respond to the first beep, it falls over and gets the next one, whatever it is. 
Mine does push notifications and a text message, and then it calls me on the phone and the robot tells me what's going on, all that great stuff. So you want all of that. And you want the coordination part. Where do you go? If it's a major incident, there's a specific permanent channel and a specific permanent Zoom call to use. Otherwise, go to the incident in your incident management platform and click on the URL for Zoom or Slack or Teams or whatever you're using, so that folks know exactly how to get from responding to the alert to participating in the recovery. And let's talk about troubleshooting and resolution. This is a really good one to do with all of your new engineers. Someone joins your team, and I have been on SEV-1 calls where there's a new engineer on some team and they're like, I've never done this before. And as the incident commander, I'm like, damn it, where's the other senior, right? So you don't want your new engineers getting on the call and saying, I've never restarted this service before. Give them an opportunity to practice restarting the services. Give them an opportunity to practice scaling up the services, failing them over, whatever kinds of things you see most commonly in your systems. If you get to the point where you need to deploy new pods or whatever you're doing, make sure everyone on your team has a safe place to practice that, so that when the time comes and something has to happen for real, they're familiar with it, they're comfortable with it, they don't have any doubts, and they don't have any questions. They're like, I got this. So we're gonna set some goals. We're gonna go into this, like I mentioned, intentionally, not just randomly picking things, but saying, you know, maybe a bug went out in the last sprint and it's been running in production for a little while. We've given it some time to bake in. But did we really fix it? 
Maybe we haven't had any customers calling about it, but we wanna be sure. So we can go in there and we can flex it. This is really good if you deploy stuff under flags and it's not actually being surfaced to the customer yet: you can flex it with your feature flags. It's also good for stuff you can't replicate in your non-production environments, and I'll talk about production versus non-production testing in a minute as well. You get a lot of behavior that might be impacted by slight differences. And I say slight; I know what the differences are between most people's dev, integration, and production environments, and that's not slight by any definition. But you might get to a point where you don't have a good place where you've replicated production, maybe because of the load or the traffic or whatever it is. You wanna do that in prod, right? So you can do that as well. That can be one of your questions, one of your goals. You can also do bigger stuff. DDoS or other attacks, right? Because the last thing you want if you're under attack is that nobody in the security team knows how to call your CDN, or whatever's in front of your systems, to have those IPs blocked. All that stuff can be practiced as well, so that folks know exactly where those things are: who to call, who's your contact at the service provider. All that stuff is super important. Because you know it's on a wiki page that hasn't been updated since 2021. Is it really the right data? Is it really the right information? If you're not flexing it, you don't know. So we're gonna hypothesize. We're gonna say, all right, when we stress this thing, it's gonna behave this way. We think when we turn this dial it's gonna scale up automatically. Or maybe it's gonna fail over automatically. We're gonna fail over out of San Jose into Oregon or whatever. 
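The "deploy it dark, then flex it" idea can be sketched like this: the new code path ships behind a feature flag, and a second, chaos-only flag injects the failure mode you want to exercise. This is a hypothetical sketch, not any particular flag system; the flag names and the checkout logic are made up for illustration.

```python
# In-memory flag store standing in for a real feature-flag service.
FLAGS = {"new_checkout": True, "chaos_checkout_timeout": False}

def flag_on(name):
    return FLAGS.get(name, False)

def checkout(cart_total):
    # The new code path ships dark behind "new_checkout"...
    if flag_on("new_checkout"):
        if flag_on("chaos_checkout_timeout"):
            # ...and during a game day, flipping this second flag
            # exercises the failure path without touching real customers.
            raise TimeoutError("injected: payment provider slow")
        return round(cart_total * 1.0825, 2)  # hypothetical tax logic
    return cart_total

print(checkout(100))                         # new path, no fault injected
FLAGS["chaos_checkout_timeout"] = True       # game day: turn the fault on
try:
    checkout(100)
except TimeoutError as exc:
    print("caught:", exc)                    # downstream handling gets flexed
```

The point of the second flag is that you can run the failure in production without surfacing it to customers who aren't in the experiment.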
Maybe things are just gonna slow down a little bit, and that's within our SLOs and people are gonna be okay with that, it's fine. Or maybe there'll be some kind of graceful replacement where we have to completely fail things over. So depending on how much automation you have in place, there might be a bunch of things that could happen sort of automatically as you're testing, and you can test all that automation as well. So for very sophisticated environments it's still super helpful to be doing these kinds of tests. And I will say that some of our environments are a little bit more sophisticated than others as things get migrated, so we have all kinds of things that people play with. Then we wanna talk about what actually did happen. We know what we thought was gonna happen, what we hoped in our heart of hearts was going to happen, but we need to pin down what actually did. Knowing exactly what happened is super important, because then you know what to look for if it happens for real. And hopefully you can shore it up and fix it in the meantime. And then you really wanna talk about what you learned. Who does postmortems? Excellent. And do you post them somewhere? And nobody ever reads them? Yes, that's exactly how they work, right? They are beautiful, and then no one talks about them. When you're working with these kinds of things, especially on a large team that shares a lot of common platform stuff, when you learn something, especially about defensive coding or graceful degradation or something in your corner of the environment, sharing that stuff out, whether it's a brown bag or a lunch-and-learn or a community of practice or a center of excellence or whatever you wanna call it, is super helpful, right? Because if you've run the test and I have the same stack, I should be able to learn from your test. Maybe I don't have to run it then, right? 
Like, I'm just borrowing what you guys did, and I can move on to the next question, and we don't have to repeat ourselves. So we get really good at talking about what you did, what you found, what the impacts are, what you're gonna do about it, all that great stuff. Some folks, for larger coordinated exercises, will do an entire postmortem or retrospective or whatever you wanna call it, with the full documentation and the timeline and all the things that were tried, so that we know what happened. And we know, if we see it again in real time, what we should be looking for. Because maybe we pulled up the wrong dashboard. Maybe we were looking at the wrong data to begin with. It turns out it was all over on this other dashboard. Not that I've ever seen that, right? So things happen. We wanna learn from it. Tell as many people as possible so that everybody can learn from it. And then we wanna get all that stuff back into the product cycle. Who has a really good relationship with their product manager? Like, you're buds and you talk all the time. Yeah, a handful of people. Everybody else is like, I'm not sure who my product manager is. Sometimes these people come into our meetings with slide decks and wireframes, but we don't really listen to them. Getting the stuff that you learn back into your product cycle is probably the most challenging part of this, because it's not technical, right? This is an organizational, cultural thing. Most of what you're gonna learn in these kinds of tests is non-feature, operational-requirements kind of stuff, right? The system must keep running in order for customers to pay us for it, versus it must have all these amazing features. And getting that stuff back into the product when the product manager's like, oh, I got features all the time, feature, feature, feature, feature. That can get in the way, right? 
So have a good conversation with your product manager about what you're hoping to learn and what the potential is to do something about it afterward, because you've gone to all this work: you've got the plan, you've got your hypothesis, you've done all this scheduling, yada yada. You wanna be able to use that. You can also use these tests to help with your SLOs. So, who's using SLOs? Man, like, come on, what the hell, right? Okay, SLOs are super interesting. If you're not familiar with them, the idea is that, especially in a microservices environment, we have a lot of moving pieces and a lot of things can go a little bit wobbly. I'll tell you a story. Back when I was at AOL, I got a call one Thursday night. Steve Case was in Hawaii on his pineapple farm or whatever he's doing over there, and the call is, the CEO can't get to the main page. I'm like, I don't care. He's halfway across the world. Everything's fine from here, right? That's not the call you want. We don't wanna wake somebody up in the middle of the night for something that's weird or transient, like your packet got lost in Topeka, right? We wanna get to a place where the things we're alerting actual human beings about are actual real problems. And one of the approaches to that, one that's emerged in the past few years, is making use of service level indicators, and then service level objectives for those indicators. Your objective doesn't necessarily need to be, we need 99.99% uptime. You're not dial tone, so you're probably not getting there anyhow. But pick the things that are super important to you, whether it's the time to respond or the time to serve the whole page or whatever it is, and set a good threshold for those. And then if you don't get too close to the threshold and some transient things come in, you just ignore them for a while, right? 
So you're only waking people up if you have a big problem. You can set those kinds of goals, and they're a little bit more granular than just saying we need you to be up all the time. The thing about SLOs is they are internal tools, for your team, to talk about how your service performs, how you are focused on giving the best experience to your users. And certain things can sort of fall apart and users may not care, right? If one little widget in the corner isn't available, the one nobody clicks on anyway, who cares, right? So you can use these as an internal tool to modify your own expectations of your reliability, and you can flex them during your chaos tests, to be able to say, oh, you know what, this new feature, this new defensive coding practice we've adopted, whether it's a big red button or a feature flag or whatever it is, gives us a little bit more room in our SLO. That sounds great. That means the next time something happens for real, and incidents do happen, we can absorb it. It's not as bad as we thought it was going to be, and the customers are still going to be okay. So you can use that to adjust how you manage the environment when it gets into production. Super helpful, because, look, we're PagerDuty, and we love paging you, but not really: we want to be your friend. We don't want you to wake up in the middle of the night. We want you to sleep, because we want to sleep as well. So when I give this talk to some of our customers, the first question we usually get is, should we do this as a surprise? Yeah, we've got one right answer over here: no, absolutely flippin' not. That was part of the original lore of chaos engineering, right, that you didn't know what was going to happen. You were just going to randomly pull a cable or whatever. 
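The "set a threshold, ignore transient blips, only page on real problems" idea can be written down as a tiny SLI/SLO check. A minimal sketch, with made-up numbers: the SLI is the fraction of requests served under a latency threshold, and the SLO says that fraction has to stay above an objective before anyone gets woken up.

```python
def slo_met(latencies_ms, threshold_ms=300, objective=0.99):
    """SLI: fraction of requests served at or under threshold_ms.
    SLO: that fraction must stay at or above `objective`."""
    good = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return good / len(latencies_ms) >= objective

# One slow request in 1000 is 99.9% good: transient, nobody gets paged.
window = [120] * 999 + [900]
print(slo_met(window))            # True

# Thirty slow requests in 1000 is 97% good: now it's a real problem.
window = [120] * 970 + [900] * 30
print(slo_met(window))            # False
```

The single 900 ms request in the first window is exactly the "packet got lost in Topeka" case: visible in the data, but not worth a 2 a.m. phone call.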
Most of the time your engineering organization is not really going to be ready for that. This is a practice that takes time. It takes effort. It takes a lot of organizational support, right, to dedicate resources to this kind of testing. And it's gonna take you a long time to get to the point where you could pull anything out of your cloud account, across anything in your environment, and every possible team in your organization would be able to respond to it. Don't do this to people, right? Be very intentional and very specific about what you're expecting from them. The other question is, when do you run one of these? And again, we're not surprising people, right? The other thing is, because we do a lot of ours very small, in each individual team in the organization, and there's, I forget how many, twenty-some-odd teams, they can do any of their tests any time on their corner of the system. Any day, right? They can get on the schedule. It's a wiki page, and they just put in what day they're gonna do what. Then they pop into the channel and everybody can follow along. Now, if you want to coordinate across lots of teams, right, that was sort of the original plan when everybody was in the monolith. Everybody coordinated together and everybody was in the war room and had pizza. That takes a little bit of extra coordination: talking to other teams and figuring out, okay, we're gonna do this thing on Friday, it might touch some of your stuff. Do you wanna be part of the exercise? Do you have other things you wanna test? All that kind of stuff can be there too. But if you're gonna do it in production, you wanna be very conscious of how it's gonna impact your users, right? And you don't have to do these in production. 
You can actually run game days in your other environments to get some practice, to make sure you're flexing all the things that you need to flex, to make sure you've defined your alerts correctly and all that kind of basic stuff, and that's fine. Originally, when folks were pitching game days and chaos engineering as a real practice, they were like, you have to do it in production. No, you absolutely do not. But if you are going to do it in production, do it at a time when the system is stable. Things are kind of safe. If you're working with SLOs and error budgets, you have some space there to be able to say, oh, we're only at 80% of our error budget for this month, we can go ahead and run this for a while. Don't blow out all of your error budget on testing. That's never a good idea, because then you don't have any left if a real incident happens. And then they ask, when do you not game day? Obviously, if your platform isn't stable, you don't wanna run a game day, right? If you're having some errors and you're trying to fix them, that's great. You should be trying to fix them. But at the same time, your customers don't know what you're doing. They don't know that today's instability is a test, versus yesterday's instability, which was actually a bug, right? So make sure that you're preserving the customer experience, the user experience, when you're doing these kinds of tests. The other thing that comes up is another non-technical question, because everybody loves a reorg, right? But you don't game day during a reorg, and you don't game day right after a reorg, right? Someone's just inherited a new service and you're like, we're gonna game day on this tomorrow. And they're like, I don't even know this thing. I don't have permissions to this stuff yet. They're not gonna be able to help you, and it's not gonna help them, right? So make sure the platform's been stable for a while. 
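That "are we only at 80% of our error budget" gate can be made concrete with a little arithmetic. A hedged sketch with assumed numbers: a 99.9% monthly availability SLO on a 30-day month allows about 43.2 minutes of downtime, and we only green-light a production game day if less than some cutoff of that budget is already spent.

```python
def can_run_game_day(budget_min, used_min, max_burn=0.8):
    """Only run a chaos test if we've burned less than max_burn of the
    month's error budget, so some budget is left for real incidents."""
    return used_min / budget_min < max_burn

# A 99.9% monthly SLO on a 30-day month allows 0.1% of the minutes down.
budget = 30 * 24 * 60 * 0.001
print(round(budget, 1))                 # 43.2 minutes of budget

print(can_run_game_day(budget, 20))     # ~46% burned: go ahead
print(can_run_game_day(budget, 40))     # ~93% burned: don't
```

The 80% cutoff is the assumption here; the real point is that the gate is a number you can check before the exercise, not a gut feeling during it.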
Make sure your org has been stable for a while. Make sure your other tooling has been in place for a while, so that people have a good experience learning from this entire process, because you need to get things out of it. You're putting time and resources into it; you wanna get stuff out of it, right? It's not just for fun, even though it kinda is. So that's the summary there. We have a bunch of resources about this. Like I mentioned at the beginning, PagerDuty's been doing these for a long time, so we've got a couple of blog posts about the things we've been doing and how we approach it. There's an earlier post, and then one from last year that updates on how things change as the org grows, right? Because we have a lot more engineers now than we did when we first started. The tooling has changed, because this part of the industry is maturing and it's now been commercialized. Early on, it was a lot of open source projects, a lot of ad hoc things, a lot of homegrown stuff, and a lot of those things have been improved. I also have a link there: I had two of our senior engineers on our live stream on Twitch last year to talk through some of that history and share some of their experiences as they've matured the practice. One's our staff SRE, one runs our DBRE team. And so we went through some of the practice there. If you're into taking a look at some of this stuff, that one in particular is super interesting, because they did get into some of the things they learned, some things that have changed. We learned about the demise of Chaos Cat, unfortunately, and all that great stuff. But if you're looking for ways to get started, things to think about, things to talk about with your boss, right? If you're just like, we don't do any of this and we're blindsided when something happens in production, right? 
We break it down into the components that you wanna try and improve, right? We wanna have a plan. You can have a plan for your alerting and how you wanna make sure it's okay. You can have a plan for notifications, making sure that everybody has the stuff they need and knows where to go, and the coordination and all that kind of stuff. And then you can have a plan for what we actually do when we have an incident in production. Hopefully your team has decided, or has at least illuminated for everybody, what you do when something goes wrong. Are you a fix-forward shop? Are you a rollback shop? Are you a pull-it-and-pray shop, hoping that somebody's gonna fix it offline and then you're gonna try to backport it into the configuration management, or somebody's gonna fix it live and then it goes back into GitHub, hopefully, someday, right? So have a plan about all that stuff, and make sure you're sharing it with the other teams in your organization. It's part of making sure that if you have to change teams, things are good over there too, right? You don't wanna have some dark team in the corner that doesn't know anything, right? Be intentional about what you're hoping to learn and what you're actually going to do. You don't want to get on the failure-any-day call and be like, I don't know, should we try and bury the dependency, or should we slow things down, or try some I/O test, or whatever your solution provides? Be intentional about that beforehand. Especially because a lot of what you wanna do here is practice around recent features, recent fixes, things that you are working on in service of the customer experience, obviously. The things you've been working on most recently are really good places to pick up and say, hey, we added an index to the database, let's make sure that this is gonna help us. And then make sure you can use what you learn. 
If you don't have a really good relationship with your product manager, or your engineering manager doesn't have a good relationship with your product manager, help that, right? Have a chat, have some conversations. Make sure you are capturing what you're learning in the kind of language that the product folks understand. Here's what we learned when we tried this test and we buried this dependency: here's where the traffic dropped off, here's where people abandoned things, right? They just left. Where'd they go? Are they gonna come back? Who knows? But then you can show that to your product manager and say, hey, we need to shore this up. We need to make sure that this component that's very, very key to our user experience is behaving well, is performing where we need it to be performing, so that you can prioritize the things that you learn from these tests. And that is sometimes a hard discussion to have if your product manager is very feature-focused, but you can get in there. So thank you for coming to my talk. As I said, the resources are there. If you are totally new to incident response, we have an amazing Incident Response Guide. It's open source, it's hosted there, and if you're into running MkDocs for yourself, you can totally fork it and do that, at response.pd.com. And we also have a podcast that my team runs. We're always looking for interesting stories from around the industry. We cover all kinds of stuff; it's not just an outage-of-the-week kind of thing. We'd love to have folks on there, so you can always get in touch with me if you'd like to share your story. So thanks very much. Thank you very much, Mandy. All right, we are in the home stretch. We have one more talk left at five o'clock, so we've got a little bit of a break now, and then we will be back for our final talk at five o'clock. We'll see you there. All right, ladies and gentlemen, we are gonna get started in 10 minutes. 
All right, ladies and gentlemen, we will get started in five minutes. All right, folks, it is time for the last talk of the day. I'd like to welcome Bob to the stage, and he's gonna talk to us about some blockchain stuff. So thank you all for joining us, and Bob, take it away. How's the sound? Sound good? I'm trying to think of a good joke to tell about the size of the turnout, and the only joke I can come up with is this. And wait, let me just say something before I start. I am not a cryptocurrency speculator. I don't own cryptocurrency. I need to tell you that. But 10, 11 years ago, Bitcoin was going for a buck, maybe two. If you'd bought a hundred of them then, today none of us would be here, all right? So welcome to the future. Welcome to the future. All right, so here's the deal. The purpose of this talk: I do not come to praise DLT or blockchain. As I said, I'm not into cryptocurrency speculation. I just don't believe in it, for a whole lot of reasons. But I have an insatiable curiosity to understand how things work. So what I'm gonna try to do, at a very high level, is explain what all this stuff is about. That's the purpose of this talk. I'm gonna explain blockchain, distributed ledger technology, and I'm gonna do it in terms of DevOps, because it is gonna have a real big impact on DevOps. So this is the agenda. Who am I? I'm gonna talk about the elephant in the living room. I'm gonna do the bottom line. I'm gonna do a little blockchain and DLT 101 so we have a baseline understanding. I'm gonna talk about working with wallets and smart contracts. I'm gonna do the NFT stuff. I'm gonna create a blockchain right before your very eyes. I'm gonna talk about the programming stuff. And then I'm gonna get to the DevOps stuff and the issues. And if we have time, I do have a bonus demo. I have a lot of demos, so I'm really gonna be testing the will of the demo gods in here, and anything can go wrong. 
Just to give you a sense of what I'm talking about and what's out there: the pink is what I'm gonna cover today, and even that pink dot is probably too big. The technology is growing in leaps and bounds. It's just all over the place. So if you expect to know everything there is to know about blockchain coming in here, see me after the talk, okay? And thank you for that water. Then be ready to dedicate the next two years of your life to me. All right, so this is who I am. I've been doing technology for a long time. I am a technology writer and technology journalist. I have production experience. The way I got into this deal is, about two years ago, the editors at Blockchain Journal called me up. I'd worked with some of the editors previously on other technologies, and they said, hey, you need to take a look at this stuff, all right? And I was coming from, yeah, cryptocurrency, what do I know? But then again, if I'd bought Bitcoin at $2 ten years ago, I wouldn't be here today. Anyway, I started looking at it and I said, there is something here. There is something here. And that, again, is the purpose of this talk: to share the something here. All views expressed here are my own, not Blockchain Journal's, not SCaLE's, not anybody else's. All right, so let's do some level setting. I'm gonna say some terms, and if they have some sort of meaning for you, raise your hand so I know who I'm talking to. User address. Does that mean anything to you? Okay, good. Proof of stake. Okay, proof of stake, you've been minimally contaminated. Ethereum, okay, got some Ethereum. Solana, okay, we've got some Solana, good. Okay, gas fee. Does that term mean something to you? Okay, gas fee. All right, USDT, does that term mean anything? Okay, so you people are adequately contaminated. I need to share at this point that it's not lost on me that today is the Ides of March, and I hope to get out of here alive, okay? 
Anyway, let's keep going. Smart contract. Okay, so there's some experience here. Good. Okay, how many people have used a cryptographic wallet, MetaMask, Phantom? Okay, so you've got your wallet stuff down. I'm gonna ask the loaded question: anybody own any cryptocurrency? Okay, so you've visited the exchange. Okay, anybody know Sam Bankman-Fried? Okay, oh, all right. Okay, anybody own an NFT? Anybody own one? Okay, we got one person owning an NFT, great. Anybody done blockchain programming, like literally written code that's gone out to the chain? What chain? Ethereum? You've done Ethereum? Okay, good, all right. And any sort of application deployment, where you created an app and actually had to put it out on a blockchain and maintain it? So I have something to offer. I have something to offer. All right, so now I know who you are. Okay, let's talk about the elephant in the living room. Yeah, cryptocurrency, blockchain, it didn't do too good. What happened is this blockchain stuff came out, and immediately there's a whole culture that starts doing nothing but luring people into gaming using tokens and NFTs. There's a whole city in Cambodia, Chinatown they call it, where they just had people under forced labor doing NFT trading. It's really ugly. And as we know, today the United States Attorney recommended that Sam Bankman-Fried, the guy who got convicted, be sentenced to the next half century in jail. So this is serious stuff. The fact that the government is saying this guy needs to go to jail for 50 years, there's something there. It's not like this guy invented a new shoelace and was fraudulent with it. I can talk about that later. Okay, so here's how the evolution of the technology works. 
Pretty much, you had blockchain, and granted, blockchain came out of nowhere, and some people made a small fortune out of it. Was there any increased productivity? Who knows? But there were millions and millions of dollars running around. And then we got the blockchain bros and speculative currency. We used to be software developers, now we're currency speculators. Everybody started doing all that, there was a lot of activity around it, and there still is. There was Dogecoin and all the dog stuff, and it was not pretty. Granted, it was not pretty. And then we had the collapses. FTX is the big one. Celsius was another one. Terra and Luna, where the money just disappeared. Excuse me. Okay, I can't hear you. I just can't hear what you're saying. All right, and then we get to actual useful technology. And people say, well, what do you mean by that? What I mean is this. In 1945 we atomized two cities into nothing. That was the initial introduction of that technology, and today it's powering the world. So technology goes through an evolution where it can start out nefarious and end up useful. Okay, it ends up here. The same thing is happening here. And again, yeah, blockchain is here. You got your IBM, you got your AWS, you got your Azure, you got Hyperledger, which is open source. People are playing on the blockchain. It is there. Maybe I am praising it too much. Okay, so fundamentally a blockchain, anybody used BitTorrent, like, eons, decades ago? It's a peer-to-peer technology where a bunch of computers get together and they hold data and there's no central authority. On a blockchain, what happens is everybody has an identical instance of the blockchain. That's essentially how it works. There's no central authority. 
If this node over here goes down, it doesn't matter; if that computer goes down, it doesn't matter, because there's 50, 60, 70 others to take their place. So it's decentralized, and then you have to come to consensus about how data gets on there. Okay, and one of the nice analogies I've heard is: just think of a blockchain as a big-ass Excel spreadsheet. We just have rows, and every row in the Excel sheet is a block on the chain. And as a row goes on, that row gets added to one spreadsheet and another spreadsheet and another spreadsheet and another spreadsheet. Now, how does that happen? Let's look at this diagram. So Bob, me, at callout number one, I create a transaction: from L to M, I send 20 somethings, we'll call them 20 somethings, and it goes up to what's called the mempool. This mempool is open. When you get on the blockchain network, you get to the mempool. Then what happens is a node will pull down the transaction, look at it, and make sure that the person who sent it is legit. And then that transaction will be converted into a block using some sort of consensus algorithm. I don't wanna go into a great deal about consensus algorithms, we can do it later if there's time, but it's turned into a block. That block is sent on to the local node, and then all the other nodes know about that block, and they say, okay, this block looks good, and if a majority of the nodes say it's good, it gets added to everybody's chain. Again, I'm being very cursory. It is actually pretty complicated once you start looking at how consensus works, but the consensus algorithms are well known now. The big one is proof of work; that's Bitcoin, where they do some magical mystery calculation to say, okay, I can now put this block on. Another one is proof of stake, where a node will actually put money in and say, I'm gonna put my money where my mouth is, so I'll put in $2 million. 
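The mempool-to-block-to-consensus flow just described can be sketched in a few lines: hash-linked blocks, every node holding an identical copy of the chain, and a naive majority vote before anything gets appended. This is a toy illustration of the shape of the thing, not the speaker's demo code and nothing like a real consensus algorithm.

```python
import hashlib
import json

def make_block(prev_hash, transactions):
    """A block is just its transactions plus the hash of the previous block."""
    body = {"prev": prev_hash, "txs": transactions}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

class Node:
    def __init__(self):
        # Every node holds an identical copy of the chain, genesis first.
        self.chain = [make_block("0" * 64, [{"genesis": True}])]
    def validate(self, block):
        # Toy validation: does the block extend my current tip?
        return block["prev"] == self.chain[-1]["hash"]
    def append(self, block):
        self.chain.append(block)

nodes = [Node() for _ in range(5)]
mempool = [{"from": "L", "to": "M", "amount": 20}]

# One node pulls the transaction off the mempool and proposes a block...
proposal = make_block(nodes[0].chain[-1]["hash"], mempool)

# ...and if a majority of nodes say it looks good, everybody appends it.
votes = sum(node.validate(proposal) for node in nodes)
if votes > len(nodes) // 2:
    for node in nodes:
        node.append(proposal)

print(len(nodes[0].chain), all(n.chain == nodes[0].chain for n in nodes))
```

The hash link is what makes the ledger tamper-evident: change an old row and every hash after it stops matching.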
So I'd better, if I'm gonna fake you out, fake you out for at least $5 million, because I'm gonna lose my $2 million. And then there's also gossip-about-gossip, which is what the Hedera blockchain works on. That's a really complex consensus algorithm, but it's cute; if you wanna spend time on it, see me later. And then there's proof of history. These are just a few of the many. But the important thing is: the transaction goes up, it comes down, the nodes reach consensus, and it gets added to the chain. Okay, so going back to this notion of a blockchain being an Excel spreadsheet, what happens is that they're just rows, there's just billions and billions of transactions. So you can see the ledger; another term for a blockchain is a distributed ledger, DLT. You can see on the left side, Bob mints 20. Oh, excuse me, Bob doesn't mint 20; there's what's called the genesis mint. I'm like the US Treasury: I mint $20 and I give it to Bob, or I give it to First National Bank or Citibank. And then Bob takes eight of his and gives it to Alice. Bob takes six of his, gives it to Mike. Alice takes four of those that Bob sent, gives it to Jane, and Mike sends some to Jane. And then what happens is you do a reconciliation at the end and you get account balances. And that's who owns this thing. I'm not gonna put a name on what the token or the currency is, we'll call it a thing. But at the end of it all, the genesis block has 30 things left over, Bob gets six, Alice gets four, Mike gets four, and Jane gets six. So it really is just the transactions; add them up, the same way your bank account works. Are we cool? Am I doing good on the explaining part? I haven't been at SCaLE in a while, so bear with me, I'm a little nervous. This is like the comeback tour. It's okay, you can change it any moment. Wait till the feds show up. Anyway. 
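The reconciliation step described above is just summing transfers per account. Here's that worked example as code, using the amounts from the talk; the Mike-to-Jane amount isn't stated, but 2 is the value consistent with the final balances given (genesis 30, Bob 6, Alice 4, Mike 4, Jane 6).

```python
from collections import defaultdict

# The transfers from the example. The Mike -> Jane amount isn't stated
# in the talk; 2 is inferred from the stated final balances.
transactions = [
    ("genesis", "genesis", 50),   # genesis mint: coins come into existence
    ("genesis", "Bob", 20),
    ("Bob", "Alice", 8),
    ("Bob", "Mike", 6),
    ("Alice", "Jane", 4),
    ("Mike", "Jane", 2),
]

def balances(txs):
    """Reconcile the ledger: debit each sender, credit each receiver."""
    acct = defaultdict(int)
    for sender, receiver, amount in txs:
        if sender != receiver:        # the mint only credits
            acct[sender] -= amount
        acct[receiver] += amount
    return dict(acct)

print(balances(transactions))
```

That's the whole idea of a ledger: balances aren't stored anywhere, they fall out of replaying the transactions, the same way your bank statement works.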
All right, so here are some of them: Bitcoin, Ethereum, Hedera, Solana, Avalanche, Polygon, and there are more. I'm pretty focused on Ethereum at this point; I do a lot of work around Ethereum and I've banged around Solana too. And there are distinguishing factors about all of them. The big one, which I'll talk about in a few moments, is that Ethereum supports smart contracts and the Bitcoin blockchain doesn't. Okay, so what can you do with a DLT? What are some of the commercial uses? And I've talked to these people. One company I talked to is called Cario, and Cario is actively in the process of making it so all motor vehicle titles in all these state governments are put on a blockchain and immediately accessible. And they're working industriously at that. Another thing blockchain's been used for: Canon is building a mechanism into their cameras so that once you take a picture, the image is provenanced and carries a proof of authentication. And this becomes particularly important when you're dealing with news and information. So for example, let's say I'm out here and I take a picture of all of you, and then some nefarious people come by and alter it with AI so that the room looks empty. The ability for me to prove the provenance of that original picture is important, and Canon's building it into their technology. So when people ask, do you need to look at this? There's something there. There is something there. Okay, so anybody can create a blockchain, okay? And the way you create a blockchain is: you create the blockchain, you have what's called the Genesis event, you create some tokens or some coin, you put the coin on there, and then people come and do their thing with it. It really is that simple. Well, it's not that simple, but it's really that elementary. So, how about some demo stuff? Demo?
Let's tempt the demo gods. All right, so this is WebStorm here, and all this code is out on GitHub in various places so you can download it later. And this is a blockchain I made, it's called REST Point. I'm not selling any of it right now; don't worry, I don't want your money. But the way it pretty much works is: you create the chain, then you add what's called the Genesis coin, you put the first coin out there, and then, if you want, you have more minting events. Okay, we can look at this code, and you might say, is this code really talking to a blockchain? As I'll show you in a moment, the way you interact with a blockchain is through APIs and through client libraries, and these are not secrets. Some of them are pretty complex, but anybody can do it. So, as I'm going to do now, I'm going to go to the blockchain tests. What it's gonna do is create the blockchain, add a transaction, and then I'm gonna make 2,000 WrestleCoins right before your very eyes. Ta-da, I'm now rich. I thought it was a pretty good joke. I thought it was reasonable. All right, so the point is there are 2,000 coins out there now. Now, the question is, well, why aren't I rich? Well, this is just one blockchain. In order for this to have legs, I'd have to get 50, 60 people to adopt it. That's one thing. Or 50, 60 nodes out there saying, okay, I want copies of this. And then, and this is where it gets a little financial, I'd have to be able to sell it, or at least have some sort of value determination. Like, what could I get for it? If I set up a latte shop, and people came in and told me something nice about myself, and I put 2 WrestleCoins into their account for it, and they could spend those coins, then that coin has some sort of currency.
So it's not as easy as just putting something on the blockchain. Adoption is critical, okay? But aren't you impressed? I just did the blockchain right before your very eyes. Oh, thank you for clapping, I feel encouraged. Yay! All right, so anybody can set up a node and join Bitcoin or join Ethereum. Pretty much what you need is two terabytes of storage and some RAM, and that's probably not enough. And then you need to keep that machine, or machines, going all the time. And then the question becomes, oh, how do I make money and how does all this stuff work? I'm gonna get to that in a minute. And again, I don't wanna focus too much on making money; what really becomes interesting is the notion of compute costs, and who absorbs compute costs. Okay, so now I have to do a little monetary policy here. Not monetary policy, a little monetary mechanics, that's probably a better term. So the United States Federal Reserve mints dollars, right? And a dollar is a coin, okay? And in its physical version, one dollar is just as good as another dollar. If I give the parking attendant a dollar, he doesn't care which dollar he gets. Yes, there's an argument that every physical dollar has a serial number, but every dollar in your checking account at your bank does not have a serial number. It's just a concept, it's what we call a fungible token. It can be used for anything, it has no uniqueness. And it's the same with the WrestleCoin. What happened with the WrestleCoin is, you saw it before your very eyes, I went out there and created 2,000 of these things, and the only problem is nobody wants them for anything. Yeah, that could change at any moment, check the news, all right? And this becomes critical to understand: these coins are what are called fungible tokens.
One of the bad things about this industry is they're not very good at naming things; they use the same term to name different things. So for example, when they say a coin, they're not only talking about a physical coin. If I say a dollar, I might mean the physical dollar, but the dollar is also a representation of a unit of currency. It is the currency, it's a name. So things get very confusing very quickly, and you have my sympathies, okay? In this case, we're talking about a token. All right, put that aside, we're gonna come back to it. Okay, now we gotta talk about how we actually do transactions on any blockchain, any distributed ledger, because a DLT that can't conduct transactions isn't one. For those of you who have worked with wallets: the way a transaction works, it's really a public/private key scenario; those of you who've been doing security are familiar with it. When you create a thing called a wallet, and I'll show you one in a moment, you create a distinct address, and you also create a distinct public/private key pair, which is only replicable by providing what's called a 12-word seed phrase. And that's where you hear about people getting in trouble; I'm gonna talk about security risks at the end of the talk. So I go out and decide to use a wallet that supports the currencies or tokens I'm interested in. And what happens is the wallet says, okay, I've created your unique identity, here's your seed phrase, put it someplace safe. And then through the wallet, I can conduct transactions. I can also do it through code, and I'll show you that later on when I do some more demos. But this is where things get hairy, because people lose their wallet, okay? They lose the 12-word seed phrase, or, since it's also done in a hardware version, they lose the hardware, and that's it. All my assets are gone. Or they get stolen, and the assets are gone.
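That 12-word seed phrase is not the key itself; it gets stretched into key material. In the common BIP-39 scheme, the mnemonic is run through PBKDF2-HMAC-SHA512 with 2,048 rounds and a salt of "mnemonic" plus an optional passphrase, yielding a 64-byte seed from which the wallet's keys are derived. A minimal sketch of that stretching step, with a made-up word list (real wallets draw words from the standard BIP-39 list):

```python
import hashlib

# A hypothetical 12-word phrase; these placeholder words are NOT from
# the real BIP-39 word list.
mnemonic = "piano stage demo chain node block token wallet seed key coin row"


def seed_from_mnemonic(phrase: str, passphrase: str = "") -> bytes:
    """BIP-39-style key stretching: PBKDF2-HMAC-SHA512, 2048 rounds,
    salted with "mnemonic" + optional passphrase. The 64-byte seed
    deterministically regenerates the wallet's private keys, which is
    why losing the phrase means losing everything."""
    return hashlib.pbkdf2_hmac(
        "sha512",
        phrase.encode("utf-8"),
        ("mnemonic" + passphrase).encode("utf-8"),
        2048,
        dklen=64,
    )


seed = seed_from_mnemonic(mnemonic)
print(seed.hex()[:16])  # same phrase -> same seed, every time
```

Note the derivation is purely deterministic: there is no server to reset your password against, which is exactly the "lose the phrase and it's gone" property the talk describes.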
And that's a vulnerability, and that immutability really can't be assuaged. Now, there are some initiatives coming around that are trying to do away with that. In the Ethereum space, improvements are proposed through what are called ERCs, and ERC-4337 is one that hopes to do away with this, but I'll get to that in a moment. The way a transaction works: you have the public/private key pair, the private key signs the transaction, the transaction goes up with the public key, the node looks at it and asks, is this the user? The public key can verify that it is: here's the user, here's the key, put the transaction on the chain, and then it goes through a whole lot of consensus stuff. However, what becomes interesting is that you can use this technology in ways other than cryptocurrency trading. One of them is to use it for authentication. And the way it works is you say, okay, I want to authenticate using a wallet, and modern browsers support extensions for wallets. Then you say, log in using the wallet; the wallet asks, can I hand over your credentials? The credentials go down to the website, the website does its hocus pocus and says, yes, I know this is good, let the person in, or register the person, or do something. Now, you probably don't believe me; I will demonstrate it in a minute. But before I get to that, I need to discuss non-fungible tokens. Because non-fungible tokens are another opportunity in the DevOps space and the application space, and that's because non-fungible tokens are unique, okay? And what do I mean by that? Well, first let's go back to fungible tokens. A fungible token: one token is as good as the other. All right, and you've seen this before.
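The wallet-login flow described above is a challenge-response protocol. A runnable sketch of its shape, with one loudly labeled substitution: real wallet login signs with ECDSA over secp256k1 and the server recovers the address from the signature, but the Python standard library has no ECDSA, so an HMAC shared secret stands in for the key pair here. The class and method names are all hypothetical:

```python
import hashlib
import hmac
import secrets

# STAND-IN: real wallets (e.g. MetaMask) sign with an asymmetric key;
# here a shared HMAC key plays that role so the challenge-response
# shape runs with the stdlib alone.


class Wallet:
    def __init__(self, address: str, key: bytes):
        self.address = address
        self._key = key  # never leaves the wallet

    def sign(self, challenge: bytes) -> str:
        return hmac.new(self._key, challenge, hashlib.sha256).hexdigest()


class Server:
    def __init__(self, known_keys: dict):
        self.known_keys = known_keys  # address -> verification key
        self.pending = {}             # address -> outstanding nonce

    def issue_challenge(self, address: str) -> bytes:
        nonce = secrets.token_bytes(16)  # fresh nonce defeats replay
        self.pending[address] = nonce
        return nonce

    def verify(self, address: str, signature: str) -> bool:
        nonce = self.pending.pop(address, None)
        key = self.known_keys.get(address)
        if nonce is None or key is None:
            return False
        expected = hmac.new(key, nonce, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, signature)
```

The important structural point survives the substitution: the server never sees a password, only proof that whoever answered the challenge controls the key behind the address.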
So in the old days in the New York subway system, you'd go to the subway booth, and, now I'm gonna show how old I am, the first New York City token I bought was 10 cents. I could ride the subway for 10 cents. So I go, I give them my dollar, I get my 10 tokens, and then I can get on the subway, right? And what's interesting is that subway token booth is also what we call an exchange. It's converting currency: it's converting US dollars into New York City subway currency. Anyway, one token is just as good as the other. So who cares? But here's where it becomes interesting. The New York City subway mints 50 tokens, and Mike goes to the token booth and buys 10 of them. And then he has a choice. He can go to Alice and say, okay, Alice, you wanna ride the subway with me? Here's a token. There's no value exchange there, it's a gift. Or he can say, hey, I bought it for 10 cents, give me 11 cents for the token and you can get on the subway. And that becomes interesting, but why would somebody do that? Well, imagine you're going to buy subway tokens in the old days. Those of you from New York might have stood in a line with 50 people when you're rushing to get downtown, and Bob comes along and says, you wanna get on fast? Give me 11 cents. You pay him, he gives you his token, you're on. So that's a utility scenario, not particularly relevant to non-fungible tokens, but now let's talk about non-fungible tokens. A non-fungible token is something that is mintable and uniquely identifiable. So let's take a look at this non-fungible token.
And yes, there are arguments to be had, but I don't wanna have them right now. Okay, your car is a non-fungible token. What makes it a non-fungible token? It has a unique identifier: the VIN. When the manufacturer mints your car, it gives it a VIN. When it mints another car, it gives that one a VIN, and a third car, another VIN. Each of those is unique. And when you start trading or selling cars, that VIN becomes very important. Take Bob the automobile dealer: he sells a car to Mike, and he's transferring the token identified by the VIN. And then Mike decides two years from now to sell that NFT, that non-fungible token, to Alice, again through the VIN. The important thing about an NFT is that it is unique and uniquely identifiable. Now think of the utility of that when it comes to managing employees at a corporation. What if, as an employee, when you're hired, you're given an NFT that represents you? Now granted, the security people would jump up on the stage and say, you don't understand, you don't understand. And I'm gonna say, you're right, there are a lot of problems that need to be solved. But conceptually, it's a viable technology. I'm gonna demonstrate it as we get to the end. All right, so in order to understand non-fungible tokens in the DLT environment, we need to understand a thing called the InterPlanetary File System. Does IPFS resonate for anybody here? Okay, good, I have an opportunity to make another contribution. The InterPlanetary File System is a distributed file system, and the distinction between a regular file server and the InterPlanetary File System is that the URI describes the content, not the location.
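The VIN story maps directly onto a mint/transfer ledger. A toy sketch, assuming invented names; real NFT contracts (ERC-721 style) add approvals, metadata URIs, and events, but the essential state is just this mapping from unique id to current owner plus an append-only history:

```python
class NFTRegistry:
    """Minimal mint/transfer ledger: each token has a unique id (the
    'VIN') and exactly one current owner; history is append-only."""

    def __init__(self):
        self.owner_of = {}  # token_id -> current owner
        self.history = []   # (token_id, from_owner, to_owner)
        self._next_id = 1

    def mint(self, to: str) -> int:
        token_id = self._next_id  # unique, never reused
        self._next_id += 1
        self.owner_of[token_id] = to
        self.history.append((token_id, None, to))
        return token_id

    def transfer(self, token_id: int, sender: str, to: str) -> None:
        # Only the current owner may transfer; this is the rule a real
        # contract enforces with the transaction's signing address.
        if self.owner_of.get(token_id) != sender:
            raise PermissionError("only the current owner can transfer")
        self.owner_of[token_id] = to
        self.history.append((token_id, sender, to))
```

Minting gives the dealer a token, transferring it to Mike and then Alice mirrors the car changing hands, and an attempted transfer by a non-owner raises, just as the chain would reject an unsigned transfer.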
What that means is this: let's say I have a photo of me and I put it on AWS someplace, or some service, and AWS gives me a URL to that photo. It's not giving me the photo, it's giving me the location of the photo on the file server, according to the file name. If for some reason I come along and decide to move the file to another server, that URL is invalid. The way IPFS works is that when you put an asset on it, it creates a hash code that describes the content. It becomes immutable, and that's an important distinction. So let's see here, ah, good, another demo, because I see it's getting toward six o'clock and people are getting tired, and I only have 15, 20 minutes left, but I wanna show you something. Okay, here it is. This is called IPFS Desktop, and notice, this thing is running on my little laptop. If we were to look closely, you'd see up at the top it's discovered eight peers. Well, it's only been running for a little while; I had it running earlier and it had 238 peers. So it's actually going out to the network, looking at all the peer-to-peer nodes, and bringing down copies of all these assets. So you might say, well, let's put something up on the IPFS. So I'm gonna try to do that. Let me go import, let me go file. How about, we did blue dot already. Oh, jeepers. Does my wife have anything? My wife put all her songs up here, but they're too big. Okay, hold on, hang tight, I'm getting to it. Let's go down here to documents. Okay, the dumb text. There it is, dumb text. I'm gonna take it and put it out on the IPFS, and if you look here, you'll see there's the dumb text. But more importantly, if you look here, you'll see there's a hash code, and that hash code is describing the content. It's not describing the location. So let me see if I can go here and copy the CID.
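Content addressing is easy to demonstrate without IPFS at all. A simplified sketch: the identifier is a hash of the bytes themselves, not of any location. Real IPFS CIDs wrap the hash in a multihash/multibase envelope and chunk large files, so this is only the core idea:

```python
import hashlib


def content_address(data: bytes) -> str:
    """IPFS-style content addressing (simplified): the identifier is a
    hash of the content itself, so it is the same no matter where the
    bytes are stored, and it changes if the bytes change."""
    return hashlib.sha256(data).hexdigest()


photo = b"pixels of the original, unedited photo"
cid = content_address(photo)

# Moving the bytes to another server changes nothing:
assert content_address(photo) == cid

# Editing even one byte yields a different address, so the original
# can always be distinguished from a doctored copy:
assert content_address(photo + b"!") != cid
```

Contrast this with a URL: a URL breaks when the file moves and stays valid when the file is silently edited; a content address does exactly the opposite, which is the provenance property the talk is after.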
That's the identifier for the asset. The Brave browser supports IPFS out of the box. So what I'm gonna do is go to Brave, use the IPFS protocol, go here, and, let's see, gee, Mr. Demo God, gee, Mr. Demo God. Okay, well, let's see if it works. I'm not surprised, because you're gonna see one of the big DevOps problems coming up: latency. Right now I put it on my local machine, but now it has to go out, and IPFS is saying, hey, wait, where's this stuff? So, Demo God, you won. But the important thing to understand here is that we have a way to put immutable assets out on a distributed network in a way that's unique. And this becomes important when people say, I wanna get a picture of William Shatner's x-rays, which he put out, right? How do I know that's William Shatner's x-ray? Because it has provenance: it was put out on the IPFS at a certain date, and that content hash is immutable. And again, you're gonna see it in a minute when I get to the demos, but it's a powerful technology with many applications. Okay, smart contracts: game changer. When you do a transaction on a blockchain, you have an address for the sender and an address for the receiver, and then you have the address of the token as well. The way you can put intelligence on the blockchain is through this technology called a smart contract, and a smart contract is code that lives at a specific address on the blockchain and is executable. And I want you to think for a little while, while I do the simple demo, about the implications, because the implications are significant. So let's see, what do I have here? Okay, so this here, before I go and do it for real, this is a Solidity smart contract. Solidity is the programming language that's supported by Ethereum.
Well, it's not supported by Ethereum exactly, it's supported by the Ethereum virtual machine. And I'm gonna show this in a minute, because the way you interact with a distributed ledger is through virtual machines. It's very rare to interact with one directly. There's a layer that the providers put around the blockchain to make interaction easier, very similar to the way the Java virtual machine abstracts the operating system. So here, what you see is I have a smart contract with an add operation, and all it does is add two numbers together. It takes an A and a B, computes A plus B, and returns the result. But what's also interesting: notice that there's a thing called an event, the event is LogAnswer, and I can emit that event. So I have a way to do event-driven programming on the blockchain if I want to. For those of you that are into those sorts of nefarious practices, it's very cool. All right, so let's see here. Before I get to the demo: the way it works, and you're gonna see it in a moment, is that you have an application. The application interacts with a set of libraries called the Web3 libraries. The Web3 library goes to an RPC interface. The interface interacts with a virtual machine, and then the virtual machine makes everything happen on the blockchain. Is it complex? Yes, but it's just so enormously powerful, and it's part of the evolution of the technology. All right, let's go do the demo. Now, this is an example of how the technology is evolving so quickly. All right, anybody here program? Programmers? Anything? Visual Studio Code, IntelliJ, right, okay. Let's go here, and this is the development environment, an online development environment. Ten minutes? Oh, dude. All right, we'll get to this. I'll do the login, and then anybody that wants to see the security issues, see me afterwards, okay.
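That application-to-Web3-library-to-RPC chain bottoms out in JSON-RPC 2.0 messages sent to a node. A sketch of what the library builds on your behalf, constructed offline here (no network call); the contract address and the `data` field are placeholders, since in reality the library ABI-encodes the function selector and arguments into `data`:

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a client-chosen id


def rpc_request(method: str, params: list) -> str:
    """Shape of the JSON-RPC 2.0 envelope a Web3 library sends to a
    node's RPC endpoint on your behalf."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": params,
    })


# A hypothetical eth_call against a placeholder contract address; the
# "data" payload is elided because the library computes it from the
# contract ABI.
payload = rpc_request("eth_call", [
    {"to": "0x0000000000000000000000000000000000000001",
     "data": "0x..."},
    "latest",
])
print(payload)
```

Seeing the envelope demystifies the stack: the "virtual machine" sits behind an ordinary HTTP endpoint, and the Web3 library is mostly an encoder for these messages.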
This is an online development environment facility, and what you can see here is what I do: I'm adding two numbers. The way it works is I compile the code, turning it into bytecode. Then I go down to deploy, and in this case I'm gonna deploy it to a local virtual machine; I could just as easily deploy it out to a testnet, and I'm gonna get to the virtuous cycle in a minute. In this case I'm deploying to the local machine, and if you look here, let me see if I can make this big. It says I can. Okay, can you see? This is now going out to a virtual machine, and now what I can do is go here. There is the exposure of the smart contract. I'm gonna pick a number, 13, pick another number, 67. I'm gonna execute the transaction, okay. Now we're getting some congestion. Let's see here. All right, so here we go. Let's see what happened. Okay, so the answer is 80. The demo gods are beating me up here; not sure why it's not showing up in the IDE, but if we look at the output from the blockchain, in this case the Ethereum virtual machine, it has a lot of information. Here's the transaction. Here's the block. Here's where the block is. Here's who sent it, which is my local instance. Here's the gas, and I need to go over this. Unlike traditional computing, where the owner absorbs the cost of the computing, right, if you're AWS, you own your equipment and you charge people to compute, the way it works on a public blockchain is that you, the sender, pay for the computing cost, and that computing cost is described as gas. So this is how much gas it cost to do it. Here's the execution cost. I know I'm going very quickly, because I still do have a lot, but here's the answer, and there's the log output. And all this information is readily available for auditing on the blockchain. Nothing is hidden.
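The gas line in that output is simple arithmetic: the sender pays gas used times gas price. A quick worked example; the 30 gwei price below is illustrative, not a figure from the demo, though 21,000 gas is the well-known cost of a plain ether transfer:

```python
# Gas accounting: the sender pays gas_used * gas_price. Prices are
# quoted in gwei (1e9 wei); 1 ether = 1e18 wei.
GWEI = 10**9
ETHER = 10**18


def tx_fee_eth(gas_used: int, gas_price_gwei: float) -> float:
    """Transaction fee in ether."""
    return gas_used * gas_price_gwei * GWEI / ETHER


# A plain value transfer costs 21,000 gas; at an assumed 30 gwei:
fee = tx_fee_eth(21_000, 30)
print(f"{fee:.6f} ETH")  # 0.000630 ETH
```

This is the inversion the talk points out: on AWS the provider meters you; on a public chain every caller pre-pays the network for the compute their transaction consumes, so inefficient contract code costs your users real money.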
It takes a little while to get used to navigating around and figuring things out, but it's a compelling technology. Okay, let's go on. Again, I don't have my secondary slides, but I do want to get to the good stuff: I want to do the virtuous cycle and the security demonstration. Are we doing okay? Am I going too fast? Am I getting too hyper? I was watching My Cousin Vinny last night. For those of you that are into DevOps, this is where it affects your life. All right, so: development environment. There are plugins for Visual Studio Code, there are plugins for IntelliJ. You can use Remix, or you can use Solana Playground if you're coding for Solana. For your Web3 clients, you can set up geth to run local blockchains, okay? And you can also test your smart contracts there. Where it gets really tricky, and this is where DevOps people are just pulling their hair out, is the immutability factor. Because once you put a smart contract out on the blockchain, it is unchangeable, for the most part. Now, there are techniques out there called upgradable smart contracts, where you can upgrade the contract. But once the data is out there, that's another story. So people are driving themselves nuts about, how do I do development? And the way it works for now is: you do development on your local machine running a local blockchain, then you go out to a testnet. Under Ethereum it's Sepolia; Solana has a testnet, and I'm sure the others do too. And then you also have to fund your usage. So I'll show this to you quickly. The way you fund your usage: on a testnet, you're funding with virtual currency. Five minutes? Okay, I'm going as fast as I can, right? Go on, go on there, okay.
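Those upgradable smart contracts generally work through a proxy pattern: callers always hit one stable address, and an admin can repoint that proxy at a new implementation while the stored state stays put. A toy Python analogue of the idea, assuming invented class names (on-chain, Solidity proxies do this with `delegatecall` and careful storage-slot layout, which this sketch ignores):

```python
class AdderV1:
    def add(self, a, b):
        return a + b  # original logic, frozen once "deployed"


class AdderV2:
    """An 'upgraded' implementation: same math, but now it records a
    LogAnswer event, mirroring the talk's event example."""

    def __init__(self):
        self.events = []

    def add(self, a, b):
        result = a + b
        self.events.append(("LogAnswer", result))
        return result


class Proxy:
    """Callers always use the proxy's stable 'address'; an admin can
    swap the implementation behind it without callers noticing."""

    def __init__(self, implementation):
        self._impl = implementation

    def upgrade(self, new_implementation):
        self._impl = new_implementation

    def __getattr__(self, name):
        # Delegate any unknown attribute lookup to the implementation.
        return getattr(self._impl, name)
```

This is why "CI/CD for smart contracts" is hard but not hopeless: you cannot patch deployed bytecode, but you can design for replaceable indirection up front.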
All right, what happens here is you put in your address, and the faucet will send unusable test currency to your address, all right? If you're doing real-life, non-testnet development, what are you gonna pay with? You're gonna pay out of your wallet. So that's how that works. So going back here, okay, do you get it? Here's the most important thing, because I wanna show you one more demo. Immutability is a bear, and it's a challenge to be overcome in the development release cycle; CI/CD doesn't count for much if you just can't upgrade quickly. The other one is latency; latency is still a bear. And security. If you wanna know more, see me outside, happy to talk about it, because there are about 22 more slides to go. But let me do the last demo here and then we'll go home and call it a day. All right, the last demo is called, poetically, the web wallet login. So I'm gonna start, I'm gonna run the server, okay? Okay, running on port 311, no surprises here. Then what I'm gonna do is go to my wallet here. You can see this is all my test stuff; I'm on the Sepolia test blockchain, okay? And this is the address that I'm going to support. Now, let's go out here. Localhost 311, ta-da, all right. So this is login with MetaMask. I'm gonna log in with MetaMask, and now notice what comes up: MetaMask is asking with a prompt, will you let me do this? And I'm gonna sign, I'm gonna say yeah, okay. Now what it's done is it's taken the public key; I've sent the public key into my web server, and part of the public key is also the address of the sender; it's encoded in, I think, the last bytes of the key. And so now what I'm gonna do is go here. I'll give you my "Bob WrestleMania" one. I hate it when I don't have enough time for the good stuff.
Okay, oh, it remembered me. Now let me, here, I'll give you my other email address if you wanna talk to me. And now I'm gonna submit the user profile. Now my user profile is saved in my storage mechanism. Okay, now I'm gonna log out again. And now what does it know? It knows me, it knows my key, it knows my address, it knows a lot of things about me. I'm gonna log in with MetaMask again. And if I changed the account, I'd have a completely different experience. I sign in, and look who's there. Now granted, there are security issues to be addressed here, no question about it. But the last piece is when you start working with non-fungible tokens. Because what we can do, and I do have a whole demo set up, which I'm happy to show outside to anybody who's interested, is the administrators can create a non-fungible token that describes the security level. You don't wanna put a whole lot of information in it, but I could put out a token that says: my token, very special, uniquely identifiable, and the bearer of this token is at level two. Then what happens is, when I log in with my wallet, my address, because it's unique to me, unique to my key pair, goes out; the server takes the address, scours the blockchain, and says, oh, I know about this person. There's their NFT. And when I inspect their NFT, I can see that they indeed have level two. And then behind the scenes, I can do all that security adjustment. So one last thing, I know, five minutes, but let me just show you what this looks like. Let me go and grab one more, because I did this earlier. I know, one minute, I'm so down to one minute. Let me switch accounts. I'm gonna go to another account here, 09. And now I'm going to go out to someplace cool. Testnet, right, this is OpenSea, which is a hosting service for NFTs. Okay, I'm gonna go to this user here. I'm going to paste in this user's address. There it is, it knows it.
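The NFT-gated authorization step above reduces to a lookup: take the authenticated wallet address, find the token bound to it, read an access level from its attributes. A sketch under stated assumptions: the token ids, addresses, and the "level" attribute are all hypothetical, and a real system would query the chain (or an indexer like OpenSea's API) instead of an in-memory dict:

```python
# Hypothetical registry standing in for on-chain NFT lookups.
ACCESS_TOKENS = {
    # token_id: (owner_address, attributes)
    101: ("0xBobWallet",   {"level": 2}),
    102: ("0xAliceWallet", {"level": 1}),
}


def access_level(address: str) -> int:
    """Scan for an NFT owned by this address and return its access
    level; 0 means no token, hence no access."""
    for owner, attrs in ACCESS_TOKENS.values():
        if owner == address:
            return attrs.get("level", 0)
    return 0


def can_view_admin_panel(address: str) -> bool:
    """Example policy: level two or better required."""
    return access_level(address) >= 2
```

The design point is that the credential lives in the user's wallet rather than in the server's user table: the server stores no password, only a policy over publicly verifiable token ownership.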
And look here, there's the SCaLE one. I did this one earlier, but there's the NFT that's out there, bound to my user address. And when I go into it, if I look at the details, and again, you gotta know this stuff, there's the unique identifier. Think of that as the VIN for the NFT, right? And look what it has: it knows about me. So again, this is not meant to replace private enterprise systems, but if you start thinking about what this can do at the operational level, beyond just another customer loyalty program, it becomes profoundly engaging. I have more, but I've used up my time. I hope I provided value to you. Thank you for inviting me back to SCaLE. If you wanna see more, see me outside. No cost to you, the consumer. All right, thank you everyone. That concludes DevOps Day LA 2024. Thank you for joining us. We will be back here next year. But in the meantime, as part of SCaLE, we're gonna have some live music with that wonderful piano behind me, and immediately after that it will be upSCALE. So Ignite talks, it's always a good time, and there will be a bar in the back. We would love to see you back here at 6:30. Otherwise, have a wonderful evening, safe travels, and we'll see you later this weekend.