So, hi everybody. I'm Peter. I'm a Google Developer Expert, and I'm also a co-founder of a company called FlowUp, a software agency based in Brno. This time I want to talk about DevOps, and particularly DevOps for small teams. Just a disclaimer to start with: this is based on my own experience. It doesn't mean this is the silver bullet you need to use. Whatever you like, take it and use it; whatever you don't, just leave it be. I'll be very happy for any feedback on this as well. I want to start with a very simple idea: everything is literally bound to fail, we just don't know when. I think this is a mantra that every DevOps team should have. We just don't know when things will fail; sometimes they just will, and we need to be prepared. That's what this talk is about. I call this the property of fragility. Sometimes it's also called robustness; it's the same scale, with fragility on the left side and robustness on the right. The point I want to make here is that it's really hard to make systems robust without thinking about why they are fragile and where they are fragile. From what I know and what I've experienced, fragility has two parts. The first is the static part: mechanical, technical things, meaning whatever code issues you have, known issues, bottlenecks, scaling issues you know of, and so on. You can think about your most recent project and see these technical issues right away: where is that TODO I left? It's like a landmine in the code. Somebody steps on it and everybody cries, because somebody left something there and forgot to implement it. But this is very static; it doesn't change over time.
If you leave it be, it still has the same chance that somebody will step on it as it had when you first wrote it. But then there's dynamic fragility, which is what I want to focus on in this talk, and it has four parts. Again, there can be more parts based on your experience, but from mine there are these four: the capacity of people, the context of people, the scale of the system, and dependencies and other teams. In other words: capacity of people is how much performance we have, when, and at what stage of the project, for example whatever skills we have or gain in the process. Context of people is something like active maintenance, meaning: I know what I'm doing in this part of the code or this part of the project. You can also look at context this way: if you just wrote the code, you have the full context of it. But if you leave the code be for three or six months, you lose a lot of that context, and now it takes a lot of time for you and for other people to understand the same stuff. Scale of the system is very unpredictable most of the time, meaning you don't really know what will happen tomorrow or the day after. You need to be somewhat prepared, and you will never be prepared to the degree we all want. And dependencies and other teams: you can talk about libraries or other technical things, but in the end there are people behind them. So a dependency has the same fragilities as your own team, static and dynamic, and from there a big tree of these things grows.
Why I want to talk about the dynamic ones is because I genuinely believe we are good at seeing the static fragilities in our projects, these landmines in the code, but we are very blind to the dynamic ones. I can't really tell the capacity of my people, what their performance will be tomorrow or the day after. Or the context: did they actually look at the code recently, or was it a case of I wrote it, I abandoned it, and now we just sit there? The scale of the system is something we can't really predict. And dependencies: how do other teams actually work, and what do they actually have? You can try to predict these things, you can have some numbers if you want to. In fact, it's a very cool management trick: if you want somebody to suffer endlessly, give them the job of creating a report on these four things. They will probably never create it, or if they do, they will be six months late. But the biggest reason I want to talk about this is that when you look at static and dynamic fragilities, you see that the static ones never change, or almost never change; they don't scale very much. The dynamic ones, however, tend to scale when ignored. If you abandon something, the fragility scales with time, over months and years, until at some point neither you nor your team really knows what you did. And when something happens, all hell breaks loose and you need to fix things. That's why there's this other graph. These are all just visualizations, not a particular project, so this is something you may or may not have experienced. The graph tells the story of features versus fixing, or what management calls the phase of bug hunting.
And this is what may happen to your system when you ignore the fragilities. Most of the big booms in systems happen because we ignored them, the dynamic fragilities of our systems. Meaning: we were not prepared for the scale, or somebody abandoned some library or some part of the system, now we need to use it, there are bugs in it, and nobody really remembers anything. Or we have already scheduled everything, nobody has time to do anything else, and yet there is a big issue in the system. In reality you may actually recover from this. If you follow the graph on the right, you can take the hit and it will sometimes still be okay. What's not okay is that we sometimes think we can go above our capacity. This graph shows what happens when we try: we can't. Even if you try, you're just doing all the things at half the quality, without internal quality, without quality assurance, without doing real work, just creating legacy and burning out. This is something you should avoid at all costs, and I'm going to talk more about this and about the lessons I learned from it. So you may now be asking: how do we track this? Because if we can see it, we can fix it, right? So how do we actually track dynamic fragility? The answer is: we just don't. And if you're asking why, it's because it's very hard to track. And not only very hard: you can't track it until it happens. That's why I said that if you want to give somebody a job and let them suffer for six months without any impact, this is a very good opportunity. You can only tell what the scale of the system was once something happens.
And you need to track this for weeks or even months to make sense of the data. You can backtrack and see where the team was, for example, three or six months ago, but the team is a totally different team three or six months later, so it doesn't make much sense as a basis for planning. I mean, if you want to collect this data, you're free to collect it; it always makes sense to see how we behaved in the past. But you shouldn't be creating future plans based on these dynamic fragilities from the past. I want to end this first part of my presentation with a simple quote: absence of evidence is not evidence of absence. Meaning: just because we didn't crash until now doesn't mean we will not crash tomorrow. This is what dynamic fragilities are all about. Just because you don't have evidence of an issue doesn't mean your system doesn't have one, whether in your technical system or in your DevOps process. So what do we do instead? We spend more time on minimizing the potential impact things can have. How do we do this? These are the lessons I actually learned doing this for five years, and again, these may not be all of them, and you can take some and leave others behind. But there are four lessons. First, wear multiple hats. Second, maintain actively; don't just abandon things. Third, learn valuable things; make sure you understand the value of things instead of looking at other metrics. Fourth, find and discover; don't stagnate. I'll talk about each of these in more detail. So let's talk about the first one: wear multiple hats. And again, the graphs are just a visual representation; you may have it very differently on your own project.
But make sure you know that your capacity is limited. If you want to build things with the kind of quality that doesn't backfire the next time you touch them, then you need to manage that capacity. Wearing multiple hats is actually a good way to manage your own time, because you already know the whole process: you know how much time it will actually take you to develop the thing, to test the thing, and to release and deploy the thing while making sure it operates well in production. I know there are environments where this isn't possible, where the applications are just too large for this, but then we're no longer talking about small teams. This is what you should be aiming for, or what we are aiming for: making sure people manage their own capacity. Because we know what happens when people don't, and we let something like Jira velocity do it for them, which is, I'm sorry, complete nonsense. People get burnt out. There's not much value in a third person defining capacity for somebody else; each person needs to know their own capacity. This also creates a natural pressure, and this is a very cool point: when your team wears multiple hats, meaning your team owns the whole process, it can innovate from inside. I believe we all hate it when somebody comes in from another team, or from management at the top, and says, hey, you need to use this technology, and you just don't know why it's supposed to be valuable for your team. Your team knows its own context, the context under which it operates, and it probably also sees the things it can improve. You need to work on this from inside. And again, there are times when external people can really help the team make sense of what's going on and where to make the team or the process better.
But at the same time, the team itself will always be the best at innovating from inside, in anything: tools and processes alike. And managing the full life cycle in your own team means a lower potential impact: if something goes wrong, you have the one team, or particular people in the team, who know what to do. Second, and this will be a quick slide: maintain things actively. I know that for many people this is a yeah, sure, that's what we do. You can see it in open source tooling: people really trying to keep contributing to their own code, watching all the issues, staying in the context of what's going on, why people use their libraries or their tooling. At the same time, I've seen it too many times in small teams: they get past their capacity again and just abandon things mid-process. Now you have features that people use, which is the better case, at least people use your features, but you risk that nobody will have the context if something goes wrong. So you end up with: everything's on fire, we don't know how to fix this, and some people depend on it. And now it takes much more time to actually fix it. Instead, I suggest having the conversations we try to have: do we actually need these things? Will we need them in the future? Are people actually using them? Check before you abandon something, because sometimes it's just a wrong assumption that people are using all these things, and they could easily be replaced by something else. And as I said, things we don't pay attention to just backfire.
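The question "are people actually using this?" doesn't have to stay a gut feeling; even a tiny script over your access logs can ground that conversation in numbers. This is just an illustrative sketch, not anything from the talk: the log format and the feature paths are entirely made up.

```python
from collections import Counter

def feature_usage(log_lines):
    """Count hits per feature from simple access-log lines.

    Assumes each line looks like "2024-01-15 /export/pdf 200"
    (timestamp, path, status); this format is hypothetical.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 2:
            # Treat the first path segment as the "feature".
            feature = parts[1].strip("/").split("/")[0]
            counts[feature] += 1
    return counts

logs = [
    "2024-01-15 /export/pdf 200",
    "2024-01-15 /export/csv 200",
    "2024-01-16 /legacy-report 200",
]
usage = feature_usage(logs)
# A barely-used feature is a candidate for a deliberate decision:
# maintain it actively, or retire it on purpose; never just abandon it.
print(usage.most_common())
```

Even a rough count like this turns "I think people use it" into something you can actually discuss before deciding to keep or retire a feature.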
So make sure that when you look at your project, you don't see abandoned parts, because they tend to backfire first. Third, learn valuable things, learn valuable knowledge. So many times I've heard people talk about the learning curve. Maybe you've heard it many times too: hey, this thing has such a steep learning curve, we don't want to practice it, we don't even want to try it, the technology is not for us, because right now we don't have a year to learn something. All while totally ignoring the value curve: how much value the technology will actually have for them over time. Because there are technologies that have very steep learning curves, and you will need a lot of time to learn them, yet they will be of very large value in the long run. You should still focus on those. Whenever you try to prioritize things in your team, look at the value curve first. The value curve actually includes the learning curve, because it weighs the value the technology gives you in the long term against the background of your team, meaning how fast you can actually learn the stuff. If you don't have the background in your team to learn it, then it won't be of much value either. But if you can learn it fast and it has long-term value, then you're good to go with the technology. And one more thing: in theory, theory and practice are the same; in practice, they are very different. Meaning: before you pick a technology and say, hey, we want to use it because it's cool, because it has many stars on GitHub, and I just hope nobody here picks technology based on GitHub stars anymore, try the technology, practice it. That's the only way to make sure that you and your team will actually understand it.
And the last thing: find and discover. This is, I'd say, very controversial, because every time I see people talking about robustness, they talk about stability. But I would say that stability was always a ticking time bomb. Why is that? Because people in a stable environment tend not to care much about the instabilities in the system, so they are not prepared for what can go wrong. Meanwhile, people who challenge and battle their systems every day are prepared for the things that will happen in the system. Even with big issues, there's a much higher chance you will actually tackle the issue if you've been battling issues all along, instead of just trying to make the system as stable as you can. I'm not saying you should make your system unstable in any way. I'm saying: battle-test your system with early adopters, and make sure the system is prepared for whatever may actually come. With that, I want to end with this simple quote: be prepared for a war by fighting small battles. Always challenge yourself, always challenge the technologies you use, and challenge the people and your product as a whole, to make sure it doesn't carry all these dynamic fragilities, because many of them come from people. So that's it from me. If you have any questions, I'll be happy to answer them, and after the session I will also be on the Discord server, so you can ask there too.
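As a footnote to the battle-testing idea: "fighting small battles" can be as simple as deliberately injecting faults into a non-production environment and checking that your retry and fallback paths actually work. A minimal sketch, with the failure rate and the wrapped call entirely made up for illustration:

```python
import random

def flaky(call, failure_rate=0.3, rng=random.random):
    """Wrap a dependency call so it sometimes raises, as it will in real life.

    Intended for staging or early-adopter environments only, never production.
    """
    def wrapped(*args, **kwargs):
        if rng() < failure_rate:
            raise ConnectionError("injected fault")
        return call(*args, **kwargs)
    return wrapped

def with_retries(call, attempts=3):
    """The kind of client code you want to battle-test against injected faults."""
    last = None
    for _ in range(attempts):
        try:
            return call()
        except ConnectionError as exc:
            last = exc
    raise last

# Hypothetical dependency: succeeds unless a fault is injected.
always_up = flaky(lambda: "ok", failure_rate=0.0)
result = with_retries(always_up)

always_down = flaky(lambda: "ok", failure_rate=1.0)
try:
    with_retries(always_down)
except ConnectionError:
    print("retries exhausted, so the fallback path gets exercised too")
```

The point is not the wrapper itself but the habit: if your team only ever sees the happy path, the first real outage is also the first rehearsal.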