So, it's 9:02 and the room that nobody can find is full. Welcome to FOSDEM. My name is Kris Buytaert. I'm one of those weirdos who's been coming to this event for 20 years, which means I've actually never been in this side of the building, because they only started using it this year. What I do on a daily basis is help organizations leverage open source to deploy software faster, to achieve their business goals faster. About 10 years ago we figured out a name for that: we called it DevOps. Since then, people have started abusing that word for a lot of things.

In my spare time, I organize a bunch of other conferences. So, if you want more of this fun stuff, this is the only pitch I'm going to do: Monday, Tuesday, Wednesday we have this other small event in Ghent, Config Management Camp. It's basically the same as FOSDEM, only focused on infrastructure. And you're all welcome. We're sold out, but we don't check tickets; we basically do crowd control. So just show up if you care about this stuff. Those who have seen this slide before might have noticed this new thingy here: DeliveryConf had its first edition last week in Seattle. It was awesome, and the idea currently is that we might run one in Europe next year. So, given that you folks are interested in continuous delivery and such, you might want to show up.

I don't have much time, but I want to focus on: what is this DevOps thing? To me, it's not the automation part; it is mostly the culture part. Damon Edwards and John Willis have been talking about culture, automation, measurement, and sharing. And there are a couple of things. There's no such thing as a DevOps engineer; if you're a DevOps engineer, your organization basically does not understand DevOps. If you think you can get certified in DevOps, you've been scammed. That's not what DevOps is about. It's also not the same as continuous delivery. I think you need to do something like DevOps in order to achieve continuous delivery, but you could definitely do something completely different, have a healthy culture, and still achieve your business value.

But if we talk about continuous delivery, which is basically the topic of this talk, it's this: the idea that we can constantly deploy software, stably, reproducibly, and fast, and we typically achieve that through collaboration between all of the people involved. And that smells a lot like DevOps. So, how do we get there? Well, I could pull a Jez Humble trick here and ask: who's doing continuous delivery? Raise your hand if you're doing continuous delivery. Now, who's committing to master every day, multiple times a day? Okay, the hands that dropped are not doing continuous delivery, because they're not even doing continuous integration. They're doing continuous disintegration. And you need to do continuous integration before you can achieve continuous delivery or continuous deployment. Also, there's nothing wrong with maybe having a manual step in there, if everything else is automated.

So, what is this talk about? Today I want to talk to you about how to get there, and how not to. I'm going to talk about a couple of use cases, stories and experiences I've had with larger and smaller organizations, but also do a bit of a deeper dive into how to do this technically. So, what is the actual goal we want to achieve?
This is part of the release notes of Deployinator, a tool Etsy wrote 10 years ago, and we're still not there yet, but this is basically the goal we want to achieve: the ability to deploy code quickly and safely. A lot of people have tried to achieve that, and a lot of people have succeeded, but many more are failing. And I think that is partly because they haven't involved the operations people first. They've basically been trying to do this as a developer-only story, and I don't get this. Because if you look at the operational part: for years, we've been telling developers, you're not going to push this to production, you're not going to deploy this live, you're not going to do all these things manually. You're going to have testing in place, we're going to have automation in place, and we're going to force you to go through a pipeline and automate everything. But at the same time, we've tolerated ops folks who were like, "I fixed it," or "let's do this manually," or "let's change stuff there," or "reconfigure a network switch," or "add a firewall rule." We've been tolerating people just doing that manually. Who's still doing that? Why? The people who have worked with me know that I don't tolerate that, and I haven't tolerated it for the past decade.

And it's hard. If you look at the history of where this DevOps movement comes from, on the European side we were mostly senior ops folks. I mean, I've been coming to FOSDEM for 10 years, I did open source before that, and we started DevOpsDays 10 years ago, so we had been doing open source for more than 10 years, some of us 15 and more. And we were starting from an operational point of view, where we were seeing bad-quality artifacts from developer teams. We were having a lot of pain with stability, with scalability, and we were like, nah, this is not how it works. On the other side, there were the American people, who were like, yeah, but we need faster deployments, and there the developers were pushing this, and somewhere in the middle we met. There's also a difference in culture between the American startups, where venture capital basically pushes companies to go fast and burn, and the Europeans, who are more like: we have a sustainable business, we're going to be around in 10 years and still be supporting customers. So that combination of breaking things fast, but still being resilient and knowing how to run things, is to me why a lot of people have been tolerating ops doing things manually, as opposed to really doing infrastructure as code, really doing automation.

And I have some stories, some cases, which back that feeling that you need to start with the operations part first. They're not as scientifically measured as Nicole Forsgren's State of DevOps survey, where she actually has good metrics, but I've seen this throughout the journey with different organizations, and I wrote down three of them. And I hope there's nobody here on call; it's 9 a.m. on a Sunday morning, that must be painful. I'm actually looking around the room to see if there are people from these customers in the audience. I saw somebody outside, but he's not getting in, so that's good.

So, let me start with a couple of those cases. The first one is a situation where I walked into an organization in complete chaos. They had about 300 nodes. They had no clue how they built them. They had no clue how they deployed them.
They had Jenkins hiding somewhere under a developer's desk. Who still has that? Not that many people; that's good. The operations people were mostly fighting fires. There was zero standardization, and every artifact that was being deployed had different naming conventions and was deployed to different locations. It was, well, hell as we know it. And their tipping point was the moment they had a really, really huge failure, and their senior management realized: if we have this failure again, it's game over. So they sought help.

The first thing we needed to figure out there was: how do you build your software? How do you make sure your software is reproducible? And how do you make sure you can rebuild your build environments? So we started teaching those people. We were already using Puppet, an infrastructure as code tool, and we started teaching them how to build their own CI stack reproducibly, so that if it breaks, you can rebuild it, and you can guarantee that the next time you spin something up, it's going to be similar, or extremely similar, to what you built before. And we taught those folks testing: how do you test your things? What they were doing before was: they had 300 nodes but no development platforms. They were just hacking Puppet code in production, Puppet applied it, and it broke. So the idea was: we start with building the infrastructure in a development environment, and then we teach them that you can promote things to a new environment when things work. We split the configuration out of the code. We started teaching them how to use the pipelines they were actually building for the developers, and slowly you saw the developers starting to contribute code to the Puppet code base. They were writing tests, they were writing their own modules, and because they were now using the same tools, the gap between those people was closed and the bridge was made.

So we went from a zillion different JDKs to one JDK and one JBoss version. We initially called this Project Dolly, because we wanted to be able to clone the different environments: not a physical image clone, but we wanted to have the same environment for everybody. And we shot for about 90% reproducibility, because that was the initial goal; we knew there were going to be a bunch of edge cases. And like I mentioned, eventually we had Java developers contributing back to the RSpec code base, writing tests on infrastructure code, so they were much more sure that what was being delivered was actually what they wanted. So we started with telling ops folks: automate. They learned the same tools as the developers, and eventually collaboration improved, progress improved, and they were happy. Much less downtime, many fewer issues, and much better collaboration.

The second case I went into was completely different. They were already doing CI, they claimed. The developers had a bunch of tests. They had some things hiding under their desks. But the people they called operations were like: yeah, but we cannot deploy this, we don't know where it is coming from. The developers said they had good test coverage and everything in place. But they still had no way to get to stable deployments. They could not automate their deployment. It was painful.
It took me about 18 months in that organization to finally find somebody I would consider an ops person; somebody I'd actually say to: hey, if you're looking for a new job, I'll hire you. All the other people in that room were called ops, but they couldn't figure out the difference between a root password and a username. It was that bad. When we eventually did find people senior enough to do operations work, they were the Brents, if you've read The Phoenix Project: the people who were firefighting, hoarding knowledge, and not able to move forward.

So what we did there is: we found a couple of them and we moved them out of their office, out of their day-to-day job. We actually moved them to a different city. And there we started teaching them agile. They had been longing to do infrastructure as code for years, but they weren't allowed to. So we put them in their own group, where they couldn't be disturbed by other people. The side effect was that the other people who kept trying to grab their time were finally learning how to fix things themselves. And they started to build a pipeline. They started to do CI/CD on their own code base, their own tooling. And they started to think about: hey, what is our way to build something that ends up in production? Even discussions like: what is production? If the ops team delivers production, is that production for them, or is that production for the developers? Those kinds of discussions started happening. And after those first successes, we slowly started moving those people back to their teams, so they could take the practices they had learned and actually help their own teams adopt them. But that was like 18 months after we started working with them.

So over a period of about six months, we took the really old-school greybeard people, the people who were just doing AIX and saying "I'm going to retire in six years and I'm not going to learn anything new," and we converted them to actually being agile. Those were the people who were now going around the organization saying: hey, you should do this kanban thing, because for operations it might be better than Scrum. And those were the people who were actually writing tests, tests for their code. So it took that organization a much longer time to have teams arrive at continuous delivery, because they didn't get the ops folks involved first. It took them more than a year longer. But once they were there, things started moving.

So you can guess what the third case is, right? An organization that still refuses to involve the operations people. They claim to do DevOps. They claim to have been doing it for two years. And they have a DevOps team in the middle which is dictating tooling: tooling they never use, tooling they don't know. So the developers are basically complaining: well, we have this tool set here, but we cannot use it because it doesn't fit our needs at all. The tools are broken; they don't deliver what they should deliver. And what they've achieved is that everybody is working around the ecosystem that has been built, and still doing things manually, because the tool set they're given is completely broken. They're still hiring DevOps engineers, and the average DevOps engineer who has a clue, who accepts that the job title is wrong but thinks he might be able to help, stays for about two months and then leaves.
Their senior IT management has left twice over the past three years. And they have a team where basically only the junior in-house analysts stay, and the rest of the crowd is external people, and by the end of the year, by October, November, they're out of budget, so they don't write any code in the last quarter because they don't have any developers anymore. So they're going to fix this by adopting the cloud, which means they have some Kubernetes deployed somewhere, and they're moving their Oracle databases into it, and they're wondering why things don't work. Oh yes, you still need to fill in a four-page Word document in order to be able to deploy something, and they still haven't involved the ops folks. Actually, it's worse: when I was working with them, I wasn't allowed to talk to the ops folks.

So if I look back at these journeys, and those are just snapshots, the only thing I can conclude is: the earlier you have those ops people involved, the better your CI/CD story is going to be. It's going to have a much higher rate of success, and it's going to be much smoother. And if you think about it, there is a reason. People who run ops need to be able to support what the other people do, and in order to support these things, they need to understand them. And if you've never done this before, how can you? If you've never driven a car, are you really going to tell people how to drive a car and help them fix the car? If you don't know what a steering wheel is, if you don't know what a gas pedal is, you won't. And you need those people involved so you can unblock delivery, unblock provisioning, and make sure that everything you want is built in. Putting up a bunch of tools is not going to solve that. Shouting that you're doing CI/CD is not going to solve that.

So the culture hack I have for this is: if you teach your ops folks to set up CI/CD for your CI/CD infrastructure first, they're going to understand how this works, and they're going to be able to coach and teach your developers to do exactly the same thing. And while you do that, you give them a common set of tools, a common terminology, a common sense of "it's broken, let's fix it together."

So how do we do this? Well, if you look at organizations, you see that in the CI ecosystem there is tooling for version control, deployment, and builds; you need an artifact repository, you need code coverage, you need test tools, you need a lot more than is listed here. And if you talk to the average ops person in a legacy organization, how many of these tools do you think he's been using? Two? Zero? If they use all of them, it's not a legacy organization; it's probably an organization that has already moved further. You'd be happy if they use version control. Testing? Hmm. A friend of mine tweeted just last week that he was working with an organization whose version control was file names on Dropbox. And it is 2020. So a bunch of these tools those people have never seen, but they are expected to manage them, expected to run them, and they don't know how they function. And that's already the first challenge to overcome: we need people who understand how these tools work because they have been using them. So a lot of people still argue against doing CI for these reasons. They don't know what the tools are about. They don't know how they work. And they're like: yeah, but this is costing time, and we don't have tests anyhow, so why would we spend time on this?
And the biggest one is: we don't have budget for this, we don't have the money for this. Well, that's because the money is hiding somewhere and you're spending it anyhow; you're just hiding it and spending it on firefighting rather than on building something resilient. Or there's a different budget: the budget the development teams need for quality assurance and all of these things isn't theirs. So it's a really painful thing. And then there are the fire-and-forget projects, obviously. So a lot of teams say "yeah, we do continuous delivery" because they don't want to spend time building multiple platforms. For those in the back who cannot read it, the slide says: everybody has a testing environment; some people are just lucky enough to have a separate platform that's called production.

So in order to get there, we need to do automation, we need to do infrastructure as code; we need to look at what we've built, and we need to write code. And infrastructure as code does not mean I'm going to write shell scripts and convert them to YAML. I want to define the state of an infrastructure, and I need multiple components for that: I need tools that do orchestration, and I need tools that do actual desired-state configuration. So you're typically looking at modeling your infrastructure, writing cookbooks or manifests or whatever you like to call them these days, and having different environments for your infrastructure, so you can promote the artifacts you deliver and ship them through different stages. And then you eventually end up with a working service, which is the application code plus the infrastructure code, with security and monitoring and everything built in.

This is a traditional pattern. There are parts to this, but lots of ops folks still need to be taught how to do version control. Actually, lots of developers still need to be taught how to do version control. Who here has branches that live for longer than a day? Those are the people who are not doing continuous integration. Who uses tools to do dependency management? Who understands how to do dependency management? One person understands how to do dependency management. Two people. The thing is, there is a huge variety of tools out there, and it's so simple once you understand how it works.

So, who knows what's on the screen? What's on the right side? On the right side we have something that looks like a librarian file, where a module is defined with a git repository and a reference. The Ruby ecosystem has things like that. The Puppet ecosystem has things like that. Every single language invents its own dependency manager. And what's the thing on the left side? It's exactly the same. I hear somebody who actually recognized it: it's exactly the same, a list of modules, each with a reference to something in git. These are git submodules. They do exactly the same as the thing on the right, with one big difference: this is going to work for every single language out there. Otherwise, you're going to have 35 different languages doing 35 different things, and each time an ops person needs to help with a tool, it's going to be: oh wait, there's another variant of how to do this. Because all of those people who have been building dependency management tools are basically wrapping one basic, simple component, and making it harder for people to understand.
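As a rough reconstruction of what those two slides showed (the module name, version, and URLs here are made up), the right side was a librarian-style Puppetfile and the left side was a .gitmodules file expressing the same pin:

```
# Right side: a librarian-style Puppetfile. One module, one git repo, one pinned ref.
mod 'nginx',
  :git => 'https://git.example.com/puppet-nginx.git',
  :ref => 'v1.4.2'

# Left side: the same idea as a .gitmodules file, which works for any language.
[submodule "modules/nginx"]
    path = modules/nginx
    url = https://git.example.com/puppet-nginx.git
# The pinned ref lives in the parent repository's tree, recorded by:
#   git submodule add https://git.example.com/puppet-nginx.git modules/nginx
#   git -C modules/nginx checkout v1.4.2
#   git commit -am 'Pin nginx module at v1.4.2'
```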
And having those discussions, operations people together with developers, and realizing: well, this is the same pattern, we're just using different tools. If you have that discussion, you can help each other understand it. You could use the same pattern for different languages. Or you could at least understand: oh, that's what you're doing, that's what you're trying to achieve. But in a lot of cases, people don't understand the goal of the tools they're using, because they've never had to, unless they understand git submodules, obviously. So people have never done this before. People have seen a zillion branches in the ecosystem and never understood them. And when you think about it, the branching makes it more difficult. So anti-patterns started happening.

I had a really good discussion at DeliveryConf with the folks from Puppet Labs. About eight, nine years ago, they decided their release management approach for doing deployments on a Puppet master would be: we have this tool called r10k, and you're going to have a branch for your development environment, a branch for your acceptance environment, and a branch for your production environment. And I remember being in the room with Jez Humble when they introduced that, and we were like: you're going to teach the operations people how to do it the wrong way. But their whole ecosystem was built on: you use r10k, and deployments of your code base are going to be fine, it's going to be smooth. Only last week I was talking to the people who are now rethinking their new continuous delivery tool for Puppet code, and I was like: use the tools that already exist, don't build your own tool. They had started to realize that maybe r10k was not the best choice; maybe there are other tools around that can do exactly the same.

So if you look at a typical infrastructure pipeline, what is in there? Configuration? Why would there be configuration in a pipeline? And how are you going to test completely different parameters for a stack? Configuration, to me, is actually the only thing that is not in a pipeline, because it's close to non-testable. You can test the syntax, you can test whether the structure of the configuration is correct, but then you're testing the code that applies the configuration. You can never exactly test the sizing and the IP addresses, because if you could, for example, stand up the exact same IP stack and test everything, you'd actually break production, because now you have duplicate IPs. So configuration is one of the most difficult parts.

No, the thing that's in your pipeline is literally the same as what should be in an application pipeline. You check out your code. You do some syntax checking, some style checking, code coverage, and tests; if the language you use requires a build, you build it, and you do more tests on the artifact. You package it, you upload it to a repository, and then you deploy it. That's literally the same as what you need to do in every other language. Hopefully with more and more testing, but yeah. And every artifact is like that: your application code, your infra code, your metadata, and your tests are all artifacts that you want to ship through a pipeline. And you want patterns where you have development, acceptance, production, or whatever environments.
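To make that concrete, here is a minimal sketch of such an infrastructure pipeline as a declarative Jenkinsfile, in the spirit of the Jenkins setup described later in this talk. The specific tool invocations, paths, repository host, and deploy script are assumptions, not the actual pipeline; the point is that the stages mirror an application pipeline one for one.

```groovy
// Minimal sketch of an infrastructure pipeline as a declarative Jenkinsfile.
// Tool invocations and names below are illustrative assumptions.
pipeline {
    agent any
    stages {
        stage('Checkout') {
            steps { checkout scm }                          // same for every language
        }
        stage('Syntax and style') {
            steps {
                sh 'puppet parser validate manifests/'      // syntax check
                sh 'puppet-lint manifests/'                 // style check
            }
        }
        stage('Unit tests') {
            steps { sh 'bundle exec rake spec' }            // e.g. rspec-puppet
        }
        stage('Package') {
            steps { sh 'rpmbuild -ba packaging/puppet-tree.spec' }  // ship as an RPM
        }
        stage('Upload') {
            steps { sh 'rsync -av *.rpm repo.example.com::rpms/' }  // hypothetical mirror
        }
        stage('Deploy to development') {
            steps { sh './deploy.sh development' }          // hypothetical deploy script
        }
    }
}
```

Swap the lint and test commands for your language of choice and the shape stays identical, which is exactly the argument being made here.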
You don't want to pull things from the internet every time; you want local mirrors for every single artifact. So: upload to a repository, test. And if you're doing infrastructure, you do things like checking that your Puppet runs are still clean, checking that your Ansible stuff still works and isn't breaking, and checking your monitoring. It's like your smoke tests for applications. And your monitoring is crucial. You deploy a host, you deploy an application to that host, you leverage infrastructure as code so your monitoring is automatically reconfigured. You check your monitoring, and it should stay healthy. If you see your monitoring break, then either your application broke or your complete system broke, and you should not push to the next environment. And all of this is automated, because that's what infrastructure as code brings you; those are the side effects. So you promote to the next environment when your monitoring has decided that things are not broken, and you increase testing. This is what we do for applications, right? This is what we've been doing for applications for years. And then you go to the next target. And the next target.

The more of these environments you need to build, the more you end up with: how do I do this? The way we still do this, and we're still really happy with it, is with a large Jenkins Job DSL code base, where we do pipeline as code for our infrastructure as code. We basically have one large seed job that pulls in a bunch of config files for all the projects we have, and it generates our pipelines on demand based on a config file. And we have a bunch of libraries that we reuse for every single pattern. A checkout job is a checkout job; whether it's Java code or PHP code or Puppet code or Terraform code, I don't care, it is a checkout job. Everything we deploy is an RPM; whether it's Puppet code or Java code or Python code, they're all the same. And the patterns I build in there, like all the metadata I want to know (where it's coming from, which build it was), all of that information is the same. It's the same library.

So what we created is a library used by both the developers and the ops folks to build their whole CI/CD infrastructure. And they have a shared language, a common language that works for everything. It's a mix of pipeline DSL and Job DSL in Jenkins, but it works, and it's the same for everybody. That's the strength. It's not one team using one thing and the other team going: oh, I cannot help you, I don't know this. We've even come to the point where we can test multiple versions of our code base in parallel, because we spin them up in containers; we have different versions of our PHP stack, but also of our Puppet code base, and we test them. We don't look at the next version yet, because we don't care yet if it's going to break. But if it doesn't break, we know we can upgrade.

And then people come: yeah, but we're in the cloud. I don't see the difference. I mean, you're still going to write Terraform or Chef, whether you're on-prem or in the cloud. You might be talking to a different API: you might not be talking to the OpenStack API but to EC2. And where you used to configure VPNs, you're going to configure VPCs on the other side. You're going to have monitoring and security in there. So all of those patterns you've been doing, whether you have a Terraform pipeline or a Puppet pipeline, don't change. It's the same idea: you have the same tool base, the same ecosystem.
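As a rough illustration of that seed-job pattern (this is not the actual code base: project names, URLs, and the inline project list are made up, and in the real setup the list would come from per-project config files pulled into the seed job's workspace), a Jenkins Job DSL script can generate one identically shaped pipeline per project:

```groovy
// Minimal Jenkins Job DSL seed-job sketch. One seed job, many generated
// pipelines, all with the same shape, whether the repo holds application
// code or infrastructure code. Names and URLs are hypothetical.
def projects = [
    [name: 'webapp-php',  repo: 'https://git.example.com/webapp-php.git'],
    [name: 'puppet-tree', repo: 'https://git.example.com/puppet-tree.git'],
]

projects.each { project ->
    pipelineJob("${project.name}-pipeline") {
        definition {
            cpsScm {
                scm {
                    git {
                        remote { url(project.repo) }
                        branch('master')
                    }
                }
                // Every project gets the same stages: checkout, lint, test,
                // package as an RPM, upload, deploy, check monitoring.
                scriptPath('Jenkinsfile')
            }
        }
    }
}
```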
"But we're using containers!" There's nothing different. You need to test this even more rigorously. We need to figure out better ways to deploy this these days, because "docker pull", YOLO. And we still have to ask all of these questions again, because people who have only been doing Docker on their laptop are container-naive. They haven't figured out that this is the gap they need to cross. So there is a huge challenge for us out there, because we're at an open source conference and I'm still struggling, maybe you don't, with how to do CI/CD on my whole Kubernetes ecosystem. I want to be able to spin up Kubernetes stacks automatically, out of the box, hands off, from source code. Not the Kubernetes source code, but some infrastructure code. It's easy if you do it in a public cloud. Anybody got a solution for me? Way too early on a Sunday morning, or nobody actually has a solution. So how do we do this? How do we do this in a container ecosystem? That is the challenge I have for you folks. What I still see is people having 30,000 lines of bash calling kubectl or calling Helm. That's not how we want to work. That's basically going back 10 years and not having tests, because how do you test that? So infrastructure as code, to me, is the first thing you need to do if you want to achieve continuous delivery and have full control over the whole stack. It just got harder once you add containers, but it is still something you need to spend time on, because you want to understand the same patterns and the same tooling that the developers building applications on top of it need to understand. And I hope the three cases kind of prove that. So, it's time for the rest of us then. Questions?

Can you prevent organizational gravity from taking over? Okay, so the question was: in case two, it seems to be an organizational problem; how do you solve the problem that if you move the people back into the team, they just get absorbed by the organization again? That's right. Well, it always is an organizational problem; it always is a culture problem. And the way you solve that is by having C-level buy-in. We pulled those people out, and the only way we could pull them out was because C-level was actually pushing this and agreeing to it. So when they came back to their teams, they were enabled, and they were blocked from doing their old work again; they were supposed to help the people, to help their teams. And because they had been gone for close to three months, a bit more even, there were people who had filled in the gaps, solving the problems they were solving before. So they weren't being bothered that much anymore with their old jobs. For the highly critical things they were still being consulted and still being asked to help, but it wasn't their daily job anymore, because they had been gone for three months and other people had taken over a couple of things.

Other questions, in the back? So the question is whether he understands correctly that I use git submodules for dependency management. The answer is yes. The main reason is that all the other tools around basically try to reinvent exactly the same thing. The second reason is that if you do that, you get exactly the same pattern that the developers will be using to do the same. I've seen a bunch of user interfaces built on top of these things, and they are all horrible.
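For reference, a minimal sketch of that submodule workflow, with hypothetical URLs, paths, and versions:

```
# Pin a dependency as a git submodule (hypothetical URL, path, and tag)
git submodule add https://git.example.com/puppet-nginx.git modules/nginx
git -C modules/nginx checkout v1.4.2          # pin to a known ref
git commit -am 'Pin nginx module at v1.4.2'

# On a fresh clone, restore every dependency at its pinned version
git clone --recurse-submodules https://git.example.com/puppet-tree.git
```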
And what I've seen is that once people start doing this, it doesn't matter which language they use, they start to care about where their source code is. They start to care: oh, I'm not pulling in random things from the internet that might disappear when somebody decides to unpublish a node module. No, they have the code locally, they can rebuild. So people start to understand what they're building, and they also know that when they go back in time and check out an older commit, they will have close-to-reproducible builds. Combine that with being able to rebuild your infrastructure and your CI infrastructure. The fact that they can do exactly the same means that at some point in time I can, theoretically (because I've never been able to spend the time on that), go back and say: I want the build ecosystem we had three months ago, because I have the versions checked out and I have the repositories checked out, and then I want to be able to rebuild the software we built then. You have everything in place. It's not going to be trivial, because typically you don't have enough resources and enough time, but in theory that's what you should be able to do. It works very nicely with open source, because you have the source and you keep a copy. And this is an open source conference.

Infrastructure deployment, or the application deployment? Both. So the question is: on this slide, the container ecosystem is nowhere close to the level of automation and reproducibility we had with VMs; do I mean application deployment or the actual ecosystem? And the answer is both. I'm still looking, and I've been looking for close to four years, for a fully reproducible, automated way to deploy a full Kubernetes stack with everything in the picture. Most of the things I see just pull random things over the internet, totally not reproducible; that's one part of the problem. The ones that do work are really cloud-focused; they only support AWS or whatever, and not on-premise deployment. And then on the other side, at the application level, we have a bunch of package management tools that are OK-ish, but for configuration of the applications inside, there's nothing. Helm is a package manager; it's not a config management tool, and that's the closest thing to reality. This image goes there, that image goes there with this configuration, and you can tweak that, not only in Helm but in Helm for different... So Helm is the command line you use? Sorry. Sorry a lot.