All right, I guess we can get started. We're here to talk about infrastructure as code. I'll introduce myself a little bit. I'm Kief, a consultant at ThoughtWorks. We help companies deliver software and find ways to get better at delivering software. When I started seven years ago, my role was mainly engaging with operations teams to help with that: if we're going to do continuous delivery, what do we need to do to make the path into production smooth? We'd typically start with the development side of the organization and try to bring the operations side into it. Over time, as cloud and infrastructure automation have become more and more popular, more organizations, and more people and teams using these tools, are looking at ways to use them better.

In my background, I've been working in IT for over 20 years across systems administration and development roles, and I've always been into automating infrastructure. How many people here have worked with a tool called CFEngine, or are aware of CFEngine? One person. It's the predecessor to tools like Puppet, Chef and Ansible, and it was one of the first tools I adopted. I first started out with shell scripts and Perl to automate things on physical servers, then I discovered CFEngine and messed around with that, and then Puppet came out, and then Chef. So I've always been interested in this space. As someone who crossed development and operations before that was a thing with a label, it's always been a natural thing for me to do.

I wrote the book Infrastructure as Code basically to capture patterns and practices for implementing this kind of automation, things I'd been talking to people about how to do. The term had been around since about 2006, around the time the term DevOps came out; some of the same people coined it, and it's a bit murky who actually came up with it, but it emerged in those days and it's been a handy label.

How many people saw my talk yesterday? A few. A couple of the slides are similar, because I'm starting from a similar point: setting the context of the pressures and forces in the industry that are affecting our organizations and leading to the adoption of infrastructure automation tools and cloud. Yesterday I focused more on governance and change management and what it means at that level; today I'm going to focus more on the engineering practices.

The key driver is being faster: faster to market, faster to try out new ideas, and able to release continuous improvements to infrastructure and to systems as a whole. That's an overall pressure we're all feeling these days, and cloud is a means to help with it: moving away from provisioning infrastructure by hand, which takes quite a long time, to doing it automatically. I use cloud as a hook to talk about infrastructure automation even when it's not literally cloud, because it's all about making our infrastructure more flexible.
But of course, if we go faster, we have to think about the risks involved. Just because we can let people spin up servers for themselves very easily and quickly doesn't mean the problems go away: we still have to make sure we're doing things correctly and that we're not going to break something. So how do we manage that at pace?

One of the themes, which I also talked about yesterday, is that we traditionally think we have to trade off between going quickly and doing things properly, as if you have to lean towards one side or the other. The State of DevOps Report comes out every year, and I highly recommend having a look at it, especially if you're having conversations in your organization with stakeholders, executives and management about why you should be doing DevOps, continuous delivery and more agile ways of working. It's based on research into real companies: what their practices are around agile and lean and so on, and what the results are, how effective they are. A really strong theme that comes out of it is that the companies which are the most effective, the ones with fewer failures when they release software, whose staff spend the least time on remediation activities, and even the ones that are the most successful commercially, are the companies releasing software and changes the most frequently. That sounds counterintuitive, because it means the faster you go, the more reliable things are and the better the results you get.

I think there are a couple of reasons for it. First, in order to go quickly you have to be good at it. If your processes are difficult and unwieldy, with lots of manual steps and things that go wrong, you can't go quickly; to move at speed you have to have good processes and get good at making changes without making mistakes. It also forces you to manage technical debt, because technical debt slows you down. To make changes frequently, even weekly, much less daily or a couple of times a day, you can't have systems that are flaky and hard to change; you need systems where the design and implementation are clean, easy to change, and easy to understand when things go wrong. And the broken windows idea is important too. That phrase came from a law enforcement campaign in New York, where the mayor was trying to clean up the city and improve the state of things, and the theory is that fixing the small things improves the overall culture. So if you see something wrong in your system and think, okay, it's not broken, it sort of works, but it's a little bit dodgy and something could break in the future:
if correcting that takes multiple weeks, if you have to justify it and go to a CAB meeting and go through a lot of process, you won't fix it. You'll put it on the list of known issues and say, well, maybe we'll get to it later, and that list builds up. Then you have a culture where it's acceptable to have dodgy code and systems. By lowering the barrier, by having the processes in place and getting good at change, we can say: that thing that looks a little bit dodgy, we can just fix it, and be confident that fixing it isn't going to break something else. Being confident in making changes improves the overall quality of the system. So it makes sense, when you look at the organizations that are very, very good at IT, that this is the approach: they're making changes constantly, because making changes is how you fix and improve and learn.

What good looks like overall, what we want to get to, is where we can have multiple teams working on a complex system, potentially with many moving parts, while minimizing the overhead of coordinating changes between those teams. We don't have to spend a lot of time negotiating what changes we're making; changes are a routine thing that happens easily. And we can rebuild anything at any time. For any part of our infrastructure, we're confident that if something happens to it, we can just rebuild it; it's not flaky, it's not something we have to treat as "be very careful, don't touch it, it might break".

When I first discovered virtualization, I was working with physical servers, and when somebody came to me and said, we have this application we want to deploy and we need a server for it, I would have to buy some hardware, assemble it and take it down to the data center; it was a lot of work. Then I discovered VMware, and we installed it in our office data center, which held all of our development environments and non-production things, and it was great. Somebody would come and ask for a server: sure, I can get you a VM; a VM for you, a VM for you, anybody who wants a VM, I can do it at the snap of a finger. It felt awesome, like Mickey Mouse in The Sorcerer's Apprentice with his magical brooms doing all the work for him. And then what happened to Mickey happened to me: I ended up with loads and loads of virtual machines that were out of control, all on different versions of things, and I couldn't keep them all patched and up to date. It was a big mess.
Part of what happens is configuration drift. The idea of configuration drift is that even if you start with identical servers, created from the same original image, or copied from an existing running server with everything conveniently installed, over time they become more and more different. You have a problem on one server, so you go in and fix it, and now that server is different from the others. Or an application needs a different version of Java installed, so you go and update that. Things gradually get out of whack.

We'd like to think that configuration automation tools like Ansible, Puppet and Chef will help us manage that, but what happens in practice is that we're not able to use them confidently across everything. Say most of our applications run on Java 7, and we have a new application that needs Java 8, but some of our existing applications don't work with Java 8 yet and will need work. We have an Ansible script that installs the JDK on machines, so we tweak it to install Java 8 and point it at the one machine that needs Java 8. Now we can't run that automation script against all of our servers; we have to keep editing it, or manage different versions of the code for different servers. And we lose confidence, because people do things like that, or things change: somebody patches a server or makes a manual change, and then when you run the configuration tool it breaks on some servers and not others. So you say, well, I have something I have to do right now and the automation is broken, so I'll just do it manually, and that increases the drift and the differences between servers. You don't have the trust to run the tool, and you don't get the full value out of the automation compared with what those tools promise.

One of the things we need to do to avoid this, or at least manage it, is to run the automation continuously. The way most of these tools are designed to work is not that you run them from the command line whenever you decide to make a change, editing your playbooks and then running Ansible; they're meant to run continuously, with an agent that runs all the time and keeps reapplying the configuration. What this means is you have to be disciplined about it. If you have those two different versions of Java, you'll have two playbooks, or two roles, or a parameter to those roles, which says these servers have Java 7 and these have Java 8, so the automation can run against all of them. And whenever something does break, you go and fix it immediately, just like continuous integration, so the configuration can keep running continuously and you keep that level of confidence.
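As a sketch of that idea, assuming hypothetical inventory group names and Debian-style package names (the real playbook and packages would differ), the Java version becomes data attached to a group rather than an edit to the playbook, so the same playbook can keep running against every host:

```yaml
# Hypothetical sketch: the Java version is a group variable, not a
# per-server manual tweak, so one playbook runs continuously everywhere.
#
# inventory (illustrative):
#   [java7_servers]  -> group_vars/java7_servers.yml defines: java_package: openjdk-7-jdk
#   [java8_servers]  -> group_vars/java8_servers.yml defines: java_package: openjdk-8-jdk

- hosts: java7_servers:java8_servers
  become: true
  tasks:
    - name: Install whichever JDK this group is assigned
      apt:
        name: "{{ java_package }}"
        state: present
```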
The other strategy for this is immutable servers. How many people are familiar with immutable servers or immutable infrastructure, or have heard the term? A few. The idea is that rather than making a configuration change on a running server, whether it's a patch, a tweak to a configuration setting, or adding something new, we build a new server with the change on it. We can test it, make sure it's okay, then redirect traffic, or whatever service that server provides, to the new server, and once we're comfortable it's working, we tear down the old one. The point is to do this frequently, so that servers don't live for very long, and so you stay confident that everything is managed and defined by your teams.

One of the things that's difficult with automation, and we've seen this with automated deployments and automated testing, is that doing large batches of change at once is hard. With automated testing, the pattern we often see is a testing team that owns the automation tool, usually a UI-based thing. Whenever the developers finish a batch, whether it's a release or an iteration, they hand it over to the testing team, who run the regression suite, and the tests break. They don't break only because there are bugs; usually they break because developers have changed the code in ways that mean the tests need updating to reflect that. The longer the period between releases from development to testing, the more work it takes just to get the test suite up to date, to the point where it's actually telling you whether something is broken rather than merely changed. With long gaps, this tends not to work very well and people give up: you invest a lot of energy in updating the regression suite once and it works, and the next time around you don't have time, so it just gets abandoned.

We've seen this with test automation, and it's similar with infrastructure automation. Rather than saying, we have a cloud now, we're going to use AWS, we'll go to the console and set up a whole bunch of servers, figure it all out, and then automate the things we figured out, you need to make your changes bit by bit, but with the automation. For each change, you start with the automation tool, make the change there, run it, see how it works, and then do the next change and the next. That way, rather than automation being something you try to retrofit, it's something you do as you go along; it becomes your habit. It can take a while to get to grips with and get comfortable with, but once you are comfortable with it, it's hard to go back. When I started working with a team a few months ago on Azure, Microsoft's cloud, I wasn't familiar with that platform, but I actually found it easier to learn by using the automation tools, the Resource Manager templates and so on, and using the GUI as something to look at but not as a way to manage the infrastructure.
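Coming back to the immutable server idea for a moment: a minimal Terraform sketch of what "replace rather than modify" can look like, assuming AWS, a hypothetical image naming convention, and that images are baked elsewhere. The point is that a change means a new image and replacement servers, not edits to running ones.

```hcl
# Look up the newest baked image following a hypothetical naming
# convention ("web-server-*"); a configuration change means baking a
# new image, not logging in to running servers.
data "aws_ami" "web_server" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["web-server-*"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.web_server.id
  instance_type = "t3.small"

  # Bring the replacement up before tearing the old server down, so
  # traffic can be switched over once the new instance checks out.
  lifecycle {
    create_before_destroy = true
  }
}
```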
Now, I've been talking about infrastructure as code without defining it. The definition I give is basically taking tools and practices from software engineering, particularly agile software engineering, and applying them to our infrastructure. That's how I use the phrase.

The most basic thing we do is define our environments, and things about our environments and systems, as code. What this means is we can reuse things. Once we have a definition of what an application server looks like, we don't need to redo it each time; we can reuse the same code for multiple application servers. This is where, if you have an operations department or an infrastructure team, and people have to request a server and wait a couple of weeks while it gets configured to spec, you can instead say: we have a standard server, and actually you can create it yourself. We'll give you a button, or a script that's well proven and has been used for other application servers, and you just run it and you've got one.

So it's reusable, and it's consistent, because you know every application server you have is created with this automation and they're all consistent. Even if there's some variation in how they need to be configured, Java versions or Tomcat versions or whatever it may be, that variation is still within the automation: all of our Tomcat 7 servers are the same, and consistent in the ways they need to be. It's also visible: anybody who needs to understand how things are configured can just look at the code, rather than logging on to servers to see what's going on. That applies to security folks, auditors, architects, anybody with governance responsibilities who wants to make sure teams are doing things correctly; they can look at the code to see how things actually are, rather than looking at documentation and diagrams, which are probably out of date. Obviously you can put it into version control, so you can see the history of changes that have been made. And it's actionable: if things are defined as code and you put them in a version control system, every time somebody commits a change you can trigger some kind of action, so you can see that a change has been made and do things like continuous integration, as I'll get onto.

The prerequisite for being able to do this is what I call a dynamic infrastructure platform. This is basically an API that lets you manage compute resources, network resources, storage and so on. Cloud infrastructure as a service is the obvious example, but the reason I say dynamic platform instead is that it doesn't have to be a cloud, and it doesn't have to be an IaaS-type cloud. You can do it with virtualization; even with plain VMware, without a cloud layer on top, you can have scripts that make it work this way. And even with physical infrastructure you can make it dynamic, using tools like Foreman to automatically boot hardware and install operating system images on it.
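To make the reuse point concrete: a sketch of how one server definition might be instantiated several times as a Terraform module, with a hypothetical module path and parameters; the controlled variation (here, a Java version) is passed in rather than hand-edited per server.

```hcl
# Hypothetical reusable module: every application server comes from the
# same definition, with controlled variation passed in as parameters.
module "orders_app_server" {
  source       = "./modules/tomcat-server"
  name         = "orders"
  java_version = "8"
}

module "billing_app_server" {
  source       = "./modules/tomcat-server"
  name         = "billing"
  java_version = "7"
}
```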
So if you need to work at the hardware level for your particular system, you can still have this kind of automation; it just takes more work. It's obviously easiest with a cloud, that's the ideal and the lowest barrier, but it's possible with various types of infrastructure, as well as with platform as a service, containers, serverless, all of these things. The ideas apply to whatever platform you're deploying onto.

So, thinking about the different kinds of tooling involved in this. The high-level piece is environment definition, or environment provisioning. This is things like Terraform and CloudFormation; each of the clouds has its own tool, Ansible has cloud modules, and there are other tools that do this. The point is that it's something you point at your cloud or dynamic platform and say: I want these servers, I want these networking rules put in place, I want this storage created, provisioned and attached. It works at that level, and it lets you define an environment instance.

And there's this idea of a stack, which is a useful concept for some of the further things I'm going to talk about. AWS CloudFormation uses the term stack; other tools don't use the same term, but I use it because it's an easy way to get your head around it. You've got a project, maybe some CloudFormation templates, including things that import other pieces, or a Terraform project, and the point of a stack is that it's a collection of elements that are all managed as a group.

How many people here use Terraform and are familiar with the concept of state files? A few, but not many. The point is that you have this project that defines an environment and the things in it, and when you run Terraform, it creates a state file which records what it's managing up in the cloud. Terraform uses that state so that when you change your code and run it again, it knows what's different and which parts of the actual infrastructure it's responsible for. If you create a server with Terraform but there are other servers that were created independently of the tool, Terraform doesn't assume they belong to it and start doing things to them; it knows which things it owns. CloudFormation in AWS doesn't have an explicit state file, but it has the same concept: there's a stack of things, and when you change the code for that stack, it knows; it just hides that and manages it for you behind the scenes.
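A minimal sketch of what a stack project might look like in Terraform, with hypothetical resources and a placeholder image ID: everything in it is created, updated and destroyed together, and applying it produces a state file (terraform.tfstate by default) recording which resources belong to this stack.

```hcl
# A minimal "stack": everything in this project is managed as a group.
# Terraform records what it manages in a state file, so it never
# touches servers created outside this project.
provider "aws" {
  region = "eu-west-1"
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "web" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

resource "aws_instance" "web" {
  ami           = "ami-12345678"   # placeholder image ID
  instance_type = "t3.small"
  subnet_id     = aws_subnet.web.id
}
```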
The reason the stack matters is when you think about having multiple environments: a staging environment and a production environment. The first thing people tend to do with a tool like Terraform or CloudFormation is to make one project and define their staging stuff and their production stuff in it, all in the same project and all in the same stack. The reason I put the unhappy face on this slide is the idea of blast radius, a term that Charity Majors, who runs the company Honeycomb, I believe popularized for this domain. The problem is that when you want to make a change to your staging environment, to test it there before it goes to production, and you change the code, you can accidentally break something in production, because it's part of the same stack. You change one little bit of code without realizing it affects something shared between them, or a name conflicts, and it's very easy to break production when you don't think you're making a change to production at all.

The next step, a better pattern, is to make separate projects: here's my code for the staging environment, here's my code for the production environment, and if I want another environment I copy the code into another project. Each one has its own stack: its own state file in Terraform, or its own literal stack in CloudFormation. You can make your changes to staging and apply them, and be pretty confident you're not going to break production; once you're happy, you copy that code change into your production project. The face on this slide is a bit happier, or less unhappy, let's say. Because one of the reasons we want infrastructure as code is to get away from inconsistent environments: you make a change to staging, it's not quite right, you're not ready to make it in production, somebody else makes other changes in staging, or fixes something directly in production, and you end up with differences, which means something that works in staging doesn't work in production and you get surprised. We want to use our infrastructure code to be confident that the only differences between the two are deliberate, controlled ones, like the number of servers in a pool or the memory assigned to things. And with copying code between environment projects, even though it's infrastructure as code, you can still end up with that kind of mess: you copy the code over and forget to change a name, to change the label from staging to production, whatever it may be. It leaves room for error. If you have a fairly small environment with a couple of people managing it, this can be all right, because it's simple.

But what I tend to recommend is treating it like we treat an application artifact, like a Java WAR file. We have a single project, and when we run Terraform or whatever the tool is, we create a separate stack for each environment by passing in parameters, including a different name. So we've got a single definition that works for every environment. We can make changes and apply them to our development environment and test there; once we're happy, we run the command again and tell it to apply to staging, and then to production. That gives us consistency, and also control, across environments. And it also lets us do automated testing.
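A sketch of that single-definition approach with hypothetical variable names: the environment name and sizing come in as parameters, and each environment keeps its own stack (its own state).

```hcl
# One definition used for every environment; only the controlled
# differences are parameters. Each environment gets its own stack,
# for example via a separate workspace or a different -var-file.
variable "environment" {}
variable "web_server_count" {
  default = 2
}

resource "aws_instance" "web" {
  count         = var.web_server_count
  ami           = "ami-12345678"   # placeholder image ID
  instance_type = "t3.small"

  tags = {
    Name        = "web-${var.environment}-${count.index}"
    Environment = var.environment
  }
}
```

Applying it might then look like `terraform workspace select staging` followed by `terraform apply -var-file=staging.tfvars`, and the same command with the production values once staging looks good; workspaces and per-environment var-files are just one of several ways to keep a separate state and parameter set per environment.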
The workflow for this, as somebody working on this code as an infrastructure developer, is that I can work on my local machine. I make changes to my code, and I've got my own instance of the stack that nobody else looks at and nobody cares what damage I do to. So I can mess around with the code, apply it, apply it again, and once I'm comfortable that it's working, I commit it and push it to a source code repository. Then a CI server automatically triggers, this is where the "actionable" point from earlier comes in, and runs Terraform with the code I pushed. It creates its own stack, its own instance of the environment, runs some automated tests against it to make sure it's happy, tears it down afterwards, and promotes the change along. So we can have a pipeline where the code that gets applied to production is applied by our continuous delivery server, something like GoCD or Concourse or Jenkins, actually running Terraform. It's not somebody running it from their laptop and making changes to production; it's the server doing it, with scripts that are themselves versioned and managed, so we know what's being done, in a consistent way, and in exactly the same way it was done in our test environments. That gives us a very high level of confidence in those changes. We can also introduce manual steps to push a change along: if you're a little bit worried about what might happen in production, you can have a QA stage where a person looks over an environment they can test against, and then pushes the button to say yes, this is good to go, and it gets deployed to production. It's exactly the same concept as continuous delivery for software, applied to our infrastructure code.

Now we can start thinking about some of the more complex pipeline patterns; so far I've shown a very simplified thing with a couple of stages. We can say: we've got our application code, let's say it's a Java application, maybe Spring Boot or Dropwizard, that builds a jar file. So we have our app build stage, the continuous integration for our application code, which runs the unit tests, and if it's happy it creates that jar file and puts it in a repository. And similarly for the environment we want to deploy onto, we have an infrastructure project, say a Terraform project. Again, we work on it, push it to source control, and our CI stage for the infrastructure runs and tests the infrastructure. Only when a change to either one of these has passed do we run the stage that says: great, now I'm going to create an environment, or make changes to an existing test environment, deploy the application onto it, and run the tests, the journey tests or whatever they may be, to make sure it all hangs together and works correctly.
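One piece that makes the "servers apply it, not laptops" part work is shared remote state. A sketch, assuming an S3 backend and hypothetical bucket and key names; the pipeline commands in the comment are the standard Terraform CLI steps a CI stage would typically run.

```hcl
# Shared remote state (assuming an S3 backend here) so that the CI/CD
# server, not an individual laptop, plans and applies changes, and
# every run works against the same record of the stack.
terraform {
  backend "s3" {
    bucket = "example-terraform-state"        # hypothetical bucket
    key    = "app-infra/terraform.tfstate"
    region = "eu-west-1"
  }
}

# A pipeline stage might then run something like:
#   terraform init
#   terraform plan -out=plan.tfplan -var-file=test.tfvars
#   terraform apply plan.tfplan
```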
This is something I learned the hard way: I used to apply infrastructure changes, Chef cookbooks or whatever, directly onto the environment, and then I'd break things because I'd messed up a cookbook, which messed up the servers, which meant the development teams couldn't work; they had to wait for me to fix my problem. And given that I was working with infrastructure, that problem could be quite big. So I got to this approach of testing the infrastructure first, and only once that has passed and looks good do I pass it along, to avoid breaking the development process. Then the infrastructure code and the application code move together through all the environments, and we know the environments are consistent; by the time it gets to production, we know we've tested this version of the application with this version of the infrastructure, so we're very confident.

Going down to the next level of tooling: I've been talking about the environment-level stuff, CloudFormation and Terraform; now let's look at what happens with servers. Part of what's in those environment definitions says, I want an application server and a web server and a database server, for instance. But it doesn't manage what goes onto those servers, it just says create that kind of server, so we need a tool to manage that. This is where we have tools like Ansible, Chef and Puppet, which manage what goes on inside a server: what packages get installed, what user accounts get created, what configuration files are in place and what's in them. The way this process works is you have a base server image, an AMI or a VMware template or whatever it may be; that gets spun up, the configuration tool runs and does things to it, and now you've got your running server.

In terms of pipelines to manage this, this is where we break things down further. If we look at server roles, say a web server or an application server, and we have Ansible playbooks and roles that configure what goes onto those, we can have test stages just for that. Because when we use Terraform or CloudFormation to spin up a whole set of infrastructure, that takes a while, so the feedback cycle is long if we wait for the automated tests at that level. So before we even get to that infrastructure test stage, we can do continuous integration testing on just our server configuration code. We might use a virtual machine, or, as we've done on a few projects now, Docker images: we spin up a Docker image with a Linux on it, run our Ansible playbooks or our cookbooks or whatever against it, and then run some automated tests using something like Serverspec. There's a tool called Test Kitchen which is quite good for managing this: spin up some infrastructure, apply things to it, run the automated tests, and manage that whole process. That way we get faster feedback on whether we've got our server-level configuration correct.
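A sketch of what a .kitchen.yml for this might look like, assuming the kitchen-docker and kitchen-ansible plugins are installed; the playbook path and platform are illustrative. With Test Kitchen's default busser verifier, Serverspec tests conventionally live under test/integration/default/serverspec/.

```yaml
# Illustrative Test Kitchen config: build a throwaway Docker container,
# apply the server configuration to it, then run the automated tests.
---
driver:
  name: docker

provisioner:
  name: ansible_playbook
  playbook: playbooks/app-server.yml   # hypothetical playbook path

platforms:
  - name: ubuntu-16.04

suites:
  - name: default
```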
If it's correct, then we can pass it along and say: okay, now use Terraform or whatever to spin up actual virtual machines on our infrastructure, with the networking around them, and run tests at that level. Does everything still talk to each other up there?

The next thing we can do relates to what I described earlier, where we have a base AMI, spin up a server image from it, and then run Ansible or whatever on it. We can actually start baking things onto the server image itself. Say we have an application server: we need to install a JDK and Tomcat. Those are heavyweight things, and running your Ansible playbooks to install them every time you spin up a new server, in any environment, takes a while. There's also some element of risk: you can break something. Even if you've tested the playbook that installs Java in an upstream environment, when you run it in production it might, for example, bring in dependencies. You have your explicit dependency, the JDK package, but that package has its own dependencies which also get installed, and maybe one of those versions has changed, so something breaks, or something is different in production from what was in your test environment. So instead you can bake it onto the AMI, or the VMware template, or even a physical server image. Then every time you spin up a new server instance, everything's already installed. This also helps with things like auto scaling: if you're using auto scaling to bring up new servers automatically, you want that to happen quickly, not wait several minutes for all the configuration to be applied to the server.

Packer is pretty much the tool that does this; it's certainly by far the most popular. The way it works is that you have a server image definition: a Packer template is a JSON file in which you define the base server image, probably an OS installation image or a stock image from Amazon's AMI library. Packer spins up a server from that and then runs some things on it, which can be your Ansible playbooks or your cookbooks or shell scripts, on that temporary server. You can run some tests to make sure it's okay, and then you turn it into a new AMI. And you can have a pipeline for that. So we have a server image pipeline for our application server AMI, which runs Packer to build a new AMI with a version number; you version these things so changes are versioned and tracked over time. Then you spin up a server from that new AMI version and run automated tests on it. What I'm showing here is that we have playbooks or whatever that install Java and Tomcat onto the server, we create the AMI, we spin up a new server from it, we run the tests and make sure it's happy, and only then do we pass it on to the infrastructure test stage I described earlier.
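The talk describes the classic JSON template format; newer Packer releases also accept an HCL form, which this sketch uses so it reads like the Terraform examples. Region, image IDs, package names and the AMI name are placeholders.

```hcl
# Sketch of a Packer template in its HCL form: start from a stock base
# image, bake the heavyweight installs in, and publish a new AMI.
source "amazon-ebs" "app_server" {
  region        = "eu-west-1"
  source_ami    = "ami-12345678"       # stock base image (placeholder)
  instance_type = "t3.small"
  ssh_username  = "ubuntu"
  ami_name      = "app-server-1.0.0"   # placeholder; real templates embed a version or timestamp
}

build {
  sources = ["source.amazon-ebs.app_server"]

  # Illustrative: install the JDK and Tomcat into the image so new
  # servers come up with them already in place.
  provisioner "shell" {
    inline = [
      "sudo apt-get update",
      "sudo apt-get install -y openjdk-8-jdk tomcat8",
    ]
  }
}
```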
Maybe we still have some extra cookbooks that run on the server to do some runtime configuration, but the point is that we're testing everything incrementally and bringing it together.

Now, one of the big challenges that comes up when we talk about pipelines for infrastructure and defining our environments as templates is that things get quite big. If our production environment looks like this, then having a single stack, a single Terraform project which defines all of that and creates and manages it as one big stack, gets quite messy, even though our staging and development environments are identical but separate. It takes a long time to test: one of the barriers to the automated testing and the pipeline I talked about is that it takes sixty minutes to spin up all the servers and everything else that represents your whole production environment. It becomes a fragile, difficult, unwieldy process, and you can't really work that way. This comes back to the definition of a stack as the thing that's managed as a whole. The other problem is that you can break things within that stack, and you have multiple teams working on it, so we might change something that has an impact on somebody over there, and we end up with a lot of coordination overhead between the teams working on our environment.

So what we want to do instead is break it down into multiple stacks, and this starts to look familiar: from the software architecture world, it's like having a monolith and breaking it down into microservices; it just happens to be infrastructure. We want to be able to change each of those things independently. Yes, there will be integration between them: something over here might integrate with an application running in another stack, or the infrastructure might integrate in some way, but the boundaries become clearer. It becomes more visible what the dependencies are between these stacks, we can manage them the same way we manage dependencies between applications, and we can define the boundaries to minimize the changes that cross them.

As an example, a typical boundary in the infrastructure world is between tiers: we've got web servers with a firewall in front of them, application servers in a separate subnet or VLAN with a firewall in between, and database servers behind another one. Those are natural boundaries from the way we structure our networks, and they're useful from a security point of view: if somebody compromises the web tier, which is public-facing and reachable through the firewall, you want to make it harder for them to jump down to your application servers and then to your database servers. But they turn out not to be great boundaries for managing infrastructure definitions, because when you make a change to an application or a system, you're often changing the web server configuration, the application, and maybe the database together. If you draw your infrastructure boundaries along the tiers, a single change, a feature, means changing three different stacks at once, coordinating between them and worrying about what order things run in. So instead, you want to look at the typical patterns of change in your organization and draw boundaries that make sense for those. This often ties in with Conway's law, the idea that your organization structure aligns with, or needs to align with, your system architecture, so you need to think about your team structures and your desired architecture together, and the infrastructure needs to map to that. And you can still implement the physical boundaries: even if all the infrastructure for one application, say one microservice, its web servers, application servers and database servers, is managed together in one stack, you can still have the network boundaries in there from the security point of view. Those are two different concerns which map in different ways, and that's important.
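To make that concrete, a small sketch of what a per-service stack might contain, with hypothetical names: the service's tiers change together in one stack, while the security boundary between tiers is still expressed inside it.

```hcl
# Sketch: the stack boundary follows the service, but the security
# boundary between tiers is still enforced within the stack.
variable "vpc_id" {}

resource "aws_security_group" "web" {
  name   = "orders-web"          # hypothetical service name
  vpc_id = var.vpc_id

  # Public-facing tier: only HTTPS in from the outside.
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "app" {
  name   = "orders-app"
  vpc_id = var.vpc_id

  # Only this service's own web tier may reach its app tier.
  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.web.id]
  }
}
```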
Each of these stacks then has its own pipeline which manages changes to that stack, so the team working on it can push their changes through, test them, and even test integration with other stacks, for other applications, in shared environments.

In terms of sharing components, the way it looks is that we've got three different application stacks, say three microservices, each of which deploys into the production environment, but they share the AMI: they all use that Tomcat application server AMI built with Packer. This starts to help with the issues around sharing code and versioning. Somebody makes a change to that Tomcat server image because their team needs a tweak to the way Tomcat is installed; that change feeds into a pipeline, and if the tests pass, we know it's good and we can push it through to production for the other applications too. If it breaks, we can stop it. Maybe the first application still pushes its own change through to production, but now we can have the conversation: what do we need to do to make sure our application is going to work, and what needs to change to make this work for everybody?

Another pattern for sharing is things that are shared but deployed separately. In some cases, networking constructs like subnets and VPCs, if you're familiar with these, are best set up as one instance that different applications then deploy into. So you can have a separate pipeline for that shared network infrastructure, and push changes to it through its own pipeline into production, independently of pushing application changes through. This is all to try to reduce the coupling.
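One way (among several) to wire an application stack into that separately deployed, shared networking stack in Terraform is to read the shared stack's published outputs rather than reaching into its resources; bucket, key and output names here are hypothetical.

```hcl
# The application stack consumes the shared network stack's outputs,
# keeping the dependency between the two stacks explicit.
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "example-terraform-state"
    key    = "shared-network/terraform.tfstate"
    region = "eu-west-1"
  }
}

resource "aws_instance" "app" {
  ami           = "ami-12345678"   # placeholder image ID
  instance_type = "t3.small"
  subnet_id     = data.terraform_remote_state.network.outputs.app_subnet_id
}
```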
As with applications, where microservices deployment patterns mean you're not necessarily testing everything together before pushing to production, you need certain levels of maturity. For integration, you might have consumer-driven contract testing, where in our pipeline we run tests that other teams have given us, to make sure we're not breaking something they rely on. You also need monitoring, and release patterns like blue-green deployments and canary releasing can come into play; they help give teams independence while still giving you confidence that you're not likely to break something.

So how do we know we've got this working well? Basically, if you can make changes to your infrastructure, and to your systems in general, easily and routinely, without it being a big trauma or a big project where you have to stop everything and coordinate, that's what you're after. And teams shouldn't be spending time waiting for approvals or resources. If we need to create a new service and a development environment for it, we should just be able to do it, and if these pieces are in place, everybody should be relaxed about that rather than worried about letting the development team create a development environment. Because they're going to use the same code that runs through the pipelines, the standard application server images and so on, and anything unique they need will go through all of that process too. So it's all managed, and comfortable, but also self-service.

These are a number of books I've found useful; some of them influenced what I've talked about today, and some are things I think can be useful. In particular, I often get asked about databases, and I haven't managed to fit into this talk how we handle that kind of thing, but there's a fairly recent book on database reliability engineering which covers DevOps, databases, continuous delivery and how it all fits together. There are a number of other books here which are probably familiar to people.

I think we have a few minutes left, so if anybody's got a question or two... Okay, the question is about server configuration tools and which will be the most seamless. I'll summarize the question if I can, which will also help make sure I've understood it. We've got the various levels of tools I talked about, so where do they fit together? Very roughly, there are the environment provisioning or environment definition tools, Terraform, CloudFormation and so on, which can spin up servers and networking, and they talk to the cloud platform directly. And the follow-on question was: if you know which cloud provider you're using, which tools are appropriate? Essentially, each cloud platform tends to have its own tool.
You've got AWS CloudFormation, Azure has ARM templates, Google has Deployment Manager, so each has its own platform-specific tool. Then there's Terraform, which works across them, and Ansible can also do this kind of thing, and Chef and Puppet have tools. What I tend to find is that if you're fairly confident you're only going to use one cloud platform, it might be a good idea to use that platform's own tool; it tends to be the best supported, although that's not a hundred percent recommendation. Terraform tends to be my default because it works across platforms. It doesn't work across them in a completely independent way: you can't define what a server looks like in Terraform code once and then run it on any cloud; you have to define it separately for each. Terraform is a thin abstraction layer. What it gives you, if you're working across multiple cloud providers, is that even though you can't write the code once and run it everywhere, the tool is the same and the way it works is the same, so you don't have to use two different tools for the same function. That can be quite nice and useful, and its syntax is quite nice. It's also open source, unlike most of the tools from the cloud providers, so you can get in there more deeply for some of the other platforms. Terraform has great AWS support; its support for the other platforms is not as strong. So I would explore; I wouldn't commit to any tool before exploring it, getting used to it and understanding its limitations. My recommendation is not to get tied to a tool too quickly.

One last question. Sorry, there was a question up here. The question is: can you justify investing in a pipeline for infrastructure code if it doesn't change that frequently? I would say if your infrastructure really doesn't change very frequently, you may not need to make that investment, but I think that's fairly rare these days, especially if you're using cloud platforms. You gave the example of Linux servers: if you're only updating your Linux servers once a year, I think you're in a very dangerous position from a security point of view. You need to be patching and doing those kinds of things very frequently, and pipelines actually help with that, because you have a pipeline that rolls out changes. I normally patch my servers at least weekly, if not every day, and I can do that with a pipeline, whereas with manual processes I can't, and I can test while I'm doing it. In most environments I've worked on, especially in cloud, things are changing so rapidly that for me it's difficult to justify not automating it. And from a security and compliance point of view, if that's a concern, these are really useful tools and really important patterns for that too. So I guess that's it. Thanks a lot for coming out.