Hello everyone. Thanks for joining me today. I believe this is the last session of the summit, so I'm actually surprised by the number of people in the room. Today I'm going to talk about Terraform and how we use it to provision our infrastructure on top of an OpenStack cloud. A quick introduction: my name is Mikira Gubenko, and I'm with Mirantis. My main focus is automating things, like infrastructure provisioning and application deployment.

Let's quickly review today's agenda. We'll talk about what infrastructure as code is. Everyone is probably already doing it, but I'll remind you of some key principles. I'll introduce Terraform as the tool our team uses to manage infrastructure as code. We'll talk about how to actually write Terraform code and what building blocks it has. State management and team collaboration is the topic we struggled with the most in the beginning, so I'll address that. I'll also describe how we structure our Terraform code for the multi-environment use case, and of course nobody's perfect, so I'll talk about the dark side of using Terraform. We'll wrap things up with a Q&A session.

All right, so what is infrastructure as code? I like the definition from Kief Morris, the author of the great book of the same name, Infrastructure as Code. He states that it is an approach to infrastructure automation based on practices from software development. Basically, you can manage your infrastructure the same way you manage your software projects. The key principles: use configuration definition files in which you declaratively describe what your infrastructure should look like, things like VMs, images, networks, volumes, et cetera. Version control should be the source of truth for everything; it brings all the usual benefits: it's easy to track changes, easy to collaborate, easy to roll back, and so on. You always test your application before you roll it to production, and the same approach should be applied to infrastructure, so every change should be extensively validated. And you should push your changes frequently: it's always easier to provision a couple of VMs now than to provision them later along with a whole bunch of other changes.

What this gives us is the ability to build our environments the same way assembly lines manufacture products. We get consistent, predictable results, and our environments look exactly the way we want. Since no human intervention is needed, the process is fast, efficient, and repeatable. All your infrastructure pieces become easily replaceable and disposable, so there is no such thing as "don't touch that server" anymore. And since 100% of the engineers I know hate writing documentation, the documentation gets captured in the form of definition files, code snippets, and scripts.

So what are our options for applying infrastructure as code on an OpenStack cloud? The recent OpenStack user survey shows that the OpenStack CLI clients are still the number one way to provision resources. Basically, developers put these openstack server create and nova boot commands into ad hoc scripts, put those in a repo, and that's how they provision their infrastructure. But such scripts lack core features like idempotency, templating, safe simultaneous runs, dependency handling, et cetera. Tools like Ansible and Chef are more focused on managing configuration, so there we're talking about servers that are already provisioned.
The OpenStack Heat project is closer to what we were looking for, but unfortunately it's locked into the OpenStack cloud, and I think many of you already have hybrid clouds: if you want to use AWS or Google Cloud, you will need another tool for that. It also does not separate the planning and execution phases, so you cannot validate your changes before you apply them. And then there is Terraform, which I'll describe in more depth.

So what is Terraform? Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Configuration definition files give Terraform the information on what resources it needs to provision, whether for a single application or for your entire data center. The basic workflow with Terraform consists of three steps. First, you write your code; in other words, you declaratively describe what resources your environment needs: VMs, load balancers, networks, et cetera. Then you run terraform plan, which generates the execution plan Terraform will need to carry out to reach the desired state you described. Then you run terraform apply, which goes and provisions those resources. After it has provisioned them, it creates a so-called state file where it stores information about what resources it created, that is, what is managed through Terraform.

All right, now brace yourself: a lot of code slides are coming, so I hope you'll be able to see them just fine. Imagine our Hello World application. Terraform parses the .tf files in the directory you run it in, so we create a hello-world.tf file with the following content. First, we specify the provider, in our case OpenStack, of course, where we supply parameters such as the Keystone URL, tenant name, password, username, et cetera. For the resources, we add an openstack_compute_instance, which is basically a VM, and supply it with the things you need when you boot a VM through OpenStack: image ID, flavor ID, keypair, network, et cetera. In addition, we create a resource for a floating IP and specify the pool to allocate it from. And last, we create an association between the floating IP and the VM.

Now we can run terraform plan, which compares the local state of things with the state of things in the actual cloud and creates an execution plan for how to reach the desired state. In our case, it suggests adding two resources, the floating IP and its association, because the instance was already created before. After you've reviewed the execution plan, you can apply it, and Terraform goes and creates those resources. That easy.

So what building blocks does Terraform have? The first is providers. As I said before, Terraform is vendor agnostic, which means you can use any number of providers: AWS, Google Cloud, Kubernetes, Bitbucket, et cetera. So you can manage your multi-cloud, multi-environment application through a single tool, even from a single repo. Resources are the components of your infrastructure; as I said, for OpenStack you can create instances, floating IPs, Cinder volumes, et cetera. Variables define the parameterization of your Terraform code, and Terraform supports strings, maps, and lists as variable types.
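Since slides can be hard to see from the back, here is a hedged sketch of what the hello-world.tf described above might look like. The IDs, names, and pool are placeholders, and the resource names come from the OpenStack provider, so check the documentation for your provider version:

```hcl
# hello-world.tf -- a minimal sketch of the Hello World example (placeholder values).

provider "openstack" {
  auth_url    = "https://keystone.example.com:5000/v2.0"  # Keystone URL
  tenant_name = "demo"
  user_name   = "demo"
  password    = "secret"  # better supplied via OS_* environment variables
}

# The VM, with the usual things you need to boot it through OpenStack.
resource "openstack_compute_instance_v2" "hello" {
  name      = "hello-world"
  image_id  = "IMAGE-UUID"
  flavor_id = "FLAVOR-ID"
  key_pair  = "mykey"

  network {
    name = "private"
  }
}

# A floating IP, allocated from the given pool.
resource "openstack_compute_floatingip_v2" "hello" {
  pool = "public"
}

# The association between the floating IP and the VM.
resource "openstack_compute_floatingip_associate_v2" "hello" {
  floating_ip = "${openstack_compute_floatingip_v2.hello.address}"
  instance_id = "${openstack_compute_instance_v2.hello.id}"
}

# An output, so terraform apply prints the floating IP we didn't know in advance.
output "hello_ip" {
  value = "${openstack_compute_floatingip_v2.hello.address}"
}
```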
Outputs define a way to highlight certain values when you run terraform apply. In our Hello World example, I want to know which floating IP was assigned to the VM, because I didn't know it before I ran it.

Provisioners are a way to run scripts after certain resources are provisioned. For example, after you provision a VM you may want to bootstrap it, or you may want to clean up resources before the VM is destroyed. I would say the most useful ones are remote-exec and the null resource: remote-exec runs a script on the just-provisioned resource, for example a VM, while a null resource can run scripts against any resource within the current environment. I don't think Terraform provisioners will replace configuration management tools like Ansible and Puppet, but you can easily hook those tools up using these provisioners.

And then there are modules. Modules are self-contained packages of Terraform code, used for code reusability. If you've worked with at least one programming language, you know what modules are for. Terraform modules can be stored in different sources: the most popular, of course, is a local directory, but you can also grab them from GitHub or from S3 object storage. Nested modules are supported too, so you can nest one module inside another to structure and organize your code even better. Let's look inside a module. It's the same code we used to provision the resources before, but with some additional variables added. After that, we can execute the code multiple times just by using a module statement with the module directory as the source. What this does for us is create three HAProxy nodes and seven other nodes from the same piece of code.
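Here's a hedged sketch of that module idea; the directory layout and variable names are my own invention, not the exact code from the slides:

```hcl
# modules/nodes/main.tf -- a self-contained, reusable package of Terraform code.
variable "name_prefix" {}
variable "node_count"  {}
variable "image_id"    {}
variable "flavor_id"   {}

resource "openstack_compute_instance_v2" "node" {
  count     = "${var.node_count}"
  name      = "${var.name_prefix}-${count.index}"
  image_id  = "${var.image_id}"
  flavor_id = "${var.flavor_id}"

  # A provisioner hooks in here: bootstrap the node right after it comes up.
  # (A connection block with SSH details would normally be required.)
  provisioner "remote-exec" {
    inline = ["echo bootstrapped"]
  }
}

# main.tf -- execute the same code multiple times via module statements.
module "haproxy" {
  source      = "./modules/nodes"
  name_prefix = "haproxy"
  node_count  = 3
  image_id    = "IMAGE-UUID"
  flavor_id   = "FLAVOR-ID"
}

module "workers" {
  source      = "./modules/nodes"
  name_prefix = "node"
  node_count  = 7
  image_id    = "IMAGE-UUID"
  flavor_id   = "FLAVOR-ID"
}
```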
All right, state management and team collaboration. As I said, our team struggled with this in the beginning, and I'll explain why. Terraform apply creates a so-called state file where it records the resources managed through Terraform. If you're working on the code alone, you can just go with a local file and it will work fine. But once you add more engineers to the team, there are two problems. First, you have to sync the state file between all of you, so that everyone runs terraform apply against the fresh version of the file. Second, you want to make sure you don't all run Terraform at the same time, so you need some kind of locking mechanism to prevent that.

Within our team, we evolved these rules. If only one or two people work with the Terraform code, we suggest just storing the state file in object storage. Terraform supports remote backends for the state file, and object storage is one of them; you can use Swift or Amazon S3. What this gives us is a single source of truth for everyone: you don't need to sync the file between team members, it always lives in the same place, and it can be versioned and encrypted. Some of you might ask, why not just use Git? The problem with Git is that it's easy to forget to push or pull changes to the state file, and you'll still end up running terraform apply against an obsolete version of it, which can break your infrastructure. So with one or two people, use object storage and coordinate between yourselves on how and when you run it.

If your team is bigger, say two to five people, I suggest coordinating through change requests, so everyone on the team is aware of when Terraform is run and what changes are introduced into the environment. And if your team is bigger still, I suggest there should be no human intervention in running terraform apply at all; use a CI system for it. In our team, we created a Jenkins pipeline that, on a commit, runs terraform plan and stops right there. Then an operator reviews the changes in Jenkins, that is, he reviews the Terraform plan, and if he approves the change, the Jenkins pipeline goes on and applies it. That's how we solved it for our use case.

We had been working with this setup for quite a while, building workarounds here and there, and then this happened: in 0.9, HashiCorp introduced state locking in Terraform. It basically gives you the ability to lock all writes if somebody is already running terraform apply. Unfortunately, not all backends are supported yet. Where it is supported, I believe Amazon S3 with locking through DynamoDB is the best place to store the state. With all that said, I still believe a CI system is the better choice right now; it's a more reliable and safe way to run your Terraform applies.

Storing the state file remotely and building those Jenkins pipelines solved some problems for us, but another problem we had was isolation. When you first start with Terraform, you're tempted to put all of your configuration in a single file, or a single set of files in the same directory. But that creates a single state file for all of your infrastructure. If you're working with multiple environments, the whole concept of an environment is that it should be isolated as much as possible, and sharing the same code and state across every environment breaks that rule. So we decided to go with this directory layout: we create a separate directory for each environment to store its code. In addition, we also isolate services. For example, our MySQL service has its own infrastructure state file, and the frontend app has its own state file. What this gives us is that if someone makes a mistake in the code, they can only break that single service, which shrinks the blast radius of the damage.

Okay, the dark side. While working with Terraform, we discovered some disadvantages, things we didn't like. As I said before, the blast radius is a big deal: if you manage all of your infrastructure through Terraform, you're basically handing the keys to your kingdom to Terraform, and if you screw things up, you can break your whole infrastructure. Isolation is one of the steps to address that. Terraform is still young, so you'll hit bugs here and there, and compatibility even between minor versions is sometimes not that good; keep that in mind when you upgrade your Terraform binary. I personally believe Terraform has a steep learning curve, especially because of its interpolation syntax. I didn't even touch on it in this presentation, so go figure it out for yourself; I think it will surprise you. Which brings up another point: yes, you will have to learn an additional language, the HashiCorp Configuration Language.
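Coming back to remote state and locking for a moment, here's a hedged sketch of what the configuration can look like. The bucket and table names are placeholders, and the locking argument has been renamed between Terraform versions, so check the documentation for yours:

```hcl
# Keep one state file per environment *and* per service to shrink the blast radius.
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"
    key            = "prod/mysql/terraform.tfstate"  # <environment>/<service> path
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"  # state locking, Terraform 0.9+
  }
}

# Terraform also ships a "swift" backend if you prefer OpenStack object storage.
```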
Terraform also tends to store every attribute of the resources it provisions in the state file, so you'll have to think about how to encrypt the state file so that sensitive credentials can't be seen by anyone. And the last part: of course, there is no rollback. So double-check your changes, do code reviews, and be careful when you run terraform apply, because it can break things. That's it from me. Thanks for your time, and I'm ready for the Q&A session; the microphones are right there.

So you said there's no rollback. Is rolling back your Terraform files and then applying a second time an option, or are there gotchas there?

By saying no rollback, I mean this: imagine you change the image ID on your resource. Terraform will go and re-provision the VMs that were using that image. So if you provisioned ten VMs with the old image and then replace the image and run terraform apply, it will basically wipe all those VMs. That's what I meant when I said there's no rollback: you cannot roll back wiped VMs or volumes. But one thing I like about Terraform is that you can easily apply incremental changes. If you need an additional VM, you just add it to the Terraform code, and when you run terraform apply it will incrementally add that VM without touching the existing resources.

So what happens exactly if a resource, a virtual machine, is not mentioned in the state file? Terraform just won't touch it, right?

Yes, the state file describes the resources Terraform should manage, and if a resource is not in that file, Terraform won't touch it.

So let's say you provision a brand-new stack every time you do a release, and after that you wipe the old stack. That's a less scary solution.

Can you reiterate the...

Basically, if your deployment is done by provisioning a brand-new stack next to the previous stack, this doesn't hurt the previous stack, right? And once you're confident the new stack is working, you can destroy the old one.

Yes, that's one way to address it.

Okay. And how does destruction go?

You just run terraform destroy, and it will ask you, twice as I remember, "are you sure? are you sure?", and then it goes and wipes everything. That easy.

Thank you.

What about unit testing? There are a couple of tools for that; I don't remember the name of the tool. I'm just curious about your practical experience.

In my practical experience, we just used the CI pipeline with terraform plan, and that's how we test. But there are tools that address the testing itself. You're probably talking about Test Kitchen.

Yes, yes.

I think you mentioned that one of the advantages is multi-cloud provider support. How portable are your plans from one cloud provider to another, say AWS to OpenStack?

Unfortunately, you'll have to manage those environments with separate code, because the resource names are different for each provider: for OpenStack the resource name will be openstack_compute_instance_v2, and for AWS it will be aws_instance. So it's not portable. But it still gives you the ability to keep everything within a single repo and project.

Is it not possible to...

I believe not.

That's too bad.

Hi. Another question about updates. Does Terraform have the ability to apply something like a slow rolling update to clustered services, or is it going to go out and just try to update all of them simultaneously?

I believe that when you run terraform apply, you can pass the specific resources you want to act on via the -target flag. So basically, if you want to run the update on a subset of the machines, you can do that using the CLI.
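For reference, a hedged sketch of what such a targeted run might look like; the resource address here is hypothetical:

```hcl
# Limit a run to specific resources with -target:
#
#   terraform plan  -target=openstack_compute_instance_v2.node[2]
#   terraform apply -target=openstack_compute_instance_v2.node[2]
#
# Repeating this node by node gives you a crude, manual rolling update.
```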
Can you have it automatically roll through, or do you have to do that as an operator?

I don't think so. You would probably need to create some kind of wrapper that does a canary deployment of some sort.

Thank you.

You probably forgot to mention that Terraform is able to import resources from your current stack. But it's a very painful operation, I don't know if you have experience with this, because if you have a stack of any size, you have to import every single resource manually.

Yeah, that's also one of the big things: it's usually easier to build a new stack than to import the existing resources into Terraform.

And there was a question earlier about resources that don't exist in your state file: if a resource already exists but isn't in your state file, you probably want to import it into the state file. I also wanted to mention the thing you touched on, that if you don't run a plan before apply, it can have unimaginable consequences, and that makes Terraform basically unusable for fully automated CI/CD pipelines. What we do is run terraform plan in our GitLab pipeline, and then the pipeline is paused until someone comes and examines the...

Yeah, that's the same idea we use.

So you don't have any solution for this? You still need someone who checks the pipeline?

Yeah, yeah.

Okay. A question on the state file. You said it will store your credentials in plain text in the state file. Does Terraform have any built-in method to encrypt or decrypt the state file, or is that something we have to handle as a separate job?

I haven't tried it, but there is a plugin on GitHub that someone wrote to encrypt it somehow; I haven't tried it yet.

I mean, it was doing this a couple of months ago, at least for AWS resources for sure. For Amazon, I don't remember the exact component, something like a SQL database, but it was storing the credentials in the state. There is a bug filed for it, and it's still not addressed.

Thank you very much.

Thank you. My other question is whether you can have runtime-configurable features. For example, a virtual machine generates a secret key, and then that secret key is passed into a different virtual machine's configuration; or maybe even, on the fly during the apply, go to some URL, pick something up from there, and use it in the configuration.

As I said in my presentation, there is a thing called provisioners, where you can put anything; it's basically a bash script or similar, and you can go and do whatever you want, either on the local machine, the machine you're actually running Terraform on, or on the remote resources, the machines that were just provisioned. I believe that's one way to do it.
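To illustrate that answer, here's a hedged sketch, with all names, paths, and the SSH user hypothetical, of generating data on a fresh VM and pulling it back so something else can consume it:

```hcl
# remote-exec runs over SSH on the freshly provisioned VM; local-exec runs on
# the machine where terraform itself is executed.
resource "openstack_compute_instance_v2" "generator" {
  name      = "keygen"
  image_id  = "IMAGE-UUID"
  flavor_id = "FLAVOR-ID"
  key_pair  = "mykey"

  connection {
    user = "ubuntu"  # SSH details used by the provisioners below
  }

  # Generate a secret on the new VM...
  provisioner "remote-exec" {
    inline = ["ssh-keygen -q -f /home/ubuntu/app.key -N ''"]
  }

  # ...then fetch it locally, so another resource's configuration can pick it up.
  provisioner "local-exec" {
    command = "scp ubuntu@${self.access_ip_v4}:/home/ubuntu/app.key ./app.key"
  }
}
```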
Thank you. Just a quick one more: so Terraform can handle both virtual machines on OpenStack and, for example, Kubernetes containers?

Yes.

Regarding the locking with DynamoDB: are there any plans to use something that might be more OpenStack-friendly than depending on AWS?

As I said, just use a CI system like Jenkins.

Would you say that OpenStack integration with Terraform is good enough? Is it stable, is it production-ready? When you want to create something, is it actually created?

Yes, actually it's good, and we use it in our production. I would say it has everything you usually need: instances, load balancers, security groups, keypairs, images. It supports all the resources in OpenStack.

Okay, thank you.

I wouldn't share that opinion. I wouldn't say it's production-ready, because at the very least you cannot import OpenStack instances; that's still not implemented. It's fine if we're talking about a brand-new environment, though. In the two or three months I've been using it, I hit around seven or eight bugs that I reported to the Terraform developers. To say something good about them: if you report an issue on GitHub, they usually fix it in one or two days. But sometimes when you see the fix, you really say, oh shit, they committed a hundred lines of code and merged it to master within a few hours without any review, and then the next day they find another bug. We call it a hipster company; the people writing Terraform, it's written in Go. And I don't want to sound like someone who doesn't like Terraform, I like it very much, but it needs one or two years to mature.

Sure. And it changes rapidly. As a side note, I think their main use case is AWS, and the AWS support is phenomenal: it's stable, and almost all resources are supported. But yeah. Any other questions? Thank you.