For our next talk, we have Kamalika Majumdar from ThoughtWorks. She's going to talk about testing infrastructure code using Test Kitchen, Docker, and Chef Zero.

Am I audible to everyone? Okay. So, introduction. I'm an infrastructure automation specialist and a senior consultant working with ThoughtWorks. I started as a sysadmin in network administration, managing networks, and then moved to automation in cloud and private data centers, and that's what I do today. You can read my blogs at kamalika.net. These are the topics we're going to cover today: why TDD, or test-driven development, for infrastructure code; what Test Kitchen is; how we provision instances on demand using Test Kitchen; how we configure them using Chef; and how we actually test the setup. Then I'll have a quick live demo of the entire process.

So what are the problems we face regularly in our environments? Manual setup of servers: each server has its own specific configuration. The solution is to automate the server setup so it can be reused; servers then become commodity items, and you don't need to worry if a server goes down or has to be scrapped. No track of what changes have gone into the servers: you solve that by automating with scripts or configuration management, version controlling those scripts, and tracking the changes that go in. The most common problem that most of us face: code works only on localhost, not in production. You may hear your developers saying, "But it worked on my machine. How did it fail in production?" The solution is to host a production-like environment on your machine, locally. Then, testing takes forever. What builds up to that is not having a scalable environment on demand: you can't bring your machines or instances up, destroy them, and recreate them quickly.
For that, you need a scalable environment on demand. Then you can test your code faster, and that brings you shorter deployment cycles. Infrastructure as code: you might have heard of it, but what does it even mean? You automate your server provisioning, you automate your server configuration, and you version control all of it using Git or Subversion. You tag your environments based on the roles they play: dev, QA, staging, production. And you can then stop worrying about what changes have gone in, because you are tracking those changes in a version control system. That's what treating your infrastructure as code means: just as you check your application code into a version control system, you follow the same workflow for the infrastructure or DevOps code you are writing.

This is a sample infra code layout. I'll be talking mostly about Chef, since I work with Chef and am most comfortable with it. This is how a typical Chef repository looks when you apply it to your infrastructure. You have environment scripts that point to your machines in production, QA, or dev. Then you have roles based on each of your applications: a web server, a DB or app server, a proxy server, whatever. Each role has the set of scripts that has to run to configure your environment; those scripts are called recipes, and the run list of recipes is applied to each of your actual machines. And then you have data bags, which are like a secret store you can use for your keys, user IDs, passwords, and URLs, anything you don't want to disclose by checking it into your version control system in plain text. You can encrypt them and save them in data bags. So, why test-driven development for infra code?
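As a concrete illustration of the roles just described, a Chef role can be written in the Ruby DSL. This is a minimal sketch, not taken from the talk; the role name, recipe names, and the port attribute are illustrative assumptions:

```ruby
# roles/webserver.rb -- minimal Chef role (names are illustrative)
name "webserver"
description "Front-end web tier"

# Recipes applied, in order, to every node holding this role
run_list "recipe[apache]"

# Attributes that recipes can read, e.g. node['apache']['port']
default_attributes(
  "apache" => { "port" => 80 }
)
```

A node assigned `role[webserver]` in its run list picks up both the recipes and the attributes, which is what makes the role the single place to change a tier's configuration.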
Today we have talked about automating your environment and how fast you can get your dev environment up. What we are missing on a day-to-day basis is that once we are done with our automation or configuration management, once we have written our Puppet or Chef scripts, nobody really cares about testing them. But do you follow the same workflow for your app code? No, right? You test your app code. There is a bunch of tests going in: integration tests, functional tests. So why not for your infra code? It goes hand in hand with your app code. Then the problem "code works on my machine and not in production" goes away: code works on my machine as well as in production. You get quick feedback from testing infra code: the faster you can test your infra code, the faster you can deploy it to your staging environment, the faster you can deploy your app code, and you get a faster deployment cycle. Shorter release cycles. Using this workflow, we brought our deployment cycle down from once in two weeks to almost every day; there were modules that were even deployed twice a day. The deployment time itself reduced from, say, 30 minutes to five minutes per app, and while testing we got faster feedback as well.

Here are some real-world scenarios where you need to test your DevOps or infra code. Changing web ports: you might have configured your web servers with Nginx or Apache on port 80, but you don't want the default port 80 in production. Suppose you want port 443 or 8080. What do you do? You go and change the parameters there. But do you test it with your application? You should. Adding new app metrics: many of you monitor your application metrics using, say, Nagios or Ganglia. But where do we make those changes? We make them in our config management system.
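The web-port scenario above is typically just an attribute change in a cookbook. A minimal sketch of what that might look like in Chef; the attribute name, template path, and service name are illustrative assumptions, not from the talk:

```ruby
# attributes/default.rb -- the port is parameterized, not hard-coded
default['apache']['port'] = 80

# recipes/default.rb -- the recipe reads the attribute, so switching
# production to 443 or 8080 is a one-line override in the environment
template '/etc/httpd/conf.d/listen.conf' do
  source 'listen.conf.erb'
  variables(port: node['apache']['port'])
  notifies :restart, 'service[httpd]'
end
```

Because the recipe only reads the attribute, the same tested code runs everywhere; only the environment's attribute override differs.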
You add those metrics in your config management system, you test whether anything fails in your current environment, and then you release it, check it in to your version control system. Changing the app user: your app might run as the root user, but that's not a very good practice, right? You should have a service user, because there are security concerns. So you want to change from the root user to a service user. Same thing: change your configuration script, test it. Likewise, these are some examples of what might need testing in your infrastructure code.

This is what the example pipeline looks like. You check your code into Git. It gets tested in your acceptance test environment. Then you promote it to PET, which is what we used to call our preprod or staging environment, and then it goes to production. Is the screen clear enough? That's our CI pipeline view; that's the Go CI tool. After you create your pipelines, that's how your workflow runs.

In order to follow that, what did we use? We used Test Kitchen. Test Kitchen is a test harness that provides drivers, or plugins, for various cloud and private provisioning systems: it has drivers for LXC, Docker, EC2, and the like. It has support for a config management system; right now that's only Chef, so it works with Chef Server, Chef Solo, and Chef Zero. It also provides support for various test frameworks like Bats, Serverspec, and so on. Today I'll focus on the Docker driver. We did try LXC, but most of these drivers have been written with Linux kernel 3 in mind, so they recommend kernel 3 and above, and we were on 2.6, so there were bugs in the code. Our setup worked with Docker: though it also recommends kernel 3, it worked fine on 2.6 with no changes to the code.
You can definitely commit a fix to the Test Kitchen code if you want. For config management, I'm going to focus on Chef, more specifically Chef Zero. The reason we picked Chef Zero was that we wanted to test an exact production-like setup on our local machines. In our production-like environment, we had a centralized Chef server managing the various machines, the Chef nodes, and it was not really scalable to set up a Chef server on your local machine or two. We also didn't want to upload anything to the real Chef server until we had tested it. So we picked Chef Zero, which is a mock Chef server: with each test, it starts up a Chef server service locally on that instance and then tests all the parameters. Things like search queries matter here: suppose you are setting up a MongoDB cluster and you are querying your database nodes with Chef environment-level queries; those only work when you have a Chef server. And since we wanted very fast feedback, we used Chef Zero, the mock Chef server.

Now I'll show you a demo on my machine. I forgot to mention that Test Kitchen also has a plugin for Vagrant. If you're on a Mac and you want to use Vagrant, you can use the kitchen-vagrant plugin, which will use Vagrant to spin up your instances. But we wanted LXC containers for faster deployment and a faster environment on demand, so we used Docker, and as Docker is based on LXC, you need a Linux machine. This is a Vagrant box on my machine, and inside it I'm going to show you the demo. Any Mac users can spin up a Vagrant machine and continue the same way. Test Kitchen is a Ruby gem: you install it with gem install, it needs Docker to be installed, and then the Docker plugin for Test Kitchen. This is the typical project directory for my test: it has cookbooks, environments, roles, and tests.
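A sketch of the kind of project layout just described; the exact names are illustrative, not taken from the demo:

```
chef-repo/
├── .kitchen.yml          # Test Kitchen configuration
├── cookbooks/            # recipes, templates, attributes
├── environments/         # dev / QA / staging / production definitions
├── roles/                # e.g. webserver, proxy
├── data_bags/            # encrypted secrets
└── test/
    └── integration/
        └── webserver/    # tests for the "webserver" suite
```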
Test Kitchen works off a template, which is a YAML file, and this is how it looks. The driver section is, you could say, the provisioning option you have to specify: whether it is LXC or Vagrant or EC2, and there are a couple of other public providers as well, like OpenStack. The next option is the provisioner; in my case, as I said, it is chef_zero. Then you define platforms and suites for your instances, which in the back end are the LXC containers being spun up. For example, the platform I have chosen is CentOS; you could choose Ubuntu or Red Hat. The suites are the roles played by those instances, in my case a web server. Then you specify the run list, as you would in Chef: the actual scripts that are going to run. I'm going to install the Apache web server on it. And then the roles path, the data bags path, and so on. kitchen list is the command that shows you what you have configured here, with the status. In place of Docker, if you're using LXC it will show you LXC, if you're using Vagrant it will show you Vagrant, and likewise for the provisioner.

You can see the web server instance is currently converged, so I'm going to destroy it. In the back end it is running Docker, and there are no containers running here now, right? Now I'm creating the web server. Basically, Test Kitchen uses its own Dockerfile to create the machine, then runs a couple of standard setup tasks: it generates an SSH key, creates a user, creates its own home folder, and creates a temporary folder where it will load all the scripts from the current working directory. Once the machine is created, if you do kitchen list, you see that the status is "created".
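A minimal .kitchen.yml along the lines of the one described above; a sketch, with the platform version, paths, and run list as illustrative assumptions rather than copies of the demo file:

```yaml
# .kitchen.yml -- illustrative sketch of the template shown in the demo
driver:
  name: docker

provisioner:
  name: chef_zero
  roles_path: roles
  data_bags_path: data_bags
  environments_path: environments

platforms:
  - name: centos-6.5

suites:
  - name: webserver
    run_list:
      - role[webserver]
```

Test Kitchen builds one instance per suite-platform combination, which is what `kitchen list` then reports with its driver, provisioner, and status.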
Now the machine is ready and you need to run your scripts on it, so you say kitchen converge. What does it do? First it copies all the cookbooks, data bags, and other required scripts onto that machine into a temporary folder, usually /tmp/kitchen. It says "preparing environments", "preparing roles", "preparing cookbooks", and transfers those files. Then it starts up a mock Chef server on port 8899 and creates a client.rb: the Chef server URL in that client.rb points back to the instance itself, and the paths for all the cookbooks and indexes are given in it. Then it runs chef-client, and chef-client runs against the Chef server that has been started up. So it runs exactly the way a production Chef environment does: you make no changes to your code while testing it. You test the exact same code that you're going to run on your production machines, and that's the entire idea. The same code should be tested and deployed from the dev box through staging and QA to production. So, yeah, as it said, it starts a Chef Zero server and then runs chef-client.

A couple of things we added on top of this. Once you converge: it's not always necessary to have written tests for your Chef code. If you have, that's very good. But even if you have not written any tests and you're confident about your scripts, you can run the converge, and after it completes you can be reasonably confident about the changes you made, for example the Apache port changes or the new metrics I was talking about. And something else we used. What was happening was that our RPM server was located in some other data center, and it was taking a really, really long time to download packages from it. So what did we use?
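The generated client.rb is roughly along these lines. This is a sketch, not the actual generated file; the node name and paths are illustrative assumptions, and the port is the one mentioned in the demo:

```ruby
# /tmp/kitchen/client.rb -- sketch of what the chef_zero provisioner writes
node_name        "webserver-centos"               # illustrative
chef_server_url  "http://127.0.0.1:8899"          # the local mock Chef server
file_cache_path  "/tmp/kitchen/cache"
cookbook_path    ["/tmp/kitchen/cookbooks"]
```

chef-client reads this file and talks to the in-process Chef Zero server exactly as it would talk to a real Chef server in production.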
We used a local HTTP proxy within the same Vagrant machine. And, as you can see, the Chef run is complete. I want to show you the time: how much did it take? One minute. And it can be improved further. This is running off the Wi-Fi we have in this auditorium, but if you have a local mirror it can run within seconds. You can use a local proxy or a caching server and point your machines at it, so that each run spends less time on downloads. We actually had our tests finishing in 60 to 80 seconds for a much bigger set of applications being installed.

I wrote a small test just to show Test Kitchen's test feature. Let me show you. I installed Apache; I'm going to test whether Apache is running and where its home directory is. Previously the status was "created". Sorry, the font. Is it visible now? Let me highlight it. Visible? So after the first step we saw the status of the machine was "created", and now it is "converged", which means the chef-client run has gone through successfully on that machine. If it had failed, it would have shown "unconverged". You can also run Test Kitchen in debug mode; I'm not doing that here because it shows a lot of output, which would not work well on this display. I ran kitchen test before: all these steps can be done in one shot using kitchen test. What kitchen test does is destroy any existing instance, run the whole create-converge-verify cycle, and then destroy the instance again, so use it when you don't need to keep the instances around afterwards. Let me destroy it once.

Another thing we used here is a Docker repository. We built our own Docker images, created a local Docker registry, and checked our images into it.
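The small Apache check described above, written as a Bats test, might look roughly like this; Test Kitchen installs the test framework on the instance and runs files from the suite's test directory during kitchen verify. The file path, service name, and directory are illustrative assumptions:

```bash
#!/usr/bin/env bats
# test/integration/webserver/bats/apache.bats -- illustrative checks

@test "Apache service is running" {
  # 'service httpd status' exits 0 on CentOS when the daemon is up
  run service httpd status
  [ "$status" -eq 0 ]
}

@test "Apache home directory exists" {
  [ -d /etc/httpd ]
}
```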
That really helped us, because of the unnecessary time it used to take to install Chef and SSH and create all the folders: you don't do that every time if you're starting from the image. And it is a lot faster if you host the registry on your own network, so you don't go out to the internet every time; you do a one-time pull from the internet and then keep committing your images locally. The reason we did that was that we had to automate this setup inside a data center that didn't have internet access, so we had to host the registry in that data center itself.

So I'm doing a converge again, because the instance was destroyed when I ran kitchen test earlier. Another thing I wanted to show: if I run docker ps, this is the container ID it is showing, 1385, and if I go here, you can see the same ID. There are a couple of other things Test Kitchen provides. By default it looks things up in the directory you run the kitchen commands from, but you can override the various script paths. If you don't want to keep your cookbooks in the current working directory but somewhere else, you can use those overrides. Once you run your tests, it will show you "verified", and then you can check in your code and move on. Just one more thing: by default, the tests it looks for are under test/integration, then the name of the suite you specified in your .kitchen.yml file. Right, there was something wrong with that in my setup, but yeah.

After this demo, this is what we achieved, against the problems we spoke about at the start: the most stable and tested builds on your Chef server, and a production-like environment on your machine. And you can have as many Docker instances as you want; it can range from two or three to two hundred.
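Both points above, starting from a pre-built image and overriding the default paths, are .kitchen.yml settings. A sketch, with the registry host, image name, and override paths as illustrative assumptions:

```yaml
driver:
  name: docker
  # Pre-built image from a local registry (host name is illustrative),
  # with Chef, SSH, and the kitchen user already baked in
  image: registry.internal:5000/centos-chef:latest
  provision_command: "true"   # skip per-run setup already in the image

provisioner:
  name: chef_zero
  # Override the default lookup relative to the working directory
  roles_path: ../shared-roles
  data_bags_path: ../shared-data-bags
```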
Consistency across all environments: you just parameterize IP addresses or names, and everything else is the same. An automated and scalable environment on demand. Really tested configuration management: releases to production went from once in three weeks to once a week to multiple deployments per week. And overall testing time went down, which meant faster feedback for developers and QAs as well. So, questions.

Q: Can Test Kitchen be integrated with any of the CI tools?

A: Yes. These are just commands; you can script them and put them in any CI tool. This is a gem, so as long as your CI tool can run even a shell script, that's good enough. In fact, we did that: the pipeline I showed is taken from a Red Hat VM where Test Kitchen has been configured and the tests run. We check in our code, the infrastructure script goes to the acceptance test pipeline, and then it is promoted to the environments and those machines.

Q: I have two questions. One, why are you using Chef Zero rather than Chef Solo? Chef Zero is memory-based, while Chef Solo works without the server-client setup. When you run on Docker, the container takes some memory, and then Chef Zero takes memory on top of that. So if you have, say, 200 Docker containers running, it takes a lot of memory on your PC.

A: Just to correct you, Chef Solo is not a server and a client; it's a standalone, no-server setup. For Chef Solo you have to write Solo-specific scripts, whereas in production you will have a Chef server. So why have Solo code for your testing and server code for your production?

Q: No, I think Chef Solo does the same as what you have written for the...
A: No, the reason I think you're asking this is Vagrant: with Vagrant, Chef Solo appears to work equally well with server-style code and solo code, but that's internal wiring Vagrant has done. If you look at pure Chef Solo code, you have to write a solo.rb; it's a different setup from a server and nodes. That's the reason we use the Chef server model: we don't want any changes to the code we will be deploying into production.

Q: And what's the reason you're using Docker for this? Because Chef Zero takes memory, and then Docker again takes memory. It's a load.

A: It's just faster testing, right? A container is just a process, one for each of the different environments you have.

Q: Right, but say the same test case has to run on Fedora, Ubuntu, and CentOS, a multi-flavor setup.

A: Then you have a test Chef server, and you don't use Chef Zero. This is for developer machines, and in some cases test machines, where you don't have that much scale.

Q: But you write the Chef cookbook in a multi-platform way, right? So the test cases have to run per platform.

A: Yes, and it's just the provisioning you're changing; you don't change anything in the code. It's just the options, the platforms, in the .kitchen.yml file that you change as per your setup. Cool.

Q: Can you show an example of what a typical test looks like? Say, if you're asserting that a server is running on port 80, what would a simple one look like?

A: Well, it might take some time to run, but I can show you. Wait, I'll have a video somewhere. Just give me a moment.

Host: Could you take it offline?
A: When you run a test, suppose I'm using a Bats script for testing, it will set up the required test environment and then show you something like: running test suite, one test, zero failures, "Finished verifying test server". That's how it looks. If you have written tests for your Chef or Puppet code, they get run from there. You should have them in the current working directory; there is a default folder path, but you can override it from the kitchen file. As for the actual test code you're asking about: it doesn't follow any particular convention. It's the same simple tests you would write for any code. Test Kitchen installs whatever dependencies your test framework needs onto the machine; that's all it does. It sees what file format you're using and installs the corresponding gems, or whatever it needs.

Host: Sorry, we're out of time.

Speaker: Yeah. Thank you. Thank you.