Hi, I'm Aditya. I work as a cloud and infrastructure engineer at BrowserStack.com, and I'm a contributor to the Fedora admin team. Today we're going to talk about building orchestrations using Ansible. In particular, I'm going to cover a few topics: the challenges we face as system administrators, as DevOps people; how we try to handle those challenges; and why traditional methodologies like scripts, in Bash or Python or Ruby, fail to scale over a period of time. We'll look at some newer tools that came along a couple of years ago, like Puppet and Chef, and talk about them a bit. Then I'll move on to Ansible and why Ansible is better than those two, or the other tools out there. I'll talk a bit about how you use Ansible to run ad-hoc stuff as well as scripted stuff, and I'll show you a demo. It depends on the internet; I really hope it works. Then we'll talk about some good things about Ansible, like why decentralization is important, and also the areas where Ansible actually fails. It's not a perfect tool, but it's good enough for most cases. So: what are the pitfalls of Ansible, and how can a person who is new to Ansible start from scratch, or move over from existing tools like Puppet? Right? Okay. So that's usually me, the guy at the wheel. As DevOps, you have new servers coming in every day. You have new applications getting deployed, new features being built, updates. And you have to act very quickly on those things. You have to provision stuff really fast; you have to give your developers or your users as quick a response as possible. And nowadays we have hybrid infrastructure: half of the infrastructure is in co-located data centers, and some of it is running out of OpenStack or EC2. The way to handle these things is a bit different.
The way you handle dynamic nodes on EC2 or OpenStack is somewhat different from the way you handle stuff within static data centers, right? So that's one of the major problems you face. You have to worry about your initial configuration. Say I asked you to get me an Nginx server. You have to make sure that your installation goes smoothly, your configuration goes smoothly; if you're running maybe a Rails application, that your Passenger or Unicorn coupling with Nginx goes smoothly. A lot of things: the entire user management, your application management. And if tomorrow I decide to, you know, spawn another Nginx or another Rails server or whatever, then that should be exactly the same as my existing machines. I do not want half of my machines running Nginx 1.0 and half of my machines running Nginx 1.4. That can lead to nightmares, seriously. And this entire thing keeps on happening, so it essentially becomes a loop; you'll keep doing this over and over again. If you do not have a good tool, this can be, you know, your 24/7/365 job, which is, again, not that good. So, solutions. What are the solutions that come to mind? I think the most natural thing that comes to my mind as a DevOps person is: okay, I'll write a script. I'll just write a small script, maybe 10 lines, and then I'll be done forever. Well, that never works out. Those 10 lines eventually grow. They keep on growing, growing, growing. And then I realize, okay, that was a mistake; maybe I should have used Puppet or Chef or something like that, right? But then the next thing that comes into my head is: Puppet. Would it scale? Would it be able to handle all of my infrastructure? What if it fails? What if the Puppet master goes down? What will you do then? What will happen? I don't know how Chef works, so honestly, I won't comment on that.
But what if my Puppet master goes down? So, how many of you have used Puppet? A lot of you guys, right? The initial signature exchange that happens when you just boot a fresh machine: can anybody recall how that works? You have to verify the signature with the Puppet master. So what will happen if, say, the IP changes? Or maybe you bought a new machine and you want to make it identical, but the signature is not matching. You have to do manual stuff, right? Which is, again, a management nightmare. We want to keep the human out of the picture; we want machines to do everything, right? So it again becomes a nightmare in terms of scaling and ease of use and all those things. Now, specifically talking about scripts: they look dirty. How many of you have worked with Perl? Oh, great. Okay. So you can verify that fact for me; if you guys have used Perl a lot, I'd be really happy to get some stats on that. People usually start with scripts saying, okay, it's maybe 50 lines I'm going to write, or 100. And then it keeps on increasing, which, you know, kills you in a good amount of time. Code repetition happens a lot when you start writing scripts. This happened to me a lot. I started writing some Python boto scripts for AWS, and I kept on writing. Then, after writing around 10 to 15 scripts, I realized that almost all the scripts I had written had those same few lines of code to do auth, which was really, I think, stupid on my part. I should have created a library and just done an import of that auth library, something like that. But while writing those first few scripts, you never realize that the code repetition should have been avoided. You think, okay, it's only two scripts I'll ever have to write, three, four, and before you know it, it becomes a huge mammoth, right? So that should have been avoided.
Then you have to remember the order of execution. The reason I'm saying this is that if you have worked with Rails or something like that, you need to make sure that you install stuff in the right order. You have to make sure that you first get Nginx there, then Unicorn or Passenger, then your application. And somewhere in between, after your installation of Nginx and before the installation of Unicorn, you have to get your database there. If you are using multiple databases, like we at BrowserStack do, both Mongo and MySQL, then you make sure that both of them are there before you actually start the application. And if this order gets mixed up, you will need those sleeping pills, trust me, because nothing will start. It'll give you funny errors and you'll keep on debugging. And lastly, with scripts: if you don't write documentation, you are in for a big surprise after six months, because most probably you will not be able to read your own code. That happens to me a lot. Right. So now Puppet, or Chef, is the de facto answer for most of us to the shortcomings of the scripts we have used for so many years. But when we talk about Puppet or Chef, it needs an agent running on the client, a Puppet client on the client machine, and that agent has to verify with the master before doing anything. Another shortcoming, I'd say, is that Puppet is based on Ruby, and Ruby is not shipped with Linux by default. So you have to install Ruby before you install Puppet, and that step cannot go in Puppet, of course, because your Puppet agents are not there yet. Both Puppet and Chef have their own DSLs, so it becomes a bit difficult to do parsing and all. The DSL can only be understood by Puppet or Chef themselves; you probably cannot write an external parser to understand that kind of stuff. Again, my experience with Puppet has been a bit rusty, so if you guys have any criticism of this, I'm all ears. Anything till now?
Yes, and again, that doesn't help with the installation of Ruby. Agreed that you can do masterless Puppet and maybe sidestep the first point, the signature problem, with an AMI: you set up agentless, masterless Puppet, create an AMI, and then as soon as your AMI boots, you also have to do a Git pull to maybe get your entire configuration. You have to do all those steps before your server becomes usable. Anything else? Cool. So now, keeping all these shortcomings in mind, let me introduce you to Ansible. Ansible is based on YAML. YAML is easily parsable; it's a data serialization language, and it can be parsed by any third-party library you can think of. Most languages have a YAML parser. It uses OpenSSH as transport. You do not have to worry about key signatures and all those things, because as long as you can SSH to the server, that's it; that is all that is required to run Ansible. Parallel, ordered execution. One crib, and I know it's my personal crib with Puppet that most of you may not agree with, is that its execution order is not serial. It can run anything after anything, before anything. Now you will say: use notify, use before. Why? Somebody trying to say something? Puppet 3 solves it to a certain extent, and maybe the newer ones also do, to a certain extent. That doesn't really convince me; I want it solved to all extents. Maybe they have also realized that it should be serial, and eventually it will get there, but until it does, I may as well move on to something else that has already solved my problems. And again, the best of it is that there is no requirement for any agent on my nodes. My nodes, as soon as they are booted up, are good to go. I don't need to install something like an Ansible agent or anything like that. That helps me a lot. Before we go ahead, I want to tell you about the scale at which we are currently using Ansible.
I am using Ansible at these two places, both at the Fedora project and at browserstack.com. At the Fedora project, we are currently using it on 300-plus Linux servers, most of them running Fedora or RHEL, and there are 100-plus stacks. When I say server, I am including virtual machines; they are not necessarily all physical servers. So that's one thing. BrowserStack.com has 300-plus servers right now. Not necessarily all of them are Linux; it's a mixture. And we have around 20-plus stacks there. We are able to peacefully manage our infrastructure using Ansible. So at this scale, I can guarantee you that Ansible works. I haven't tried it beyond 300, so I'm not really sure, but I think it should work; we haven't seen any issues. That said, we have only run Ansible on all the servers simultaneously on a very few occasions. Mostly we run it per stack, like the Apache stack: I'll run everything on the Apache stack, which would be like 15 to 20 servers at a time. We did run it on all the servers during Heartbleed, because we had to update OpenSSL everywhere, and that went pretty smoothly, not a lot of issues. Okay, that's the scale I'm talking about. Now, how do you install Ansible? Again, on your Ansible master node, which could be your laptop. You do not need a dedicated server for Ansible; I run Ansible from my laptop. Anybody who is capable of taking a Git pull and running either of these commands can run Ansible on their machine without any problem and do production-grade stuff. You do not need to maintain a separate Puppet-style master server, so that's yet another advantage, I would say. This I have already mentioned: if you can SSH, you can run Ansible. As long as this bare-minimum requirement is met, you can run Ansible. You also don't need to manage separate ACLs. Yet another problem with scripts, and maybe with your configuration management tools, is that you need to have a certain set of ACLs; otherwise, things can go haywire.
For example, the Puppet client we have runs as root. As soon as something malicious, something bad, has gotten into your manifests, something has entered your manifest repo, it will be executed on all the production servers, and that too as root. So that's a very, very major security problem. With Ansible, you don't need to worry about it, because Ansible runs as you, the user you log in as by default. Only if you have the privileges to execute that stuff will Ansible be able to execute it. For example, if as user aditya I cannot create more users, then if I run Ansible as user aditya, it won't be able to create more users on my remote machine. No matter how much malicious or bad code I have written in my Ansible repo, it will not be executed until and unless the right person is executing it, someone who has the privileges to do that on the remote machines. Cool? Any questions so far? Okay, somebody has a question. How do you set up your initial SSH access for multiple users? Especially when you're doing it at a much higher scale. For a couple of machines it's easy to do manually, but like you said, you have 300-plus servers. How do you manage that? You give each user their SSH key, and then they should be able to log in with Ansible using those SSH keys? Right. So let me see. How we do it is, for example, at BrowserStack, we buy a machine from EC2. That EC2 AMI has one key, and that one key has sudo privileges. So I will run Ansible as that user to create all the other users and put their keys there. That solves my problem. I didn't install any agent; I had to do almost nothing. Just one Ansible run, and that solved everything. All my user management is done, my package management is done, my application is there and got started, and probably, if I can, I'll put that stuff in Route 53 as well. So I'm basically good to go as long as I have booted a machine and run Ansible on it once.
That is it. This is tied to AWS; is there any generic solution that people can use? I'm not able to see you, but okay. With respect to AWS: we do not use the AWS-specific Ansible modules. Ansible has recently come up with an AWS library which deals with EC2 specifically, in a very good way. I personally haven't used it, but I am assuming it will be way better than what we are doing, because our stuff has been written very generically and we do a lot of generic things. There is a specific Ansible library to deal with EC2, and I'm assuming that should be very nice. I had a question: how do you push updates to a specific set of servers? How do you discover inventory? I'll take that up in the demo; I'll show you how to do it. Okay, so you dynamically discover. Dynamic discovery is something which, I think, would be part of the EC2 module, say for a specific application type. What we do right now is use tagging, and we discover machines on the basis of name tags. But I think as that EC2 module of Ansible comes out, that will be built in. So when instances are created, they need to be updated with tags? Right, but when you boot your machine, that usually is part of the automation; at least that is the case for us. When we boot up a machine, the entire thing, tagging, selecting the instance type, selecting the region and zone, everything is part of that boot-up phase. So you are asking for a command to be executed? Yeah, to start a machine you have to; Ansible cannot start a machine. How do you push updates when you don't know what the servers are beforehand? That's what I'm telling you: usually you don't know the servers beforehand in an environment like EC2, whereas for a colo environment, you'll know them beforehand.
If it is an environment like EC2, what you can do is probably use name-based tagging, EC2 tags, for one. Another thing is that in the inventory you can use regexes. So if all of your machines come up as web1.abc.com, web2.abc.com, then you can just use a regex, and that will also solve your problem. Beyond that, if your case is something very specific, then maybe you have to do something from your side. For every command that you run, it's going through SSH, so for every command it will try to do SSH each and every time. Isn't that an overhead? Yes, there is an overhead. But there is something known as accelerate mode. What happens in accelerate mode is that the connection is not established again and again. When Ansible starts on your remote machine, it opens up another service on some port (I'm not able to recall which) that will take further calls. That service keeps working between the two of you, and there is no connection re-establishment and teardown for 30 minutes, which is configurable; by default, for 30 minutes you don't need to re-establish the connection. That entire communication is AES-encrypted, so you don't need to worry about security either. Accelerate mode is not on by default. You can turn it on, because not everybody likes to launch another daemon. At Fedora, we have accelerate mode on for most of the stuff; very rarely do we turn it off. Hi. Does it work only with SSH, or with anything else, like Mosh? So, as I said, you can do your config management from your laptop. Think of it as being on a Wi-Fi network: you start doing your configuration, and maybe it switches access points. Mosh basically allows you to do that roaming as well. So does it work over Mosh, or only over SSH? The remote server needs to run OpenSSH. Whether it also has Mosh doesn't matter.
You are talking about a scenario where the remote server does not have OpenSSH? Yeah. I'm not entirely sure; I've never tried that. I have never seen a server without OpenSSH, so I don't know, I'm not really sure. As long as it follows standard SSH handshaking and everything, it should work. Mostly, SSH is just a transport protocol; if that transport is able to connect you to the remote server, the rest is in Python, which is bundled by default with all Linux servers. One question: so Python also should be installed, right? I think it's 2.4 at least. Is that also a prerequisite? All Linux servers have Python by default. If a server doesn't have Python, I think yum and apt will stop working, which anyway disables everything. Is there a way for Ansible to know? Suppose, as you said, you have 300 servers, divided into clusters, say 100 of this type and 100 of that type. If a machine goes down, does Ansible know? No. It's not monitoring that. So next time, suppose you want to run a command on your cluster of type A and one of the machines has gone down, Ansible will just do its stuff on the rest of the machines; on that one machine, it will fail. It doesn't block or anything, no problems there? It will just fail; the rest of the servers will get whatever you wanted to do. So you have to use some other method to tell Ansible that this machine is now gone? Yes. I'm assuming that your monitoring would be smart enough to know that a machine has gone down. Once your monitoring detects that, you can pull that particular machine out of your inventory, and once you have pulled it out of your inventory, Ansible will not execute on it. I'll show you the inventory file in a moment. For each target host, Ansible runs a separate thread, so even if one target node is down, it's not going to block anything else. It will just fail for that particular machine, and the rest will go through. Anything else? Okay, cool.
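To make the accelerate-mode discussion above concrete, here is a rough sketch of how it was switched on in Ansible 1.x, at the play level in the playbook itself. The group name and task are illustrative; the port shown is the documented default, as I recall it:

```yaml
# Sketch (Ansible 1.x): accelerate mode is enabled per play.
# "webservers" is a hypothetical inventory group.
- hosts: webservers
  accelerate: true        # launch the accelerated daemon over SSH once
  accelerate_port: 5099   # default port for the AES-encrypted channel
  tasks:
    - name: update OpenSSL (the Heartbleed-style run mentioned earlier)
      yum: name=openssl state=latest
```

The daemon then keeps listening for further runs, so repeated plays skip the SSH setup and teardown until its timeout (30 minutes by default, configurable in ansible.cfg) expires.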
Okay, so now it's demo time. Let's see. For the demo, my target is to get you to this screen. This is the default WordPress screen: create a configuration file. I'm going to boot a machine in front of you, and while I start Ansible, I'm going to walk you through the entire Ansible playbook I have. Consider a playbook as, you know, the analogue of the modules or manifests you have elsewhere. Ansible is capable of running both scripted stuff, which is an Ansible playbook, and ad-hoc commands on demand. So say you want to do a yum update of OpenSSL: you don't need to write a playbook for that; you can just do it via the command line on all of your machines. What I have planned is that I will go up to the second-to-last step using a playbook, and the last step I'm going to do via an ad-hoc command, so that you get a taste of both. Okay, right? Sounds good. So we want to reach here from ground zero. Okay, let's minimize this. Next, I'm going to boot a new machine. So, okay, nothing happened. Oh, no, it's doing that. I'll take the smallest possible machine, and, I don't know, it's saying I must write a hostname. Okay, so while this is going on, let me show you the playbook. Is it visible till the very end? Can you guys at the back see what's written there? Okay. This is what YAML looks like. I hope it's readable, right? What I'm trying to do here is run all of the stuff on demo hosts, which we will define once we get the machine up, and we are going to build a WordPress stack. What does a WordPress stack require? Can anybody help me with that? If you want to get WordPress up and running, what do you need? Apache, PHP, MySQL, right? Those are the three things we need: Apache, PHP, and MySQL. So we are going to get all of these, and we'll see. Okay. Can someone read these few lines, up to the php-mysql one? Can someone who doesn't know Ansible explain what it's trying to do? Anyone?
Because Ansible runs Python, right? I need Python to interact with MySQL; I'm going to build my databases from absolute zero. Okay, so, anybody willing to explain what I'm trying to do there? It's trying to install Apache, PHP, and other dependencies. It's using the yum package manager. The state that it needs in the end is installed. It needs to install httpd, PHP, mysql-server, and the MySQL-python package, along with php-mysql. Right, that's absolutely right. Did you know Ansible before? No. But you were able to understand what I'm trying to tell you, right? Yeah. So that's what I'm trying to say: using a good language, a good DSL, actually helps you reduce documentation a lot, so that even a first-timer who hasn't worked with it will be able to understand what's going on. Okay, we have the machine up, and that is my IP. And yes, I'm supposed to show you an inventory file as well. Okay, so this is the hosts file. I'm going to define a host here. It follows an INI format; if you have ever seen a PHP configuration file, it's like that: you create clusters, which are enclosed in square brackets, and you list the machines underneath. This can take regexes. This can take domain names, not necessarily IPs; you can put anything there. You can put multiple servers in one cluster, and you can put one server in multiple clusters; both work. So if you have a machine which acts as, say, your MySQL, and also as maybe your Redis or Mongo or everything, you can define separate clusters, put a single IP in all those clusters, and manage them separately. All of that is supported. So I'm just going to save this. Okay, right. Now I'm going to run this Ansible playbook, the one written here, and while that playbook is running, I'm going to explain to you how this all works.
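For reference, the inventory file and the install section just described can be sketched roughly like this. The IP, group names, and the password variable value are made up for illustration:

```ini
# Sketch of an INI-style inventory ("hosts") file.
# Group names go in square brackets; entries can be IPs, domain
# names, or range patterns, and one host may appear in many groups.
[demohosts]
203.0.113.10

[webservers]
web[1:20].abc.com
```

```yaml
# Sketch of the WordPress playbook's install section (Ansible 1.x).
- hosts: demohosts
  vars:
    mysql_root_password: changeme   # illustrative value; keep real ones in a vars file
  tasks:
    - name: install Apache, PHP, MySQL and the Python binding
      yum: name={{ item }} state=installed
      with_items:
        - httpd
        - php
        - php-mysql
        - mysql-server
        - MySQL-python    # lets Ansible's mysql modules talk to MySQL
    - name: start and enable the services
      service: name={{ item }} state=started enabled=yes
      with_items:
        - httpd
        - mysqld
    - name: set the MySQL root password for localhost
      mysql_user: name=root host=localhost password={{ mysql_root_password }}
```

This is the shape of the demo: one group in the inventory, and a play against it that installs the packages, starts the daemons, and locks down the MySQL root account.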
So keep an eye on the upper, smaller area, because, yep, it's working. Okay, the internet is a bit flaky; otherwise I would have shown you both of them separately, but it's really slow here. Okay, so it's just going to execute those steps, which we are going to talk about here in the tab down here. Okay, I need more volunteers. Can somebody explain what this next section will do? Anybody? Yep. It's creating a user and granting him privileges. No, no, I'm talking about the section above. That one? Starting httpd. The first one. Starting the httpd daemon and the MySQL daemon. Right. And then granting privileges to the root user from localhost. Right, right. So the topmost section is going to start the httpd and MySQL services. The section here is going to update the root password for the localhost root account. When you get a default MySQL install on most machines, it's set with no password, right? There's no root password, and of course that's a security hazard. So we're going to set a root password. The password will be picked from this variable, mysql_root_password. This is how variables work in Ansible, in YAML actually. I just defined that variable here. You can pull variables out and put them in a separate file. Just for the sake of the demo I have put it here, but you can split this thing into various files: you can put the variables somewhere else, the templates somewhere else; you can pull out the entire thing, maybe put the yum dependencies somewhere else and the services somewhere else. Actually, it's good practice to do that; it becomes more flexible. But for the demo, let's do this. Okay. Can someone explain to me the section right here?
Meanwhile, you guys are seeing what's happening there, right? It just finished. Can someone explain to me the section here? So you're copying the .my.cnf file, with the root password credentials, to the home of that user, and then you're setting permissions, right? Right, right. So you're taking a template and applying it to /root/.my.cnf. Correct. I am using a template. In the template, there is a variable defined just like this, and the value of the variable is picked from above. I want to keep my passwords in as few places as possible, so ideally I will define a variable file where I keep my password, maybe keep that file out of reach of anyone else, and then make Ansible use templates and variables and apply those wherever required. One question: if that is the case, then in Chef you create a template of that particular thing, you fix a variable, you write it in the cookbook, and when you deploy it, it does this automatically, right? So why are you hiding it in this folder, in the user folder, and then sending it to the root folder? Okay, this location is just a choice; I could have used any other location. Sorry. Nice music, man. Sorry, I didn't get it. Are you asking why I'm putting it here? Yeah, that's right. That's a random location; I just felt like it, nothing more to it. And the .my.cnf name? Again, random. You could have called it abc-ansible-rocks and it would have worked just fine. It's mostly for clarity; it has nothing to do with how it works. Generally, a Chef cookbook keeps these configuration files as templates, right? Right.
So when you deploy that, the configuration file that the template represents gets deployed as a configuration file on the destination server. Does Ansible go in that fashion as well? Yes, Ansible is flexible in that sense; it can go that way. You can have a dedicated directory where you keep all your templates. You can put a .template extension on them, or whatever you want, or do it just like this. I didn't put any extension because I don't care. It's flexible to the extent that as long as this path exists and this file exists, it will work. So that's not a problem; it really depends on convention. What you're saying is actually good convention: the templates should be in one centralized place, mostly with their playbooks, to be very, you know, proper Ansible. Technically you should put them with their playbooks. Okay, thanks. That is actually the usual default location. I used this because I thought it would be clearer, but I was clearly wrong. Yep. In case I want to do a partial update to any of the configuration files, like only a particular username or password, where I want to use the default configuration but just update a few parts of it, how is that possible here? I think it's very similar to how you do it in Puppet or Chef: you pull out the file, make whatever changes you want as variables and templates, and then the changes are propagated only if there are changes to make. If the file is the same as the one on the server, nothing is pushed. In Puppet, there is a tool called Augeas, which is used to analyze certain configuration files and change them, and there are multiple things they call lenses available for different configuration files. For example, there could be a lens for my.cnf, which understands the MySQL configuration file.
And you can take it from a template and use this lens to modify it based on your requirements. Similarly, there are lenses for other commonly used configuration files, maybe smb.conf and various others. That's a very useful tool in Puppet, because then you don't have to write your own logic to insert your configuration data into these templates. Are you familiar with any such tool in Ansible, or how would you do that in Ansible without writing a lot of code yourself? First question: no, I'm not familiar with one. Actually, I wasn't familiar with the lenses themselves, but that's a good tool, I suppose. At the moment, I don't think there is such a tool for Ansible to generate configuration. How we have done it in the past is that we have templates, and we push those templates, and they are only pushed if there are changes. So just like Puppet or Chef, it analyzes the diff, and if there is a diff, it pushes the configuration. About generating configuration, I am not sure; I don't think there is a tool specific to Ansible to do that, but I might be wrong. There are so many of these things out these days. Not that I'm aware of, really sorry. Anyone else? Should we move to the next section? This last one doesn't execute. Actually, it doesn't execute; I don't know. Can someone explain this one to me? I think I have overdone it. So this is what was being done: we installed everything, we downloaded WordPress, and we made the Apache user own the directory. I'm just going to show you one quick ad-hoc command, which I promised, and then we'll wrap up. So this is how you run ad-hoc commands. I basically have to allow port 80 for anything to work there, so I'm going to allow port 80 using an Ansible ad-hoc command. I'm going to do it for all the machines that are part of the demo cluster, and I'm going to run this command. While this is running: these are all the things we discussed, plus some pitfalls of Ansible. It's new.
So, as someone pointed out with the lenses and all, there are not many auxiliary tools available around Ansible at the moment, but I'm sure it will catch up. There's a smaller community as compared to Puppet and the others, so it takes time. Windows is not supported. I think Puppet and Chef have started supporting Windows, so that's one drawback: if you have Windows in your infrastructure, then there's a problem. VMware ESXi is a problem because nothing works there: no Puppet, no Ansible, no Chef. If you want to start, just install Ansible, start with ad-hoc commands, and check out the Getting Started tutorial; it's a very basic tutorial that will help you a lot. Okay, the command worked. Success. Right. What I'm going to do is, again, I've lost my mouse, just copy this IP, and right, so we have gotten there. Cool. Okay. I'm already running over; I think I've eaten up five minutes of somebody else's time. So if you guys have any questions, any doubts or anything, I'll be available downstairs. You can talk to me.