Hello everyone, let me introduce Fabio Alessandro Locati, who will be giving a talk about automating your infrastructure with Ansible. Hello, I'm Fabio Locati and I work at Red Hat as a consultant, mainly on Ansible. So hopefully my clicker works. Yes, so let's start with a very quick introduction. I started to work in IT in 2004, which is 13 years ago now, and in 2013 I started to use Ansible. Why am I saying this? Because what we have seen in the last 15 years is that IT needs have changed a lot, and I think this is critical to understand why we are using Ansible and maybe not other options. So surely we have reasons for having a configuration management system. The first one is auditability. If you have a configuration management system, you can look at any given time at what the configuration on the server is. This assumes that you have not manually changed the configuration on that specific server, but if you have not, you have the whole recipe to create that server, and therefore you can say that service works like that because this is how we built it. Then we have another problem, which is job-hopping. People tend to change jobs more quickly; 20 years ago people tended to stay longer in the same job, now that's more rare. So you have a big problem of lost knowledge when someone leaves the team. If you have a configuration management system, you actually have the full documentation of your infrastructure as code. This also has other advantages, as we will see. Another advantage is speed. 10 years ago it was pretty common to design a new infrastructure or a new service, then buy new boxes from some vendor, Dell or HP or whatever, then wait a couple of weeks for them to be delivered, then cable them, which takes more weeks, and then install all the software and do all the other stuff.
So it was like a one-month process, maybe more. Then VMs arrived, which are far quicker: you want a new machine, you just provision it. And now we are in the cloud age, where we have cloud workloads that change the number of machines they run on based on load and other factors, so the number of nodes you are managing changes in a matter of minutes or hours. So here we arrive at the scalability problem. How can you actually scale to that level? How can you configure 100 machines in 10 minutes? Because if you have a very small website that at a certain point gets referenced by some big news website, obviously you want to scale very quickly to meet the demand. So you have a scalability problem. And in the cloud you scale horizontally: before, if you wanted to scale, you just bought more RAM or CPU and that was it; today you add more boxes, so you have more boxes to manage. This also became a big problem, known as the ratio of hosts per administrator. A few years back, when you had five mainframes and your whole business ran on them, you were managing five hosts. Today you are managing thousands of hosts, but you cannot keep the same ratio of technical people to hosts. And today there is a quality of service which is expected. 10 years ago, when I was 18, I got my first credit card, and my credit card company did not allow me to do any transactions from midnight to six o'clock in the morning, because the systems were busy doing batch work. Today that kind of downtime is not possible, because the business would not allow it. So automation is clearly the answer. Automation means infrastructure as code, or as data, as we will see. This means that it's very easy to read for humans and for machines.
And the big advantage of this is that the code itself is the documentation of the infrastructure. We always had documentation for the infrastructure, obviously, because you want to have documentation. The big problem is that very often that documentation was five or ten years out of date, because no one bothered to update it when they changed the infrastructure, so it was not very useful. If you follow the path of infrastructure as code, since your infrastructure is the result of your code running, that code is exactly what your infrastructure is. This obviously simplifies the auditability problem a lot. In theory you could reach a point where no real people access those machines as root at all. This would guarantee that no one has connected to a machine and changed its configuration by hand, and it also guarantees that everything that happens on those machines is logged. Because there is a big problem: if you have a single user on a system that is root, he will be able to evade any kind of logging, or at least it will be very hard to have logging that he cannot evade. But if we don't have any root users, that problem goes away. And obviously it's pretty easy to scale out: you just reapply the same code to every new machine you create, and you have more machines that are equal to the ones you already have in the cluster. So there are a few concepts I want to go through before going forward. The first one is the idea of an agent. Mainly we can divide configuration management systems into two big families: the ones with an agent and the agentless ones. The agent-based ones, like Puppet for instance, have a daemon running on every single box, and this daemon simply checks in at a set interval and asks the master: do you have anything new for me?
If yes, it applies it; if not, it waits for the next check-in. Agentless systems usually have the master calling all the systems and saying: you have these updates to apply. Now, both approaches have advantages and disadvantages. The big advantage of an agent is high performance during execution, because all the execution is done locally. Take a dnf install or a yum install: in the agent-based case, the agent asks the master node "what do I have to do?", the master node says "you have to install Apache", and the agent installs Apache locally. In the agentless model we don't have this; the master node contacts the other nodes, usually over SSH, and says "you need to install Apache", and there is an SSH connection that stays alive for the whole length of the process. So obviously we have a speed advantage with agents. The second advantage is that the connection between all your hosts and your master host is client-initiated. This is pretty standard; HTTP works the same way, you have the server and all clients connect to the server, which is a very well known concept in the industry. If you go to a network administrator or a security administrator and say "I want to do this kind of thing", he knows exactly what you're talking about. But this model also has some disadvantages. The first one is that, due to what we have just said, you need to put your master node in the least secure part of your network, or you punch holes in your firewalls, which is probably worse. But that node contains the whole configuration of your whole environment, and therefore you don't want it in the least secure part of the network; you would actually want it in the most secure part. Also, resources are used even if there are no updates to be done.
If you think about it, on average you don't roll out one update every 30 minutes on your infrastructure, while Puppet, for instance, by default checks in every 30 minutes. This means that if you have 1000 hosts, every 30 minutes you will have 1000 requests to the master host asking "is there anything new?", and the majority of the time the answer will be no. So you're wasting resources there. We have said that in the agent-based case we have the master in the least secure network segment. In the agentless case we can have it in the most secure segment, because the master will initiate the connections, so the only connections needed will be from the master to the managed nodes. The third big disadvantage of an agent-based system, I think, is that if you have 1000 nodes, you have 1000 agents. If you have 1000 agents, you have 1000 daemons to take care of. All those daemons can crash, and then your machine does not get any updates, because the daemon is not up anymore. We also have a chicken-and-egg problem, because somehow that daemon has to land on the machine: either you install it manually, or you bake it into the image, or you need to do something weird. I have actually seen Ansible code written by a consultant to deploy Puppet agents, which I think is weird. But still, people do it, because if you want to have 1000 Puppet agents deployed and you don't have an automation system, how can you do it? Surely not with Puppet, but with Ansible, for instance, you can. The other principle I want to talk about is idempotence. This is very important in automation. You can do automation without idempotence, but it will be a mess.
The idea of idempotence is that there are some operations that, after you have done them once, have no effect if you do them a second time. For instance, if we have a number and we multiply it by one, it will have no effect; that's how math works. And we can do the same thing here. The idea is that some operations can be made idempotent. For instance, think about installing software: "install this software" is not idempotent, but if I try to achieve a state, "I want that software to be installed", that is idempotent, because the first time you will need to install it, but from the second time on nothing will be done, because the software is already there. So Ansible pushes the idea of infrastructure as code a little bit further, because in fact we speak about infrastructure as data more than as code, because it does not really look like code; it's actually a YAML file. One of the big advantages is that it's really easy to write. YAML is like JSON but easier, because you don't have all the parentheses to match. It's even easier to read, because it's not code. Compare with Puppet, where the majority of the infrastructure definition is actually Ruby code, which is not that easy if you don't know Ruby. If you know Ruby everything is fine, but here we are speaking about a level of simplicity where even a non-technical person, like an external auditor, can somehow understand what's happening. And it's not the case that we want to version everything. At the beginning of my career I saw someone who put the whole /etc folder in SVN. That was a great idea, because you have everything versioned, but it was also a problem, because when you do an update you may have some changes there that are just trivial for you.
Also, /etc is a pretty big folder, and 99.9% of that stuff is completely useless for you, which makes that system not perfect. So here is an example of YAML code, a working one. This is a playbook, an Ansible playbook. We declare which hosts we want to hit with this playbook. "become: true" means that we want to escalate to root via sudo. Tasks are the actions we are going to perform. The first one has a name, which makes it even easier to read: it ensures that MySQL is installed. We use the yum module; "name: mysql" is the package we want to deal with, and "state: present". That's pretty straightforward. And again, if we want the user Tom to be present, same thing: "ensure Tom is present"; user is the module we are going to use, "name: tom", "state: present". As you can see, this is pretty straightforward; I would say even a non-technical person with five minutes of training can understand what is going on. Perhaps they will not be able to write YAML or Ansible code, but they will be able to understand it. Ansible is written in Python, first of all, which can be an advantage or a disadvantage. It's a big advantage because it's flexible; it's not Bash. And Python runs pretty much everywhere: we have Python running on Windows now, and we have Python on pretty much all Linux systems. Consider that Anaconda, the Fedora and Red Hat installer, is written in Python, so you have Python there; all the other Linux systems have Python, and a lot of non-Linux Unix systems also have Python. So that's good. Ansible mainly works in push mode, so agentless. It can also work in pull mode; there is ansible-pull, which is a weird thing, but it's not agent-based; it's there, but it's different. I would say 99-point-something percent of Ansible usage is actually in push mode.
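The playbook walked through a moment ago can be sketched like this (a minimal reconstruction from the description; both tasks are idempotent because they describe a desired state rather than an action to perform):

```yaml
---
# Minimal Ansible playbook: targets all hosts, escalates to root via sudo.
- hosts: all
  become: true
  tasks:
    # Idempotent: installs the package only if it is not already present.
    - name: Ensure MySQL is installed
      yum:
        name: mysql
        state: present

    # Idempotent: creates the user only if it does not already exist.
    - name: Ensure Tom is present
      user:
        name: tom
        state: present
```

Running it a second time reports "ok" instead of "changed" for both tasks, which is the idempotence property discussed earlier.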
A big advantage is infrastructure as data: as we have seen, YAML is very straightforward, and therefore the learning curve is very gentle. It's very easy to learn to write YAML. I was speaking with a customer a few weeks ago, and he was like: oh yeah, I started yesterday to look at this YAML Ansible thing, and in less than one hour I was running my first Ansible playbook that was actually doing something useful. So that's, I think, a very good one. It's also very easy to set up: you just install Ansible on the master, write your playbook and run it; you don't have to do anything weird. And it is, I would say, a balanced tool. It's probably not perfect in all scenarios, but it's very, very good in all scenarios, and in many scenarios it is, I think, very close to perfect. It spans from provisioning, Terraform-style, to Chef-style configuration management, so it's very balanced. Now some disadvantages; no tool is perfect. The first one is that we don't have good introspection tools yet. There is a lot of effort here, so hopefully in a few months we will have better introspection tools. The second one is the fact that the community is young. In a way I think this is a big advantage, because if they have been able to create this in a few years, imagine what will happen in five years' time. But yes, it's kind of a problem, because you find way more Puppet documentation than Ansible documentation, simply because Puppet has been around for more than double the time Ansible has. I was speaking with a person a few days ago, and he was like: yeah, but if I want to start to use Ansible in my infrastructure, what are the steps? What do people usually do? The first step is usually to write a few actions as Ansible playbooks.
That is: you need to create a user, and instead of going around and creating it by hand, you write a very small Ansible playbook that creates the user for you. Then maybe you go a little bit further and create a set of basic things, like the NTP configuration for your environment, basic sshd configuration, the kind of things you usually configure on a new machine. Then many people start to roll out one machine, or one kind of machine, completely based on Ansible; usually the easiest one. And then you move forward with migrating all machines to Ansible. Obviously you can stop wherever you want, or do it in whatever order you want, but that is how people usually do it. As far as I have seen, the majority of people go all the way down, because they like Ansible a lot in the first few steps, so they keep going. I think that's very good. So, let's go with the demo; hopefully it will work. This is my Cloudflare account; I have one domain set up. And this is my DigitalOcean account. We are going to start from here. As you can see, there are three boxes here, none of which are file.bits machines, and here we have the file.bits domain. What we are going to do: I will run the code and then explain it, mainly because it's a little bit long. The first execution will provision a couple of machines. While it works (since I cannot see the same thing that you see... okay), let's look around. This is the Ansible project; I hope it's big enough. We have a few files here, and some folders. We have an ansible.cfg file, which is the configuration for Ansible; you can have it or not have it, if the default configuration is fine for you.
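A project-local ansible.cfg like the one mentioned might look as follows (a sketch; the paths and settings are illustrative assumptions, not taken from the demo repository):

```ini
# ansible.cfg — per-project Ansible configuration, picked up automatically
# when you run ansible/ansible-playbook from the project directory.
[defaults]
# Point Ansible at the project's inventory folder instead of /etc/ansible/hosts.
inventory = ./inventory
# Read the Vault key from a file, to avoid typing long flags on every run.
vault_password_file = ~/.ansible_vault_pass
```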
Then you have an inventory folder, which in my case could also be a file or whatever, which is the place where you put your hosts and the other things. Then you have your master.yaml file, which is usually your entry point to deploy your whole infrastructure. Then some playbooks, which can sometimes be thought of as manual actions, if that is something you want to do; other times they can be the deployment of a single kind of host, or a single host, it depends. I have, for instance, a provision playbook for DigitalOcean and Cloudflare, a teardown for DigitalOcean and Cloudflare, and a first-run script, because on the first run I want to clean some default configuration that I don't agree with. For instance, by default a machine arrives with a root account, and I don't really want a root account. Then groups: I will deploy two web servers and one DB server, so we have one YAML file for the db group and one for the web group. Then we have, well, a README, guess what. Then roles: roles are like playbooks, but they allow you to split things up, and usually the big difference between roles and playbooks is that you really want roles to be idempotent. Playbooks can be idempotent or non-idempotent; that's really up to you. But if roles are not idempotent, that will create problems for you. So I have three roles: one is common, which will be deployed on all machines; one is db, which is only for databases; and one is web, which is only for web hosts. Then I have a vault.yaml file, which is encrypted and contains my API keys for DigitalOcean and Cloudflare. So let's see... yep, it worked. Let's look at what happened here. This was my playbook provisioning DigitalOcean and Cloudflare, the one that actually provisions the machines. It starts with the setup task, so it gathers some data about the local machine, in this case because it was running on localhost, as you can see here.
Then it reads the vault file where I have the tokens for DigitalOcean and Cloudflare, and then it ensures that my SSH key is already on DigitalOcean, because I want to provision machines that do not have password login but use SSH keys directly. Then I ask DigitalOcean to create three machines for me, and it worked. Then I create the domain records on Cloudflare for those machines: basically I want to have db01.file.bits, ws01.file.bits and so forth. Then I point app.file.bits to the two frontends. And here I do a little bit of cheating, which is putting the host resolution also on my PC, because otherwise I would have to wait a few minutes for the DNS to propagate. So let's pick this one: if I refresh the page, we will see the five totally new DNS records that appeared here, so that was part of what we did, and here the three new machines should appear as well. Yes, here they are. So we have effectively provisioned an infrastructure, which is not huge, but still. And we can now go forward with the following step, which is the first run. What we are doing here is mainly connecting to all machines, ensuring that the user ansible exists, making the user ansible accept our SSH key, and adding ansible as a sudoer. This is because, obviously, if we are going to use the user ansible to install stuff, we kind of need that. The following step will be about actually creating the whole infrastructure. Let's kick it off and then look at it. What it will start to do, if we look at the file, is include two playbooks. This "include playbook A and B" will perform the first one and then the second one, as you would expect. If we pick the first one and look at it, we'll see that for the web hosts we are going to connect with the user ansible and then apply the roles common and web.
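The playbook just described might be sketched like this (a minimal reconstruction from the description; the file name is an assumption):

```yaml
---
# web.yaml — configure the web servers: connect as the ansible user
# created during the first run, escalate via sudo, apply the two roles.
- hosts: web
  remote_user: ansible
  become: true
  roles:
    - common   # base configuration shared by every machine (NTP, firewalld, ...)
    - web      # web-server-specific configuration
```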
While the other one, the db playbook, which was the second line of the previous file, is pretty much the same, but it connects to the db hosts and applies the roles common and db. So what are those roles? If we go into the roles folder, as we have seen before, we have common, db and web. If we start from common: tasks is the subfolder that contains the task files, and the first one ensures that NTP is present, then configured with a template, then running. Let's look a little at that template, which, if I remember correctly, is not actually a template. Yeah, it's not really a template, because I don't have any variables in it, but I could put some variables in it if it made sense. My machines are on the internet with unrestricted access, so it does not really make sense for me, but it could help some people who have local NTP servers: maybe you have two different data centers with different NTP servers per data center, so you want a parameter there; you don't want to hard-code it and maintain two different templates. As you can see, the code is pretty straightforward. After configuring NTP, we install firewalld if it's not yet installed, and we ensure that it's running, which means that we enable it (systemctl enable firewalld) and start it (systemctl start firewalld), and it's all done in here. Then we ensure that SSH can pass through the firewall; that rule should be there by default, but it's better to ensure that everything you need to be there is actually there. So here it ended; it has already provisioned all the hosts as well. We have seen the common role, which was the first part, and then we have the web part and the db part. I'm not going to look into those files, but they are on GitHub, so you can go and inspect them as much as you want. So basically now we have provisioned a database and two web servers, so
how can we check if everything works? With curl, obviously. If we curl app.file.bits, one of the two web servers replies to us; this one is ws01. If we ask again... oh, same one... okay, now the other one also replied. And as you can see, this is a list of the databases I have on the newly created database server, and they are the same, because both web servers are talking to the same backend database. That's pretty much what you can use as a general recipe for the majority of websites out there, kind of HA — well, the DB is not HA. I would now like to tear down the whole infrastructure, which is pretty straightforward: I have a teardown, okay, destroy script, and I just run it, and it destroys everything for me. As you may have noticed, I copy and paste commands from the README file. The reason for that is, first of all, that I know I would surely put some typos in them and then have to fix them, but second, that the first and the last commands were pretty long, if you noticed, the provisioning one and the teardown one. That's because we are using Ansible Vault, which is encrypted, and by encrypted I mean that the vault file is AES-256 encrypted. Since AES is symmetric encryption, you need the actual key, and in Ansible we can specify a file that contains that key, so that everything stays safe. If we now rerun the curl after the teardown, it will obviously fail, probably after a little while with a timeout, but it will fail. And if we look here and refresh the page, we'll see that all the machines we provisioned have disappeared, and in the Cloudflare configuration everything we created has disappeared as well. So as you can see, we have created an environment, managed it and destroyed it in very few minutes. Going back to the presentation, if I can find it... okay. So, some resources. The first one is — oops, sorry, wrong link — the slides; I think this one
should actually work even if the date is wrong. Then there is the demo code, the one I showed, the whole Ansible thing; it's all there, though you will probably need to put in your own keys for DigitalOcean and Cloudflare. Then the official Ansible documentation, which is very useful and very complete — not perfect, but very complete. Some videos about Ansible and things you can do, some white papers about Ansible, and ebooks as well. Just one last thing before going to the questions: if you go and grab the source, you will notice that there are actually two vault files. One is vault.yaml, which is encrypted, and the other one is vault.yaml.example, and in vault.yaml.example there is an example of what your vault.yaml should look like before encrypting it. So thank you, and if there are any questions... Yes. Okay, so the question is how Ansible works in the background, since, as we have seen, we enabled firewalld and then opened the SSH port. The answer is that Ansible runs one action at a time, and the reason why this works is that firewalld, when it starts, does not cut connections that are already live. Ansible is using the same SSH connection that is already there, so that connection does not get dropped. Yes. So the question is: Ansible is mainly Python 2, so are there any plans for Python 3, since Python 2 is going away, hopefully soon? Yes, there are plans, and I think I saw a mail on the mailing list a couple of days ago where Toshio said that all the main modules are actually working on Python 3 as well. Python 3 is not yet supported for the whole infrastructure and everything, mainly because on the Ansible side we are supporting very old versions of Python 2, and very old versions of Python 2 and Python 3 do not really work well together. But yes, it's expected to
arrive soon. I don't know exactly when, because I'm not actually on the development side of it; I'm more on the usage side of Ansible. There is actually virtualenv, which already manages different versions of Python, and I guess it will be implemented in a similar way; that's my guess. Yes? Sorry, I didn't hear you... about the speed of agents in my disadvantages slide? Let's see... this slide, or the Ansible one? The Ansible one, okay. So the question is what "not very good introspection tools" means. Basically, when you get an error, you get something like "somewhere around that line there has to be an error", and if you go with -vvv, which shows all the logs that come from SSH, you arrive at the opposite, where you see every single detail of SSH. So it's too much information on one side and not enough on the other, and there is an effort to bridge that gap and produce useful errors without too much SSH noise in them. So the next question is: some modules require certain Python libraries to be present; how can those libraries be managed if they are not installed globally? Basically, I think "it depends" is the answer. Generally, you can use Ansible to install the libraries that you will need, so you can target only the hosts that are going to perform the action. At the moment the Fedora installation of Ansible, for instance, does not require any of those libraries. On the target, you can write them into the playbook: in the step before using that module, you can put yum with the library name and "state: present", and that works. And, for instance, regarding the earlier question about Python and Fedora, there is actually a piece of code that you can run in raw mode from Ansible to install Python on Fedora hosts even
before trying to reach Python at all. So, last question. The question is: a few years back Red Hat bought Ansible, and when that happened it was announced that Ansible Core and Ansible Tower, which is the web interface, would be open sourced; is there any news? No, I don't have any news; I'm not in a position to have that kind of news. But Ansible Core was actually open source even before the Red Hat acquisition, and it is still open source. As for Ansible Tower, I don't know, but it will arrive; I'm sure about that. Also, there was Ansible Galaxy, which was closed source when it was owned by Ansible before joining Red Hat, and it was open sourced a couple of months ago. So, thank you. ... So what you usually do, if you have a three-segment design of your network, where one segment is public, one is a DMZ, and one is private, is put your Ansible node in the private one, so that private-to-private is fine, private-to-DMZ is fine, and private-to-public is fine; everything works. While if you do it with Puppet, you have to put the master in the public segment, because otherwise the machines on the public segment cannot reach your Puppet master if it's in the private network. I mean, you could put it in the DMZ if you want, but even there you are still opening a firewall hole between your public segment and your DMZ, which is probably not nice. Yeah, okay. Yeah.
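The raw-mode bootstrap mentioned in the Q&A might be sketched like this (an illustrative assumption, not the exact code referenced in the talk; the raw module runs a plain SSH command and therefore works even when the target has no Python interpreter yet):

```yaml
---
# bootstrap.yaml — install Python on minimal hosts so that regular
# Ansible modules (which run as Python on the target) can work afterwards.
- hosts: all
  # Fact gathering itself needs Python on the target, so skip it here.
  gather_facts: false
  tasks:
    - name: Ensure Python is installed (raw needs no Python on the target)
      raw: dnf install -y python
```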