Today we will be talking about Ansible, and more specifically I want to share some tips and tricks that I picked up over the course of the last few years around how to think about automation in a way that I think is a little bit more sensible, or sane. I've been roughly 20 years in IT at this point, and for 10 of those years I have been using Ansible. I will be talking a little bit about the project that was actually my first Ansible project. It's a ten-year-old project, but still, I think we did some things right, we got a lot of things wrong, and I took a lot of learnings from it. Nowadays I work at Red Hat as a Specialist Solution Architect for Ansible.

So the project that started my experience with Ansible was a project that was a little bit messy, let's put it this way. Why do I say it was messy? First of all, when we looked at the environment, we found something like 300 GlassFish installations. GlassFish was the application server hosting the application. The reason there were so many is that there was one application instance per customer, and there were roughly 300 customers, so roughly 300 GlassFish installations. Unfortunately, it wasn't the same version of GlassFish everywhere, because they started at 4.0, and then some customers had some issues with 4.0, so they started to roll out 4.1 to some customers, I mean, obviously to the customers that actually complained about the issue, and then they found other issues, so they started to roll out 4.1.1, but again only to the customers that had issues. So we ended up with a mix of versions spread across the customers.
To complete the picture of the situation: some machines were on CentOS 5, some on CentOS 6, plus some other things, and there were 300-something instances of MySQL, because, since there was one application per customer, there was also one MySQL per customer. Some of those MySQL instances were installed from RPM and some were not, depending on the machine. And on top of all this, each server had its own handful of scripts, all slightly different, and it was never completely clear which ones were actually in use.

As you can see, it's not a very linear situation, but I am sure that, with some slight differences but the same concepts, you can find exactly the same situation in your organization. I have been a consultant for many years, I've worked at Red Hat for many years as a consultant and as a presales person, and I've seen a lot of different cases where this is roughly the same situation everywhere.

So we started to search for a solution to this problem, and we were looking for an automation system. That was the only thing that was clear at the beginning, and very quickly we started to add requirements. The first one was that it had to be simple, because why add more complexity to something that is already kind of complex? Second, it had to coexist with what was already running, because we are talking about a business that is in production. All those customers are running these applications in production, and they are running their whole company based on this application, because this is a mission-critical application for the customers. So everything needed to be smoothly moved to automation without any interruption or issues for the end customers.
It also needed to keep the same security model, because we are talking about PCI DSS applications, so credit card applications that manage credit card numbers. If you change the security model, you have to re-audit everything, which might be a little bit more complex than we would like. And it had to be kind of self-documenting, in the sense that we wanted to have just one thing, not two: code and documentation together, so that we could be sure that over time we don't have a drift between the documentation and the code.

And there was a requirement that I added for the customer, in the sense that it was not a requirement for them, but it became clear to me that it was a very important point: it had to be idempotent. Now, what do we mean by idempotency? I think idempotency is a great concept, a bit abstract sometimes, but very useful. It's a property of certain operations that ensures that only the first time the operation actually changes something; from the second time onward nothing will change, with some asterisks.

So, some examples of idempotency: x = 100. No matter the value of x at the beginning, after the first run it will be 100, and from that moment onward it will always be 100, so effectively it will not change. Another one is x = x⁰: x can start as any number, then it becomes 1, and then it stays 1. Going into IT territory: echoing a string to a file, replacing the whole file (so a single greater-than sign, >) is idempotent, because no matter the content of the file, after the first execution it will be stable, and it will contain just that string. Now, what is non-idempotent? An example would be x = x * 2, because the first time x is, let's say, one, and then it becomes two, four, eight, and so on and so forth, so effectively it's not idempotent.
Another example is very similar to the previous one, but with two greater-than signs (>>), which means that effectively we are just appending to the file, so every time we execute it the file will have one more line than before.

Now, there are some major cases to be aware of around idempotency, and the first one is yum update. yum update is not strictly idempotent, because it relies on repositories, the ones you are fetching your RPMs from, and if your repository changes its content, so if there is a new version of an RPM available, then you will have a change. So I would not exactly define this as idempotent. Obviously, in math idempotency is easy, it's either right or wrong; in the real world it's a little bit more mixed. For the same reason, yum install of something can be idempotent or not, based on the state of the repository. All of this can be fixed, or at least mitigated, if you manage your own repositories, so you don't use the Red Hat repositories or the Fedora repositories directly, but you use something like Satellite, for instance, which allows you to be the one that creates the changes in the repositories. Even then it's not strictly idempotent, but it is something that you can control and manage.

wget, same thing: if you wget you can have different issues with idempotency. The first is that the source changes, which might not be ideal, and the second is data corruption; wget should fix it or fail, but it can still be a little bit tricky to speak about idempotency when you wget from an unknown source. If you wget from your own server it's easier, and if you manage your own server and you also put the version in the file name, then it becomes way easier to ensure idempotency.
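In Ansible terms, the caveats and mitigations above could be sketched roughly like this; the package version, URL, and checksum are placeholders, not values from the talk:

```yaml
---
# Hypothetical tasks illustrating the idempotency caveats discussed above.
- name: Not strictly idempotent - result depends on the repository state
  ansible.builtin.yum:
    name: unzip
    state: latest          # changes whenever the repo publishes a newer build

- name: Closer to idempotent - pin the exact version instead
  ansible.builtin.yum:
    name: unzip-6.0-24.el7 # placeholder version string
    state: present

- name: Idempotent download - versioned file name plus a checksum
  ansible.builtin.get_url:
    url: https://repo.example.com/payara-4.1.zip  # placeholder URL
    dest: /opt/payara-4.1.zip
    checksum: sha256:0123abcd                     # placeholder checksum
```

Pinning the version and checksumming the download moves the source of change from the remote server into your own repository, which is the control the talk attributes to tools like Satellite.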
Another example is something that can be thought of as idempotent but actually is not: we only have one greater-than sign, so in theory it should be idempotent, but we are dumping a variable, and that variable's value can change. I have seen servers with terabytes of data being completely wiped out due to this kind of issue, not with Ansible but with Bash, and the problem is the same: rm -rf $VARIABLE/*, if $VARIABLE is empty, will nuke your server. So be very aware that variables can be very useful, but also very dangerous.

Ansible was the solution that we ended up with, but why? First of all, it is agentless. Agentless means that we didn't have to install anything on the individual machines or stand up a new service that would add complexity. Second, it connects to the machines via SSH. It can also connect to machines with other protocols, but in our case they were all Unix-like machines, so SSH was obviously the choice we went for. And it does not care about the state of the rest of the machine, which means that if you tell Ansible that it needs to handle a specific file, for instance, it will only care about that file; all the rest of the machine, who cares. This really helped us do that slow migration to automation. A big advantage is that it applies changes in a sequential way, which means that if you look at a playbook, the first thing you see is the first thing Ansible will try to do, and then the second, the third, and so on. This makes the whole thing very simple. It can seem obvious that something executes in a linear way, but actually in a lot of automation systems this is not the case, because they try to optimize by parallelizing operations, and that creates a much more complex situation. And it has a very gentle learning curve. As we'll see with a little bit of YAML code, YAML is fairly straightforward.
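The same empty-variable trap exists in Ansible whenever a path comes from a variable. A cheap guard, sketched here with the assert module (the variable name app_data_dir is invented for illustration):

```yaml
---
# Hypothetical guard: refuse to touch the filesystem if the variable is
# empty, undefined, or dangerously broad.
- name: Refuse to run if app_data_dir is unsafe
  ansible.builtin.assert:
    that:
      - app_data_dir is defined
      - app_data_dir | length > 0
      - app_data_dir != "/"
    fail_msg: "app_data_dir must be a non-empty, non-root path"

- name: Only now remove the directory
  ansible.builtin.file:
    path: "{{ app_data_dir }}"
    state: absent
```

A failed assert stops the play on that host before the destructive task runs, which is exactly the safety net the Bash one-liner lacked.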
I would argue that Ansible is not configuration as code, it is configuration as data, because I would not really define YAML as code. An Ansible playbook can be easily read even by non-technical people, like auditors. We are in a PCI DSS environment, so we have auditors every three to six months asking questions, and being able to tell an auditor "this is exactly what we are doing" can help. Not with all auditors, but at least with some. And it's very simple to set up. Another big advantage, and it was a deciding factor in the end, is that it's a Swiss Army knife of a tool. Effectively, you can use it to provision new systems, you can use it for deployment, for configuration, for whatever you want.

So the initial setup is simple, we said, but how simple exactly? You first create SSH keys, if you don't already have them; you should already have them, but it could be a good idea to create one specifically for Ansible. Then you start to distribute those SSH keys, and you create a Git repository where you will put your Ansible code. Yes, you can do it without Git, but you really don't want to. You create the first inventory, where you list all your machines, and you are done. You are ready to actually write playbooks.

Now, which playbook should you write first? We have the whole environment ready, now what? You need to select a process that you want to automate, your first process to automate. How do you select it? First of all, it should not be a critical operation. If the first thing you try is something critical, it will probably not go too well; start with something very easy. It should be a very well understood process, by you or by whoever is writing the automation, because if you understand the process, it's very easy to automate it. If you don't understand the process, it's going to be more complex.
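For reference, the first inventory mentioned in the setup steps can be a very small file. A minimal sketch in YAML inventory format, with invented group and host names:

```yaml
---
# Hypothetical first inventory (e.g. inventory.yml); every name is invented.
all:
  children:
    glassfish_servers:
      hosts:
        app01.example.com:
        app02.example.com:
    db_servers:
      hosts:
        db01.example.com:
```

That, plus distributed SSH keys and a Git repository to hold it, really is the whole setup before the first playbook.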
This is also because if you do the operation yourself it's interactive, so even if you make a mistake, the console will tell you: wrong parameter, wrong whatever. With automation it's a little bit more indirect, so if you don't know the process exactly, the effort to automate it can become much bigger. And it should be easy to test. Whatever process you select, it should be one where you have a clear understanding of the process and of how to prove whether it was successful or not. It could also be a boring thing, something where you say, you know, I don't really want to keep doing this by hand, but I can automate it, which can be fun. And you should have a little bit of time. At least the first few times, the time to automate the process is longer than the time to execute it. It should not be millions of times longer, but still, if it's a process that you usually run in five minutes, it will probably take you 30, 60, 90 minutes to actually automate it. So if you are in a rush, it's probably not a good time.

So let's look at the first process we automated, which was the provisioning of those application servers, because we had new customers in the pipeline that required new servers to be deployed. The first part is to understand what the process is. What we automated was not the original process: when I asked, okay, which is the process that you are following, it was basically a 30-step-long process. We worked first on the process, changed it, made it sensible, and at that point we automated the sensible version of the process. That is much easier, much simpler, much quicker. So: you install Java from RPM, in that case; create the glassfish user, because for reasons they wanted a dedicated user for that; install unzip; and download Payara, which comes in a zip file, which is why we installed unzip.
Then unarchive Payara, set the ownership of the Payara files to the glassfish user, and create the systemd unit to start it. This is the process that we used for bare-metal servers. For shared servers it was slightly different, but still the same idea. So how do you automate it with Ansible? You create a playbook, such as this one, and as you can see, with very little change in wording, you can find the previous steps in the task names: ensure we have Java installed, ensure we have the glassfish user, ensure we have unzip installed, ensure the Payara installer is present, and so on. And as you can see, we have a few lines in between that are the actual Ansible code, and they are very similar. Obviously each task has a module, some parameters, and so on, but still, if you read one or the other, it feels like the same thing, and if you have both available, it's way easier to understand them both. And this is the second part of the same process. As you can see, we had seven points here, four steps here, three steps here, seven again. So we started from a process that had seven steps, and we ended up with seven steps of automation. This is a good thing. If you start with a process that has three steps and you end up with more than three steps in the automation, your process was wrongly mapped. You have to map your processes very well to ensure that your automation will follow exactly what you predict.

Some considerations around this. First of all, if allowed, redesign your process before you automate it, or while you are automating it, because it's going to be easier. Second, simpler is better: if there are steps that are no longer required, remove them, and add them back later if it turns out they are still needed.

Another thing that we worked on was the automation of users. The lifecycle of a user is slightly different in every company, but roughly it's like this.
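Before moving on to user management: the provisioning playbook itself was shown only on the slides, but based on the steps described, a sketch of it might look like this. The package name, download URL, paths, and the unit-file template are assumptions, not the project's actual values:

```yaml
---
# Hypothetical reconstruction of the provisioning playbook described above.
- name: Provision a Payara application server
  hosts: glassfish_servers
  become: true
  tasks:
    - name: Ensure we have Java installed
      ansible.builtin.yum:
        name: java-1.8.0-openjdk          # placeholder package name
        state: present

    - name: Ensure we have the glassfish user
      ansible.builtin.user:
        name: glassfish
        state: present

    - name: Ensure we have unzip installed
      ansible.builtin.yum:
        name: unzip
        state: present

    - name: Ensure the Payara installer is present
      ansible.builtin.get_url:
        url: https://repo.example.com/payara-4.1.zip  # placeholder URL
        dest: /opt/payara-4.1.zip

    - name: Ensure Payara is unarchived
      ansible.builtin.unarchive:
        src: /opt/payara-4.1.zip
        dest: /opt
        remote_src: true

    - name: Ensure the Payara files are owned by glassfish
      ansible.builtin.file:
        path: /opt/payara41               # placeholder unpack directory
        owner: glassfish
        group: glassfish
        recurse: true

    - name: Ensure the Payara systemd unit is present
      ansible.builtin.template:
        src: payara.service.j2            # hypothetical template in the repo
        dest: /etc/systemd/system/payara.service

    - name: Ensure Payara is enabled and started
      ansible.builtin.systemd:
        name: payara
        state: started
        enabled: true
        daemon_reload: true
```

Note how each task name reads almost exactly like the corresponding step of the written process, which is the point the talk makes about code and documentation staying together.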
You create a user on certain machines, you add the user to certain groups, you add SSH keys for this user so they can actually log in, and then time passes, and at a certain point you delete the user on all the machines it has access to, because that person maybe does not work there anymore, or for other reasons. So, same as before, we can very simply automate the creation of the user: we have the built-in user module, and we pass the name, the shell, the groups. Then we ensure the SSH keys are present by copying them to the target machines; those are the public SSH keys, not the private ones. And then we can call it in this somewhat weird way, which is basically passing a bunch of extra variables to be explicit: which user, which groups, which UID we want, and which target hosts this user should be present on. And we can remove users in a very similar way.

Now, some considerations around this. It's very easy with Ansible to create distributed batch scripts, which is exactly what we have shown now, but that is not idempotent. It's good to know that you can do it, and it can be good to do it, but be aware that it is not idempotent; it is not something you should aim for, but it can be a good stepping stone. Ansible will improve the consistency of your environment even when used in a non-idempotent way, but you are not leveraging the whole tool. Automation is a step-by-step process, so don't try to automate everything at once; start with something easy that solves your first issue, and then over time improve your scripts and your code to make them even better. Don't try to big-bang the whole environment at the same time, because it will not really work.

So we wrote a second version of the user automation. The first one was not idempotent, which was okay at the beginning, but not really in the long term.
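A sketch of that first, non-idempotent version, driven entirely by extra variables; all the variable names, paths, and values here are invented for illustration:

```yaml
---
# Hypothetical v1 user playbook: a "distributed batch script" in Ansible.
- name: Ensure a single user is present (v1, driven by extra vars)
  hosts: "{{ target_hosts }}"
  become: true
  tasks:
    - name: Ensure the user exists with the requested UID and groups
      ansible.builtin.user:
        name: "{{ username }}"
        uid: "{{ user_uid }}"
        shell: /bin/bash
        groups: "{{ user_groups }}"
        append: true

    - name: Ensure the user's public SSH key is authorized
      ansible.posix.authorized_key:
        user: "{{ username }}"
        key: "{{ lookup('file', 'keys/' ~ username ~ '.pub') }}"
```

It would be invoked like a batch job, for example: ansible-playbook user.yml -e username=jdoe -e user_uid=1042 -e user_groups=developers -e target_hosts=app_servers. Each run is fine on its own, but the knowledge of which users belong where lives only in whoever types the command, which is why the system as a whole is not idempotent.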
So we decided to create a YAML file that describes the users we want to have in our environment. So we have this users file with, for instance, an admin user defined in it, and we could have many more of them. Then we changed the first playbook that we have seen to use with_items and basically iterate over that file. This gives us an effectively idempotent way of creating users and ensuring that all users are always present on the right machines. Now, this does not cover the deletion of users. That is something we continued to handle in a non-idempotent way, because it's a little bit more complex and there are some security concerns around removing users. So that did not change over time, but still, this is a good example of something that started as a distributed script written in Ansible and then became something idempotent.

So, wrapping up: Ansible can provide you with a simple way to automate distributed processes, if that is your stage of automation. Start with some low-hanging fruit and then build on top of that. If possible, rethink processes; don't just automate whatever process someone tells you is "the way we do things here". Try to understand what exactly they are trying to achieve, and then create a good process to actually achieve that result. Aim for complete lifecycle automation, but work toward it over time, don't rush it. And, in a way, try to work toward complete company automation, so effectively automating all processes within a company, organization, group, whatever, but, same as before, don't rush it; it happens over time.

Automation has a huge impact on people, even more than on processes. So it's critical to explain to people why you are making changes, what will change, how it will change, what the impact on them will be, and to repeat these things many, many times, because a lot of the time people are scared of change.
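Going back for a moment to the second, idempotent version of the user automation described above, the users file and the loop over it might be sketched like this; user names, UIDs, and groups are invented:

```yaml
---
# Hypothetical users file (users.yml): the desired state lives in Git.
users:
  - name: admin
    uid: 1001
    groups: [wheel]
  - name: jdoe
    uid: 1042
    groups: [developers]
```

```yaml
---
# Hypothetical v2 playbook: iterate over the users file on every run.
- name: Ensure all described users are present
  hosts: all
  become: true
  vars_files:
    - users.yml
  tasks:
    - name: Ensure each user exists
      ansible.builtin.user:
        name: "{{ item.name }}"
        uid: "{{ item.uid }}"
        groups: "{{ item.groups }}"
        append: true
      with_items: "{{ users }}"
```

Because the desired user list is data in the repository rather than arguments on someone's command line, rerunning the playbook always converges to the same state, which is what makes this version idempotent.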
And when we are talking about automation, that is even scarier for people, because the first thing they think is: great, I'm now going to be replaced by this thing and be fired. That is not usually the case. Obviously there are some cases where it is, but 99.9% of the time, in my experience, automation such as Ansible does not mean people being fired. It simply means that people will change their role, change a little bit how they work, change their processes, but over time everything is fine; no one is going to be fired because of this.

And there is a big point that is critical, which is: be intelligent, be specific about what you are automating. If you have a long process, that process has a certain speed, and that speed is probably the speed of the slowest component, the slowest step of the process. If you automate or speed up anything before or after that step, it will just be a waste. If you want to increase the speed or the efficiency of a process, you have to identify the weak link and then work on that part, not on the rest of the process, otherwise it's just a waste.

So thank you, and if there are any questions, we still have some minutes.

So the question is how big roles and playbooks should be to be useful, so not too big, not too small. Well, my answer is that there is no right answer to this, in the sense that it's absolutely okay, and that is usually how it works, that you start to create a role and over time it becomes bigger and bigger, and at a certain point it's like, okay, that's too big, let's split it up. That is exactly the iterative process you should be going through, because there is no right answer like "it's 57 lines"; it really depends. I have seen roles that made sense written in fewer than 20 lines, and others that were more than 1,000 lines and still kind of made sense. It really depends on your specific environment.
My rule of thumb is that a role should be as big as possible, but it should be able to run exactly the same in different environments without any change. That means that if, for instance, you are installing Linux and MySQL in the same role, it probably does not really work well, because there will be cases where you only want one or the other; but if it's a very big role that is configuring, say, an Oracle RAC, then it's fine, as long as every time you install Oracle you are installing an Oracle RAC. So that is a little bit my rule of thumb. Yes?

So the question is about which text editor to use for YAML. This is absolutely a personal choice. My personal choice is vi. I think it's the best tool ever, but I do understand that 90% of the people who write Ansible or YAML code do not use vi and are perfectly fine without it. I personally find vi very comfortable because I can grep stuff on the command line and in vi, and I can open multiple tabs and multiple windows at the same time. I mean, this is my workflow. I think that over time you have to find your own workflow, what works for you, and it could be VS Code, it could be vi, it could be something else. It really depends on how you are used to working and interacting with your tools. Yes?

So I think the question is about the best way to handle a lot of playbooks that are automating a big environment, and I think the real point is that it depends on which stage of automation you are in. If we are talking about early-stage automation, where usually you are running playbooks yourself, having multiple playbooks can be okay. I think that over time, when you reach a point where you could effectively nuke your whole environment and recreate it from scratch just with automation, and you can run the automation over and over because it's completely idempotent and so on and so forth, I would expect you not to have many playbooks.
Or rather, I would expect you to maybe have many playbooks because you want to do something specific in some cases, but roughly your environment should be described in roles more than in playbooks, and then I would usually have one playbook that simply calls every single role for every single environment. That would also be part of my CI/CD system, where every time I commit something, it triggers that playbook to be rerun, and therefore my environment is aligned to my last change in the Git repo. But those are very different situations based on the level of maturity of the automation in that environment. Yes?

So there is no easy way, because it's basically the problem of proving the impossibility or the non-existence of something, so it's a little bit tricky. The way we handled it in this case was basically that... yeah, don't worry. How we solved it was very simple: within a very short amount of time, we had one user per machine, and it was the Ansible user. That was the only user that was allowed to connect to any production system. We decided at a certain point that that was the right strategy; before that, for a certain amount of time, we had a number of users that were non-root, non-sudo users, and it's a process that takes time. I think that ultimately the only way to really solve the users issue is to not have users on the machines at all.

Yes, and that is a big problem. So if, for instance, it is the absence of something within a file, then I would use a template for the file and replace the whole file with exactly the content I want. With users, in theory, you can do the same, because users are just entries in the passwd file, so effectively, in theory, you could do the same there, but handling users that way can be critically problematic if something goes wrong or in other situations like that.

So the question is about Git structures for Ansible playbooks and roles.
Do I recommend sticking with the recommended layout or not? My answer is yes, I do recommend starting with the recommended layout and going from there. Now, those are not strict rules. Some of them are: for instance, the folders within a role have to be named a certain way, and so forth. But the overall organization of files is not a strict rule, and there is a reason for that: you have to be able to change it if it does not fit you. If you want to move away from the standard, read the standard very well, try to understand why they did what they did, and understand whether it really does not apply to you, or whether it should apply to you, and what your situation is. But if a different structure makes sense for you, definitely have a different structure.

So, thank you. And with this we are out of time, but I will be around, probably just outside the door. Thank you.