My name is Rami Alammi, I'm a — oops, that came before. Is it auto-advancing? Okay, sorry, that was a bad one. Okay, so let's start again. My name is Rami Alammi, I'm an engineer at Symantec Corporation here in Culver City. And today I'll be talking about one-click deployment of cloud applications using Ansible. To begin with, of course, you have my details there. If you wanna reach me on social media, or the best social media ever, IRC, you can find me there. But I wanna start a small experiment. I always try to pitch to companies, or at least parts of companies, that are not usually involved in open source to support open source a little bit more — and my manager is here, actually. Rami, can you say hi? Since my manager is here, I thought, if you enjoy this talk, or you tweet about this talk, or you like something about it, when you post to social media, would you mind using the hashtag #scaleOCD, as in one-click deployment? And if you don't like something about this talk, you can meet Rami after the meeting, or call our support. Both of them will help you out. Symantec owns an orange, just in case you didn't get that joke. Okay. So currently I'm in a DevOps position on my team at Symantec. And the reason I'm here, or one of the reasons I'm here, is I believe in what the DevOps dude said. He said: in DevOps, we are awesome. So we take the awesome that's in our head and give it to people who don't have that awesome in their head. Because one important part of being a DevOps person is sharing. It's one of the core values of DevOps. And this is one of the things I'm doing here, okay? And I actually found out who that person was. His name is Jason Freeman and he works for StackStorm right now. So — and my slides are auto-advancing and I don't know why, so this is gonna be a little bit difficult. So I work on a product that's called Symantec Unified Endpoint Protection. And yes, it is SUEP. So, that's supposed to be a joke. So. Oh, thank you.
I appreciate it. Thank you. And it's a wonderful product, actually. If you saw — come on. Sorry, I'm gonna exit the normal slide thing and use this. Okay, I hope you guys don't mind. So the product that I work on — and this is the last you're gonna hear about Symantec in this regard — basically manages cloud applications. It's a brand new product that Symantec grew from the ground up to manage the endpoints that the company manages through different products, okay? And like the gentleman who was presenting from JPL before me — he actually has a Symantec endpoint installed on his box. That's why you saw the Symantec logo there. And this is the product that will end up managing all those. I also identify as a PhD student of Computer Science at USC. Can I get a "Fight On" from the Trojans in the room? My turn. Awesome. So I also identify there. I also attempted to reboot the USC Linux Users Group. So no, you don't get to — you should have said "Fight On." And we have some good work there. So one of the major things — before we get to one click, one other thing I identify myself as is a husband and a father. And I have my family at the end of the room there. So thank you for letting me come to SCALE. So, on the road to one click: you need to know and understand where you're coming from in order to know where you're going, okay? Because every organization has some sort of system in the way they deploy and orchestrate their systems. I don't believe anyone here is still doing it by hand if they have more than three or four boxes, okay? And even if you're doing that, you're not cool if you don't have a configuration management system. And the growing demands nowadays of maintaining security, patching, et cetera, demand that, in order for you as a DevOps person to grow up and live on and have a good life, you actually need to have systems like this in place.
And if you're a DevOps person and you did not get there yet, I'll give you six more months and then we can talk. So your goal is to actually sit down, have a monitor in front of you, and start looking at everything as it goes along. So the actual idea is: you have everything mapped out, all your code, all your infrastructure ready, and all you need to do is just watch it go through, see the green okays, and everything is nice, go forward. That's, you could say, the engineering fantasy, or what you're looking forward to. So, who knows what this is? Yes? Excuse me? That's Morpheus — where is Morpheus? He's in the Construct, right? Now this is where one click starts, okay? One click starts in the cloud equivalent of the Construct. Basically a big white void with nothing in it. You have no cables, you have no networks, you have no routers, no switches, not even power. Well, cloud doesn't need power, so let's leave that to the side, okay? But you don't have anything, and that's the true nature of one click. Because if you want to see that everything you have is gonna work as-is — that it can stand up an environment, which is a big deal in a lot of places, stand up an environment from scratch, just using code, triggered by only one command — that's where you need to start from. You need to start from the Construct. But in reality, even the Construct is not completely empty, right? You still have a TV that you can use to see the Matrix. Now, does the TV show the Matrix in the green, squiggly characters or in full color? Does anyone know why — the nerds in the room — why they can see it in full color and not as they see it on the screens in the Nebuchadnezzar? Okay, that's homework. So regardless of what happens, you still need some components that you cannot always start from one click, okay? They can have their own code that you can use to stand them up at any point in time, okay? But still, they need to be independent from your infrastructure as you go along.
So basically what you're asking for is to go from this, to this — well, this is a GIF that should be working, actually, okay, where all the guns are coming in in the background — to this, where: whoa, I'm up. And actually, the first time or first two times you do it, it'll be four a.m. in the office and people will be looking at each other — wow, it actually works, I can't believe it, okay? And it's a good feeling. It's a feeling of engineering triumph, and all of us in DevOps actually need to have that. Can you hear me really well? So, one of the things — I used to be a teacher at USC. I used to TA a little bit, so I'm used to talking to people. What I'm not used to is people not talking back to me. So I would love to have this be more of a discussion kind of thing, and we can speed up and slow down as we go along depending on the time. But more value will come out of this talk if you engage with me. So feel free to talk back to me and ask questions. So we're gonna go through some steps that will take you through the one-click process, ending in some things that you would like. So first of all, you need to understand where you are deploying. One of the main things that you need to know is the sizing of your tenants. Now, some of the mistakes that I've been seeing, talking to people in meetups and things like that: all your environments are bundled in one account, or they're not segregated logically from each other. They're actually, at some point, under the same network, not even subnetted away from each other. So that's something that you really need to keep in mind. Also, you need to maintain access control to your environment. So you may have some people who can see the list of VMs and routers and switches, but can't edit them. But you may have engineers that you don't want to even go close to that.
There are engineers that you want to be able to see logs for production, but not the production VMs themselves, and you want a combination of all those. So this is another thing that you need to keep in mind when you do your one click, because as you configure your environment, you need to make sure that these access controls are in place — and that you're not forced, because the code base or the current state of the system doesn't support it, to give SSH keys or passwords to people that technically you should not give them to. Also, there are some resources that need to be taken care of — for example, package managers and code repositories. Are they shared across environments? Is the package repo that you use in prod the same one as in stage, or the same one as in development? That's something you need to consider, because mistakes can happen and will happen, and we need to take care of that. Also, you need to make sure that you segregate your private and public networks. Now, I'll be adding extra emphasis on private networking and some aspects that may not be standard for the usual deployments people have, just because those are some of the things that at Symantec we emphasize — making sure that everything is isolated, with two or three degrees of checks and balances for anything that we do. So it may be a little bit too tight for your taste, but at least you can go back and relax it from there. Does that make any sense? Any questions so far? One question, please. Yes, no? I'm gonna start pointing, yeah. The build servers? How do you — what? Well, the build servers need to push to the package repositories, right? So if everybody shares the same ones, you need to make sure that you don't have development build servers having access to your production package repositories just because they happen to be on the same network.
And when you whitelisted the firewall, you just gave the whole slash-24, for example, for all of them to have access. So the main thing is: you need to outline all of those as you go along and document them. That way, the initial documentation is gonna be on paper, but as you get more and more into your DevOps work, into your configuration management, it will actually be documented in code. And what will happen is — especially if you're using Ansible, we're gonna come to that — it's YAML, so you can technically write code to generate the documentation from your configuration itself. And tools like Ansible Tower, et cetera, can help with that. Thank you for the question. Also, there are some intricacies of infrastructure that you need to keep an eye out for, and I'm gonna point out some of them. Hardware versus software accelerators. For example, you wanna ask: I have SSL termination. A provider comes to you and says, I have a box with SSL termination. Well, is that box a software one or a hardware one? If it's a hardware one, where do you keep your keys when you reboot the box? Do you have an HSM? Where do you keep your keys, just in case? What kind of ciphers does it support? Does it have all the PCI-approved packages, et cetera? Also, are there any layers before the traffic hits your load balancer? Is there an F5 sitting up front? Are there two F5s sitting up front? Now, not everyone in your DevOps team needs to know that information, but one or two key engineers, when they start debugging something catastrophic, actually need to understand what's at that layer to be able to make sense of it. Also some intricacies: are my security groups or firewalls all stateful or stateless? Do you all know the difference between stateful and stateless firewalls? Who doesn't? Okay, so there are some people who don't. The idea is: with a firewall, you block traffic coming in, right?
Okay — and you can do it going out, but let's take traffic coming in. Traffic coming in, you either block or you allow. If it's blocked, it's blocked, done. But let's say you allow a whole block from within your company, for some reason, and somebody keeps calling in. If it's a stateless firewall, anyone who calls in will get in, okay? Even if someone is sending a SYN flood your way — as long as they're whitelisted and they're allowed to come in, a stateless firewall will let the SYN flood come in. Whereas a stateful firewall will understand that there was no negotiation, so: I'm just gonna go ahead and drop that packet, because there's no state for this connection between these two hosts. Make sense? So small things, especially if you're not going with your standard AWS or your standard Rackspace — some of these things actually come into play. Also, the definition of availability zones, okay? I work with OpenStack primarily. If you go to multiple OpenStack providers, you'll see some of them define their availability zone as their whole data center, okay? So technically, if your database and its failover are in the same data center, they're considered one availability zone per the SLA. So if both of them go down, you were expected to have one in another availability zone. Whereas other providers have each rack as its own availability zone — so per the SLA, if you have them in two different racks, you're good. Maybe not per your own confidence, but still. Others have it per hypervisor, which is scary. But at least you have a clear definition, and you know it when you're going in to design your system. I'm gonna try to speed up a little bit so we have time for the rest of the info. So I'm gonna pick TTLs for load balancers. Who uses RabbitMQ, or a message queuing service? Okay. For example, in order to maintain state between RabbitMQ and some other devices: you have a device, a RabbitMQ cluster, and there's a connection between one node and RabbitMQ, okay?
And that connection stays up until a new connection is made and goes to another node. Now, this node may only know the address of that exact RabbitMQ node. So if it boots up and that node is down, it doesn't know how to reach the cluster. So how do you load balance? You can use load balancers to do that. However, RabbitMQ has strict policies on the TTL for that load balancer. So if your cloud provider does not support it, you'll be debugging RabbitMQ for like three days and not know why, until you figure out that it was actually the TTL that the load balancer was not enforcing. So things like this happen quite a bit. So in this talk, we're gonna talk about how we deploy on top of OpenStack, so I'll give a five-minute preview of OpenStack in general. OpenStack, basically, is your open source infrastructure for cloud virtualization. You have your compute nodes, your networking nodes and your storage nodes, and then you have an interface that you use to deal with all of those. The way OpenStack works, they have different projects; each project is in charge of a specific component of OpenStack. And as a person dealing with your infrastructure, you need to understand what each component does and how to deal with it. That said, not every OpenStack provider provides each of these components, okay? Now, do you guys mind if I walk down here? I really like to move when I speak. So: Nova is the compute layer; you do all the provisioning through Nova. Swift is the object store. It's one of those things that gets dropped if you're in a private cloud and you don't request it. Glance is imaging as a service — your image management: uploading images, downloading images, et cetera. Keystone is for identity, and it's one of the cornerstones of OpenStack — and the bane of my existence, after one other one: Neutron. Neutron is software-defined networking. And it is the — anyone from the OpenStack Foundation here, or anything?
It is the messiest part of OpenStack. It drives me nuts, okay? Because you have software components, hardware components, software components fronting other software components — it's just crazy. And Horizon here is actually the dashboard — basically a portal you use to view and modify OpenStack. Any questions? Okay. So the next step after understanding your infrastructure is making sure you understand your application. And there are certain things that you need to keep an eye out for in your application. So for the application we're working with right now: Spring has a well-known hello-world sort of application that's called PetClinic, okay? And this is the one that I'm using for this exercise. So here, in this example, you're gonna be talking to an nginx node that's acting as a proxy. It will proxy-pass all the traffic to Tomcat, which has the application deployed on it, and that talks to MariaDB on the backend, okay? We have some supporting cast in terms of Git, and the second thing I hate most, Jenkins — after Perforce, of course — and also the package manager, okay? So this is the example that we will be following through this exercise. So, as you start your application — your application is mostly still in development, and your developers are developing on i7 machines with at least 32 gigs of RAM. So if the application does anything wrong, they really don't see it. And it's really hard for them to profile it before they actually send it down through a profiler, or even discover it, until they deploy it on a VM that's yay big compared to their computer, right? So as you start deploying your application, you'll notice that you need to account for your CPU and RAM requirements and document them, okay? A time is gonna come when a client says: oh, I need this mini deployment for that PO who's gonna go and demo it somewhere — how big do I really need to make it, okay? And this will come in handy.
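One way that sizing documentation can live in code, once you're in Ansible land, is a group variables file. This is only a sketch — every flavor name and number here is a made-up placeholder, not an actual product requirement:

```yaml
# group_vars/all.yml -- illustrative sizing doc; flavor names, counts,
# and sizes are placeholders, not real requirements.
sizing:
  nginx:
    flavor: m1.small       # 1 vCPU / 2 GB RAM is plenty for a proxy
    count: 1
  tomcat:
    flavor: m1.medium      # 2 vCPU / 4 GB RAM, say, per load testing
    count: 2
  mariadb:
    flavor: m1.large       # 4 vCPU / 8 GB RAM
    block_storage_gb: 100  # persistent volume for the data directory
```

Then when someone asks for that mini demo deployment, the answer is already checked in next to the code that builds it.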
How much object store, how much block storage do you need? Also — and I emphasize this is an unusual scenario — your VM egress and your application egress requirements. You need to make sure you account for anybody calling into any VM or container that you have, okay? You need to account for everyone calling in; you need to make sure you allow them explicitly to call in, okay? And your application in general calls out to different locations, okay? And you need to make sure you account for all of those. I went through the exercise where we had this application running completely in a lab environment, okay? And we took it to an OpenStack instance that was completely firewalled, where egress access is only allowed by permission. So if I wanna go to google.com — okay, they're not a competitor — if I wanna go to google.com from a VM, I actually need to file a firewall change request in order to access google.com, okay? Or of course you can just VPN to another node that actually has access, but nobody knows about that one. But regardless — so you sit down and you start seeing stuff failing in your environment. An environment is an environment; you can open up tickets. But better yet, you'll see your own CM code failing, okay? Why? Because this specific Node package required going to that weird repository — instead of GitHub, which you got whitelisted for two minutes. Oh, then I need to actually clone that locally and change the configuration so Node will pull it from there, as opposed to pulling it from GitHub. Or, for example, there's a trend going on nowadays for slightly more complex open source software where they say: oh, just curl this URL, pipe it to bash, it sets up, and hey, it works. That never works if you're sitting behind a firewall. And it drives me nuts, not knowing what's installed on my machine.
What I actually ended up doing: I installed CoreOS, put up a huge Docker container, installed the application, did a diff, and sat down to see everything, in order to be able to deploy the application. StackStorm, if you're watching, that's you. Okay. Yes. So that's actually the first thing that I did when I was diagnosing that issue. What I did is I set up Squid proxy in an environment that has complete egress access — full access to the outside. I set up Squid proxy, I deployed my whole application, I sent all my logs to Elasticsearch, and then I started combing through them with a lot of grep and regexes, and I got the main set of URLs that I need access to. Okay. Then — you want to hear about intricacies? There's one small intricacy. Okay. There's a certain set of firewalls that does not allow you to whitelist URLs. They only work by IP. Where is that a problem? Any guesses? SSL — that's a problem, yes, but mostly when I'm calling something outside. Anyone work with mobile devices, develop apps for any reason? Okay. Your phone talks to Google, and any service you interact with needs to talk to Google, okay — on the backend, to make sure that you can push notifications, push profiles, that go down to the devices. Good luck tracking Google IPs. You can't do that. And you go to the security office and say, oh, just open that whole slash-something network, give me the whole thing. That will never happen, okay. So you need to work out some tricks. So, you want to know a trick — how I dealt with that one? Hands? So what I did, actually: I have a job in the same environment polling the same URLs, okay. And whenever a new IP comes in, it spits it out, sends an alert, and then we file a request for it. All IPv4 in our case — we're all private networking, so IPv6 doesn't really make sense. And if you have a .NET engineer who became a Java engineer and you want to explain how to ping localhost using IPv6, it's going to take some time.
So we kind of switched to that. Okay. So going back here — also, bootstrapping requirements. You learn a lot about the application if you go into a clean-room environment. How did I ever bootstrap Caltrace? Where did that user come from? Oh, only because that DN matched the SSL certificate? That's how it got in, and we couldn't deploy it anymore. So there are certain intricacies and lessons learned that come up. And I'm looking at my watch not because I'm bored — it's because I want to make sure we give enough time to the other components of the talk. Okay. Also, one important thing: you need to know where you stand. We're all matured ops engineers, or aspiring DevOps engineers, or hoping we can be DevOps engineers — you need to know the tool set that you have. Like a lot of people, we were invested in Puppet for quite a bit of time. And I love the Puppet guys — actually, last SCALE we sponsored Puppet Labs, and I'm so proud of that. But you need to know your tool and you need to know its limitations, okay? For those who are using Puppet, one of the main issues that we had with Puppet is the Construct, okay? You want a tool that can bring itself up — a chicken-and-egg kind of problem. So, guys, how can I bring up an environment when you need a Puppet master? How do I bring up the Puppet master? With another Puppet master? Or you do it by hand. That's one thing. Also, Puppet — at least at the time, and until recently — does not have support to provision resources in OpenStack natively. The support that it has is repurposed EC2 calls. So if I actually wanna provision on OpenStack using Puppet, that won't work for me. So you need to understand the limitations. And some of the things that you need to look into — also decisions that you need to make about your environment: are you gonna treat this environment as pets or as cattle? If a node misbehaves, are you gonna just pet it? Please come back to health, I'll patch you.
I'll do whatever you want, just come back again. Or are you just gonna kick it to the curb and — oh, PETA's not here — at least put it on the side and just bring up another one that's healthy? Okay. Also, you need to decide what layers of tolerance you have for that misbehavior. Do you say: okay, if it's just a misconfiguration — property files went away, maybe the last Puppet run or Ansible run didn't catch something — will I throw it away? That may be more costly than just redeploying your property files. So you need to have some sort of measure there on what to do, as opposed to an operational failure, where you just totally kick it out and bring up something else. Also, you need to make a key decision, especially if you're going to Ansible, about your machine state. One of the main powers of Puppet is that it ensures that a specific node is in the state you expect it to be in. So even if a rogue developer went in and changed something, if your Puppet code is right, five minutes later, 30 minutes later, that node is gonna be back the way it was before. Now, if that has value to you, then yes — you may go Puppet, you may go Chef, you may go other tools that will work for you. Ansible may not; it could be, but it may miss some of the components that you really like, if that thing is important. For example, in our team, we adopted that if-it's-bad-kick-it-to-the-curb mentality, so machine state is not really that big of a problem. And if you have enough sensors that let you know what the machine state is at a certain point in time, you can put in the code to actually auto-heal as you go along. Also, you need to make sure that if you're using Puppet, or any tool that has a DSL in it, the more qualified people are writing code for it. Otherwise, you're gonna end up with Puppet code — and almost everyone I talk to who uses Puppet is in one of two camps.
They either have a dedicated team doing their Puppet code, and they're extremely happy, and they even have Puppet Enterprise and everything, okay? Or, on the other hand, everyone in the company is writing Puppet code and the guy who actually needs to run it is going crazy — okay, all kinds of weirdness in the code, and everything at the end is an exec. Okay, touch, screw it. So if you're really into everyone-writes-the-code, then you may need to re-evaluate the tool that you're working with. Also, no, you don't necessarily need to dump your deployment tool. You can have a combination of both. A lot of shops have a Puppet-and-Ansible deployment, where they manage their machine state using Puppet but do their orchestration using Ansible. But then you need to start applying the same quality standards you have for your code to your DevOps code: testing, code reviews, making sure no bad code goes in. And at the end you start phasing things out, and that's what we're doing at this point, actually — because you can tolerate it up to a certain point, but afterwards it's like: I have bigger fish to fry, I cannot deal with this anymore. But if you're in an organization where anyone can write that code, I think it's harder for you to do that, unless you're in a place of authority — which regular developers usually don't have within a team. You should do it regardless, but you'll find people who aren't really reviewing the code: the guy sitting next to me, or Oscar, comes to me and says, hey, I'm sending you a code review, can you approve? It's not even in my inbox yet. Okay, so things like this happen, and it's the way things work. Something is really broken and it needs to be fixed, but in the end you're gonna incur so much engineering debt that at some point you're gonna just need to invest in cleaning up the code. Does that make sense? And then we can discuss it more offline so we can go along with the presentation.
Okay, also — this is, if you take anything, anything out of this presentation — I hope you take a lot, but if you take anything, it needs to be idempotency. It is the golden rule of DevOps, and it is so important that, with any tool you pick, if you achieve idempotency coverage of at least 80%, you're in really great shape, okay? What does idempotency mean? It means: I have my code, I run it once, it'll deploy my whole system A to Z, okay? I run it twice — if nothing is wrong, it'll come back: nothing is wrong. If I run it three times and a node was down, it's gonna run, fix that node, finish the run, and report that one node back, okay? So much headache and grief can go away that way, and that's the fastest way you can achieve this new buzzword, auto-healing, okay? Because I'm pretty sure, if anyone opens their configuration management code base, whatever their tool is, they're gonna find a curl call or an exec line just executing a shell script, okay? The problem with that: okay, that shell script itself could be idempotent. It could be making an API call that's benign — if I run it 10 times, it won't screw anything up in my system. But here's the problem. If you run an exec script or a curl call, every time that call runs, it is considered a change in the state of that node. So you will get a report back saying there is a change in state. That is bad. I should not — and I repeat, I should not — get a notification of a change of state unless something actually changed. Does that make sense? So part of your effort, part of your work, will be to make sure that even if there's a shell script — and by the way, this is something Ansible shines in — you can write a bash script such that, if the script understands that nothing happened, it can spit back "nothing happened" and Ansible will report that nothing happened. So in that case, the actual call will be fine.
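As a sketch of that trick — the paths, URL, and script here are hypothetical, but `creates:` and `changed_when:` are the standard Ansible ways to keep shell calls idempotent:

```yaml
# Non-idempotent: reports "changed" on every run, even when nothing happened.
- name: download artifact (bad)
  command: curl -o /opt/app/app.war http://repo.example.com/app.war

# Idempotent: the task is skipped entirely once the file exists...
- name: download artifact (better)
  command: curl -o /opt/app/app.war http://repo.example.com/app.war
  args:
    creates: /opt/app/app.war

# ...or let the script decide, and only report a change when one happened.
# Assumes a hypothetical setup.sh that prints "nothing to do" when idle.
- name: run setup script
  shell: /opt/app/setup.sh
  register: setup_result
  changed_when: "'nothing to do' not in setup_result.stdout"
```

Run a play built like this ten times and you should see "changed" only when the state of the box actually moved.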
But in that case, it's not a shell script anymore, it's an Ansible module. Make sense? Okay. Any questions? Do I need to start pointing? For questions? You know I'm gonna call you out. Okay. So, one important thing to know also is when to shut up. I know a lot of you here are actually here for Ansible, so now's the time we switch there. So — Ansible. If you ask me what Ansible does and what it is: as far as I'm concerned, it ranks in the top 10 of the best things since sliced bread. At least in my case. Why in my case? Because my responsibility is to provision a complete environment from the Construct to "guns, lots of guns" — that's a quote from The Matrix, okay. Complete provisioning of everything, setting up everything, calling out to external services, reporting back if anything goes down, okay. I happen to use OpenStack, and there's amazing support for OpenStack in Ansible. It's not just because Monty Taylor, one of the OpenStack board members, is in charge of the OpenStack code in Ansible — has nothing to do with that. There's actually a huge community behind it, okay. Puppet takes the approach that you get anything you need from the Forge; Ansible is the complete opposite — everything you need is baked in. So they have two repositories: a core modules repository and an extras modules repository. And people just contribute code there like crazy. So anything you need, especially on the cloud side, is there. The orchestration component of Ansible is very mature. So what does Ansible do? People say a lot of things, but for me these are the three big things. The first one is cloud provisioning. In my case, using OpenStack: you can have, in one file — it's called clouds.yaml — the credentials and details for four OpenStack providers. You can have four OpenStack providers, and you can run the same code base once for each provider, and you'll get the same outcome in all four.
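A sketch of what that looks like: a clouds.yaml with two named providers, and an `os_server` task that provisions against whichever cloud name you hand it. All the names, URLs, and credentials here are placeholders:

```yaml
# ~/.config/openstack/clouds.yaml -- two named OpenStack providers.
# Auth URLs, usernames, and project names are illustrative only.
clouds:
  provider-a:
    auth:
      auth_url: https://a.example.com:5000/v2.0
      username: deployer
      password: changeme    # in practice, keep this in a vault
      project_name: myproject
  provider-b:
    auth:
      auth_url: https://b.example.com:5000/v2.0
      username: deployer
      password: changeme
      project_name: myproject

# A playbook task: the same code runs against either cloud.
- name: provision a web node
  os_server:
    cloud: provider-a       # switch to provider-b; nothing else changes
    name: web01
    image: centos-7
    flavor: m1.small
    key_name: deploy-key
    state: present
```

The `cloud:` parameter is the only thing tying the task to a provider, which is what makes the one-code-base-four-providers story possible.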
Okay, I did it for two and it works great. I didn't test it with four, but with two it works. The same code base — as long as the OpenStack backend didn't crap out on you, which happens — you're gonna reach that same end state. So it does that. Infrastructure-as-a-service orchestration: adding a load balancer when you need one, removing a load balancer when you need one. I'll give you an example. We have a case that we call the magic packet case, okay, where we have this client that calls in to a load balancer, okay? This load balancer is an HAProxy in an active-passive mode, okay? So you have an active HAProxy that load-balances the nodes, and then you have a passive one — if the active one goes down, the passive one comes up, and a new one is provisioned somewhere else to become the new passive one. So far so good? Okay. So what happens is — as far as we can tell, and as far as Juniper Networks, our SDN provider, can tell — there's this magic packet that comes in, goes to the load balancer, and crashes the HAProxy. The passive one comes up to pick up the slack, but the packet is still in the buffer; the same packet goes to this one, and it crashes by the time the next one is up, and that one crashes, and we have this cascading failure of load balancers, okay? All we needed to do was provision another load balancer and change the DNS entry from the old load balancer to the new one — and since we were using local DNS on the same hardware, we were golden, okay? Make sense? So things like this are really easy and fast to do with Ansible, for multiple reasons, one of them being how easy it is to pick up compared to other tools. So the way it works is — anyone have a pointer, by any chance? I'm really used to pointing at stuff. Okay, so what happens is: you have an inventory. Everything in Ansible is based on an inventory, okay?
I have these sets of nodes, and each node has certain properties, so I say "give me all web servers" and it'll give me all — oh, thank you so much. Great, appreciate it. So you have the inventory file here, and any VM you own is part of that inventory, okay? The inventory is in INI format, and you can tag your VMs any way you want. So you can have "all my phoenix2 nodes" and list 20, and you can have "all my web server nodes" with five of those 20, okay? So when you ask Ansible to do something, it'll just work on whatever group you give it. And then there are playbooks. If I had one wish, it's that we'd stop calling things recipes, playbooks, scripts, manifests, whatever, et cetera — just pick a terminology. We have a standard name for everything in computer science except configuration management. So I would love to change those, but I'm partial to playbooks, so we'll go with that one. So you give an inventory and a certain playbook — which is a set of tasks — to Ansible to run, and Ansible will figure out where to run it. It'll SSH to those boxes and push its small agent onto each box. That box does not need anything installed on it, except Python 2.4, by the way. And then the whole thing will run, come back, and report what happened. That's it. Now, if you happen to have an evil shell script, it'll take that shell script, run it over there, and come back telling you that it changed something, regardless of what actually happened. Questions? Okay. It sounds rosy — trust me, it is, until it's not. So how do you install Ansible? Let's say you have an existing system right now and you wanna use Ansible to run it. You have VMs provisioned. If they're Linux boxes and they're CentOS 6 or newer, which is most likely the case, or the last Ubuntu LTS, you have a Python that's new enough to handle Ansible, okay? All you need to do is clone the latest version of Ansible, if you're so inclined.
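The grouping described above looks roughly like this in an INI inventory — the group names and hostnames here are invented for illustration:

```ini
# Hypothetical inventory file. A host can appear in several
# groups; a play targets whichever group its `hosts:` line names.
[phoenix2]
node[01:20].phoenix2.example.com   ; range pattern expands to 20 hosts

[webservers]
node01.phoenix2.example.com
node02.phoenix2.example.com
node03.phoenix2.example.com
```

A play with `hosts: webservers` then runs only on those three boxes, even though all twenty belong to the phoenix2 group.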
Or of course you can use your package manager or pip to install it. Just make sure, from now on, that you're using Ansible 2.0 or later, because there are major differences between the previous Ansible and Ansible 2.0, okay? You go into Ansible and you run this file just to make sure that your environment is set up properly. Now, if you wanna deal with OpenStack, you need to add these lines here. Sorry — Ansible needs these on your local box, and these are for OpenStack, okay? For OpenStack you basically need the Nova client, Neutron client, Keystone client, and OpenStack client. And Shade. Shade is the Python library through which all the calls Ansible makes to OpenStack are translated from Ansible-ish to OpenStack-ish. Yes — yes, it is agentless, in the sense that you don't need to install an agent on any box. However, the way it functions, when you tell it "go install nginx on those boxes," it'll SSH there, create a temporary directory under your user, install its agent — which is basically a Python script that will run everything it needs to run — and then it will clean up after itself and come back. The agent? Oh, it just gets it from here; it's part of the Ansible code base, okay? And I've looked at the Ansible code base — it's actually beautiful. If you're not very good in Python and you wanna look at good Python code, in my opinion — I'm not sure other people agree — this is some good code to look at, and you can make sense out of things. There are some intricacies here and there, and we're gonna see some of them in the code samples, but it's good code. Now, small hint: if you ever have a problem when you SSH, especially if you're using a bastion box, you need to install this small package called sshpass, okay? Just a small hint — if you're running into an SSH problem, just install this; it'll fix it. I have no idea why, but it does.
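One hedged way to pull those OpenStack dependencies onto the control machine is to let Ansible itself do it with its `pip` module — a sketch, listing the clients named above; pin versions to taste:

```yaml
# Bootstrap play (sketch): installs Shade and the OpenStack
# client libraries that Ansible's os_* modules call through.
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Install OpenStack libraries for Ansible's os_* modules
      pip:
        name: "{{ item }}"
      with_items:
        - shade
        - python-novaclient
        - python-neutronclient
        - python-keystoneclient
        - python-openstackclient
```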
Though, don't get it from source — it's hosted on SourceForge and they have so many ads and stuff on their website — so any version from your package manager will suffice. And, yes? Well, okay, well — you, I know, okay? Because MIT is too liberal for you, right? No, no, it's not for security reasons. Okay, when you use a bastion host, all the SSH work is offloaded to the bastion host, like a jump box. So instead of you SSHing to the end server, you SSH to this server, which then SSHes over, and you need your SSH agent to follow through — there are components of sshpass that you need for that, okay? If you need your password, not your key. I can reproduce the problem. Don't get me started. Can I try Symantec? No? We'll talk offline. And by the way, ladies and gentlemen, Professor Ted Faber — he's my open source mentor, so that's why he's giving me a hard time. Yeah, I'm gonna blame him, and you can blame SCALE on him — he's actually called the godfather of SCALE. He was there when it all started. So, thank you. Round of applause, everyone? Thank you. Round of applause. That'll get you to stop talking. Okay, everybody else. Okay, so this is when we're gonna go to code. Yes. Okay, so think of it this way — and I'll give you another example afterwards, but this one is: as part of securing your own network, okay, not all your boxes need to be accessible from the outside. So what you do is you create a private network — think of it like your home network, okay? You have your own router that's set up, and all the computers inside your home network are using a 192.168.something.something IP address, right? But you cannot SSH to those boxes directly. But let's say you need to manage them — if you're so inclined, and I'm sure a lot of you are, you want to SSH to your home box from the office. So how do you do that? You either put your box in a DMZ zone, or you have a box inside your network that has a public IP address.
Okay, and you SSH to that box, and that box — and only that box — has, think of it as two network cards: one network card attached to the outside world, another one attached to your inside network, and you can talk to your inside network from it. So that's what you use as a bastion box. So, especially if you're using OpenStack or a private network, you have the luxury of having none of your boxes accessible from the outside, which is almost a requirement, okay? And you have one box that you can use to jump to all those other boxes. Yes? You usually have — what happens is, in our case for example, we have one box in each tenant. So if we have a specific project, we have one box in each. And that's called the overlay network, because that's a virtualized, software-defined network. Then you have actual physical boxes on the underlay network, okay, that are controlled — that are almost bare metal, basically. And you allow those to call in, okay? And you can also whitelist an external box, as long as it has an IP address that's publicly accessible and known. So it depends how paranoid you get and how many security precautions you wanna take. So apparently we only have 10 minutes to go through the meat of the talk, and you can blame it on Professor Faber. Okay, so. Okay, and I officially hate this. Okay, so. I put up this QR code before the talk started in the hope that people would actually — okay, you know what I can use? Justine, do you mind driving my computer, please? Okay, so this is the code base where we have the examples. So the first thing that I want you to see is — can you please go to the Ansible configuration file? Okay, now let me bring up a case, and tell me what problem you see coming. I'm provisioning brand new boxes, boxes I've never talked to before, and I'm gonna SSH to them. Give me one problem I'm gonna face. The users provisioned as part of the image? Host keys — correct. So how do I deal with that?
Yes, that's the easy way of doing it. However, in a production environment you might wanna actually have the machines call back with their host keys so you store them locally; that way you keep the keys as you expect them. So, host_key_checking — you just set it to false, okay? Also, you might wanna enable agent forwarding. What is agent forwarding? Let's say you're using a jump box like we talked about before. Now, you load your keys on your box here, my personal laptop, and I SSH to the jump box. I don't own that jump box — Joe Schmoe owns it. I don't wanna put a public and private key that are attached to me on that box, because Joe Schmoe has sudo access, and he can read my private key and practically impersonate me, right? So I shouldn't be doing that. So what I do is load an agent, and that agent forwards my credentials with me as I jump through boxes. Make sense? So these are things that make my life easy as I go along. Okay, let's go back. And if we scroll down, let's go to clouds.default.yaml. The one under — yeah, that's why you're up there and I'm down here. Yes. Okay, scroll down please. So, whether you're dealing with OpenStack or even with AWS, you need to define your credentials and your details somewhere. This is called your clouds.yaml file, where you can define any number of clouds. And remember, it's YAML-based, so indentation — like Python indentation — does matter, okay? And you basically say whether it's a private cloud or a public cloud, whether there's a region prefix — so if you're in Virginia one, Virginia two — and Keystone is the authentication endpoint, and the username — agent Smith — the password, and all these details you actually get from OpenStack itself. So once you're given access to an OpenStack instance and use Horizon, you go to security and say, give me the details in a bash file.
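Those two settings live in ansible.cfg; here's a sketch, with the bastion hop added via `ssh_args` — the hostname and user are placeholders:

```ini
; ansible.cfg sketch for provisioning fresh VMs through a bastion.
[defaults]
; Brand-new VMs have unknown host keys. Disabling the check is the
; easy way; in production, collect and pin the keys instead.
host_key_checking = False

[ssh_connection]
; -A forwards the local ssh-agent (no private keys on the jump box);
; ProxyCommand tunnels through the bastion to the private network.
ssh_args = -A -o ProxyCommand="ssh -W %h:%p user@bastion.example.com"
```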
It'll give you a bash file with all this information except your password. Make sense? And the code is on GitHub, so we can follow along. So let's go one step back. The first thing I wanna do is go to the Construct. The goal of the construct playbook is basically to build my network infrastructure, okay? And pay attention with me here, because we're actually going through some of the Ansible basics as we go along. So first things first: we said Ansible is based on an inventory, right? But in order to create VMs, I'm calling an external service — I'm not calling a server that already exists. Should I put that server in my inventory? No, I just say use localhost. I'm just making API calls; I'm not SSHing anywhere, right? So you can actually have modules that do not need to run on boxes — they just need to send alerts, make an API call, even email someone, okay? And you just set the host they run on to localhost, okay? And you set the connection to local, because the default is to SSH even to your localhost if you don't tell it otherwise. And gather_facts — everybody familiar with Facter in Puppet? That's basically a script that you call that gives you everything that can be known about your box, okay? That's how Puppet keeps track of everything. So here you're telling it: don't waste the two seconds you need to collect information; I don't care about this box's information right now. Now, Ansible has something called pre_tasks and tasks. It's simple: you have a set of tasks that you wanna run, okay? It goes one by one, predictably. Anyone tried to do one-by-one ordering in Puppet? Have you ever succeeded? I don't think so, okay? You can — okay, I know you can, but you need to work at it, okay? But sometimes, in order to run your tasks, you need to have some information collected beforehand. And that's what we do in pre_tasks.
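A clouds.yaml along the lines described might look like this — every value below is a placeholder, and each `os_*` task later picks a cloud by name with its `cloud:` parameter:

```yaml
# Sketch of a clouds.yaml with two OpenStack providers defined.
clouds:
  matrix-east:
    region_name: virginia-1
    auth:
      auth_url: https://keystone-east.example.com:5000/v2.0
      username: agentsmith
      password: not-my-real-password
      project_name: matrix
  matrix-west:
    region_name: virginia-2
    auth:
      auth_url: https://keystone-west.example.com:5000/v2.0
      username: agentsmith
      password: not-my-real-password
      project_name: matrix
```

Running the same playbook with `cloud: matrix-east` and then with `cloud: matrix-west` is how one code base produces the same environment on multiple providers.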
So in this case I said: I have a file called project.yaml, okay, that I need you to import all the information from. So would you mind duplicating the tab? Go one step back and go to the group_vars directory here — yes — and project.yaml. Okay, scroll down. So here I'm putting some information as variables; this is where I keep my variables, basically. The whole idea is I wanna keep my playbook as generic as possible, so if I run it on different environments, all I need to do is change the variables, okay? So I define things: I'm calling my project "the matrix," I give it the user credentials, and I also provide the SSH key. Why do I need to give it my SSH key? I'm SSHing into boxes, so I need to give it my public key; otherwise I need to type my password everywhere, okay? By the way — hint, hint — if you're using Ansible, all your boxes should have SSH, you should have your keys installed on all the boxes, and you should have passwordless sudo, okay? And the most common playbook people write is the one that SSHes to all the boxes and installs your key. There's actually a module for it. There's a question — yes. Very, very good question. So Ansible Tower, which is the product the Ansible company sells, has an SSH key management system in it. In our case, we have a hybrid setup for production: we actually have a routine that goes in and deletes all the keys, and even deletes all the users, because we use LDAP to authenticate. So if you're gonna touch something in production, it provisions a key on the fly and plugs it in for you, and it's taken out after a while, so nobody stays in production longer than they need, okay? So there are different ways to do it, but if it's not production, you can be a little bit lenient about it. There are projects you can use for this, but I really didn't find one I liked other than vault.io, and I didn't have a chance to vet it properly. Okay — I also wanna define some networks. The application needs some networks, right?
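That "most common playbook" — pushing your key to every box — uses the `authorized_key` module; a sketch, where the user name and key path are assumptions:

```yaml
# Sketch: install my public key for a deploy user on every host,
# so later runs can use keys plus passwordless sudo.
- hosts: all
  become: true
  tasks:
    - name: Authorize my SSH key
      authorized_key:
        user: deploy
        key: "{{ lookup('file', '~/.ssh/id_rsa.pub') }}"
```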
So I'm gonna define a network that I'm gonna call the jump network, and in OpenStack each network needs to have a subnet — a network can have multiple subnets. So I'm saying: great, this network has this subnet, and this is the IP address range I want that network to have. And I go on like that for five networks — would you mind scrolling down there? And I also wanna define a router that I can use to connect all these networks together. So far so good? Basically, these variables define everything. So let's go to the previous tab. The first thing you do here — I'm assuming the most vanilla setup you could have; you don't even have your key uploaded. The way Ansible works is you call a task: you start with a dash here and you give it a name. This is a comment, basically — when you do an Ansible run, this is the comment that's gonna come out. This is a module. Notice that the name and the module are at the same indentation level. Trust me, indentation will kill you, so keep that in mind. Then we indent a little, and these are the variables that this module takes. It takes a cloud — because remember the clouds.yaml file that we had? It needs to know which cloud to pull out of that file, because you can have 10, so you need to tell it which one. And, because I'm smart, I kept it in a variable file, right? Okay. And I say state present — because you can say absent, and it'll go in and delete the key. Then I give it a name, because I need to use it later, and I give it the public key file. Make sense? By the way, guys, I kept my key in there, so if you wanna put it on any machine you want, feel free to do so. Okay. Then I wanna start creating the networks. Now we're doing a little bit more advanced Ansible-ishness, okay? And two minutes — I promise I'll stop. So what I'm doing here is a loop, okay? Same thing: I have a module.
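A group_vars/project.yaml shaped like the one described might look as follows — the names, key name, and CIDRs are all invented:

```yaml
# Sketch of group_vars/project.yaml: one list entry per network,
# each carrying its own subnet and address range.
project_name: matrix
project_key_name: matrix-key
project_router: matrix-router
project_networks:
  - net_name: jump-net
    subnet_name: jump-subnet
    cidr: 10.0.1.0/24
  - net_name: web-net
    subnet_name: web-subnet
    cidr: 10.0.2.0/24
```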
I have the module's properties, but I'm looping over project_networks — because, remember, I have project_networks with a set of networks under it, and each network has its own properties, okay? So I say: loop over all the networks, and from each item you're looping over, give me the net_name — the network name. Fair enough? Let's scroll down. This is creating the subnets, okay? It's the same thing — I'm looping over it, but I'm using more variables: give me the subnet name, the CIDR, and the network name as well. So far so good? Now the last one — it's called os_subnets_facts. Sometimes when you're running something, the inventory is not kept alive; the inventory is read at the beginning and used from then on. Sometimes you need to load things into memory as you go along. So you have a module like this that says: just give me the facts, please, okay? And it gives you the facts and stores them in a variable for you that you can use — it's basically just a dictionary, and you can parse it any way you want, okay? So unfortunately — thank you, Justine — unfortunately that's as deep as we can go at the moment, but I'm gonna sprint through a couple of things quickly, if you don't mind sticking around for five or ten minutes. So the second step we'd actually go through is the provision playbook, and this one is where I actually provision the VMs. In this case I'm using an Ansible role that I created, called provision VM, and the role takes as input the cloud name and the specifications of the VM itself. And since I'm a good boy, I store all of those in variable files — you can see I stored the VM specifications in a variable file: the name of the VM, the key name attached to it, the size of the VM, and basically how many I want of it. So it'll create as many as I need. So, going back to the presentation: you start basically from an empty set.
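Putting the pieces together, the construct and provision steps described above can be sketched as one play — the module names are real Ansible 2.0 OpenStack modules, but the variable layout follows the hypothetical vars file, so treat this as a sketch rather than the talk's exact code:

```yaml
# Sketch: build the network infrastructure, then provision VMs.
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Upload the SSH keypair
      os_keypair:
        cloud: "{{ cloud_name }}"
        state: present
        name: "{{ project_key_name }}"
        public_key_file: ~/.ssh/id_rsa.pub

    - name: Create the project networks
      os_network:
        cloud: "{{ cloud_name }}"
        state: present
        name: "{{ item.net_name }}"
      with_items: "{{ project_networks }}"

    - name: Create one subnet per network
      os_subnet:
        cloud: "{{ cloud_name }}"
        state: present
        name: "{{ item.subnet_name }}"
        network_name: "{{ item.net_name }}"
        cidr: "{{ item.cidr }}"
      with_items: "{{ project_networks }}"

    - name: Create the router that ties the networks together
      os_router:
        cloud: "{{ cloud_name }}"
        state: present
        name: "{{ project_router }}"

    - name: Reload subnet facts into memory mid-run
      os_subnets_facts:
        cloud: "{{ cloud_name }}"

    - name: Provision the web VMs
      os_server:
        cloud: "{{ cloud_name }}"
        state: present
        name: "web{{ item }}"
        image: ubuntu-14.04
        flavor: m1.small
        key_name: "{{ project_key_name }}"
        network: web-net
      with_sequence: count=3
```

The `with_sequence` loop is one way the "how many I want of it" count could be expressed; the image and flavor names here are placeholders.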
You call the construct playbook and you get the networks you see here, and then you call the provision playbook and you have the hosts created. And when that happens, the Agents have technically lost, because you've provisioned everything you wanted to provision — and I guess I will say, now you know kung fu, as Neo famously said. So we can discuss this more offline. I appreciate your time and your patience, and I hope you learned something new today. Thank you very much. The slides are gonna be posted with the talk itself on the SCALE site.