I love the music here. It's amazing. Yay! So who's ready for some BoF-ing? Yay! Woo! I know it's the last session of the day, but this is the one where you can have fun and, like, express your feelings in the true ways that you feel them. So come on up. Everyone should come to the front. A Birds of a Feather is, like, really sad when you guys are way back there. If you guys are here expecting slides, there aren't any. And also my capacity for running around so that you guys can hear each other will quickly diminish, because exercise is awful and I don't do it.

Who's been to a Birds of a Feather before? Who's never been to a Birds of a Feather? All right, so, well, this is your kind and lovely host, Mr. Matt Fisher. He's an operator. Yep, that'd be me speaking. So I've operated OpenStack for about three years now, and we use a combination of Ansible and Puppet to do that. And so I came up with some of these topics today, but the bottom line is this is just an open conversation. If you want to get up and ask a question, like, hey, how did you solve rebooting a RabbitMQ cluster with Ansible? Get up and ask, and someone else may share the answer with you. If you want to add something to this, or if you want to say something on here is a terrible idea, this is an open etherpad. Maybe you could take that URL that you have at the top. Yeah, I can put it in here. I'll put it just below in, like, gigantic font. It's anonymous-ish; you don't have to put your name on it.

All right, so who's ready to be participative, other than Major Hayden? Because I'm going to just look at you and say stuff. And then I'm going to meet a bunch of other folks so that, you know, yay, because that's what I do. We do have some pre-chosen discussion questions or topics we can go through if you want. Or, I'm curious, I guess we could start: who here is using Ansible to deploy OpenStack? And who here is using OSAD (OpenStack-Ansible) versus homegrown? OSAD? So is there homegrown? I was using homegrown.
Or other-grown? Everyone's using OSAD? Excuse me? Yeah, there's Kolla, and there are some people I've met who use Ursula, or have based stuff off of Ursula. So besides just installing OpenStack, is anyone here using it to do management after the fact, to manage resources, users, projects, or to orchestrate things like upgrades, even outside of OpenStack? Everybody? Of course. Well, that was two hands back there. Upgrades is why we started using it. So I'll skip down here; we'll come back to these major concerns later.

So I'm actually really curious how people are driving their Ansible. Are you literally just, you know, running it from your laptop? Are you using Jenkins? Do you have some other magic?

So my name is Major, I work for Rackspace. And I would say our OpenStack-Ansible deployments we all do in the shell. But then of course all the gate jobs we have upstream are all Jenkins-based. But yeah, the vast majority of what we do is directly from the shell. And so in a customer environment we'll have a machine that's assigned to be the deployment server. But that server doesn't just do deployments. It may actually have OpenStack stuff running on it at some point. But that's just the place where everything is located, including inventory and everything.

So when you use the shell, one of the problems we had was maybe something would fail, and the person deploying it would see a failure and not know what it was. And you know, you can't just say, walk over here and look at my laptop; they might not even be in the same state. So how did you guys, you know, share problems like that? Was it through log files or something else?

Yeah, so at first it was kind of like, hey, go find the red text and copy it and put it in a bug report, which is obviously not very good. And then after a while a couple of folks worked on shell scripts.
So the shell scripts would run, and if the Ansible failed, it would actually print out a message telling you exactly what to do with the output, which is, like, one step further down the road. But obviously I think it'd be nice to get a module at some point where you could have those things. Like, it would fail, and then you get something that says, hey, do you want to submit this as a bug? Y or N? And you say Y, and then it goes and gets submitted.

We eventually used Jenkins just because, you know, it retains the logs. It's like, here's a URL to the deployment that broke. And then history, like, it worked three days ago and it broke today, or it broke an hour ago. Jenkins worked fairly well for that.

Have you looked at ARA? No, so why don't we talk about that? I don't know very much about it; I just heard about it. Who knows about ARA? All right, who wants to volunteer and come tell people about it? Because I think it's... oh, come on, someone. Mathieu Gagné does.

Actually, I didn't use it. I just had a presentation made by an ex-coworker who wrote the tool, who's called David Moreau-Simard. So he talks a lot about the tool. And in fact, it's just a way to report what happened in your playbook run. So you can get the actual playbook, the facts, and all sorts of information that you can grab via hooks you can configure in Ansible. It's just a way to grab everything and put it in a pretty UI. That's it. There is no orchestration or anything like that, like Tower. It's just a reporting tool, like PuppetBoard, let's say.

I think, sorry, I just heard something about recording things. Presenting, or sort of a shared screen for Ansible deployments, or capturing errors, was a problem for us. And I have this other note here about other problems you're having with Ansible. And I'll give you one of mine, and then hopefully you guys will have a bunch of other ones. So... you had a problem with Ansible? I do. And I think it just got fixed.
It had to do with how long Ansible takes to find the hard drives on our Swift nodes. Eventually it takes so long that it fails our deploy. I think it was recently fixed.

The problem I actually wanted to talk about was upgrading Ansible itself. The problem is that we have a standard set of deployment scripts that everybody has. Maybe they run hourly, monthly, daily, whatever. Those are exercised all the time. Then we have the, you know, rabbit-blew-up-at-three-in-the-morning, run-this-tool type scripts that are only run in emergency situations. They're generally difficult to test, and may actually require you to be in a weird state to begin with. And our problem was we simply had so many of these one-off, hard-to-test scripts that upgrading Ansible was a problem. And so I was curious what people were doing about that. If you have something with roles, you can use Molecule, which is mentioned down here. But simple things, like a tool to upgrade MySQL between minor versions within 5.6, are not the easiest to test when going to Ansible 2, for example. So I was curious what people were doing for testing. Am I to believe that testing is just overrated and nobody's testing? Okay, I'll concede: yes, you were right, that is a sucky problem, and I hate that problem too. Anyone? No one else? No one else upgrades? Okay.

Yeah. We run BonnyCI. We don't run OpenStack; we run on OpenStack. And when we want to change the version of Ansible we're using, we have that expressly pinned, and we'll just change it in a PR to GitHub, which then gets tested by our Zuul. And if the Ansible explodes, we don't roll that out; we go fix our Ansible to work with the new version.

So, you in the orange? Yeah, so my question was, we were going to roll from, I forget what we were on when I left, like 1.8.7 or something. Does that sound right? Old, right? Yeah. We're on 1.9.6. Thank goodness Clayton's here.
We were looking at going to 2.0. And the question was, we have scripts for, like, a failed hypervisor, the ops-team-go-run-this kind of script. And we actually had come to the conclusion that we were just going to pick the top 20 most-used scripts that we had, make sure those worked, and then fix everything after. And Clayton, I don't know if you guys have changed that decision since I left. Nope. Yeah, so I never did solve the problem. I solved all my problems; I didn't solve their problems. Okay.

Besides testing, which was just challenging for me, what other problems or concerns or gaps do people have with their Ansible OpenStack solutions? That's a very broad question, so I expect somebody has a complaint. Or something that you love, something you hate. That's also valid.

Well, we run into a lot of situations where configuration options change in an unusual way, or something gets deprecated, or a way that was promoted to do a particular thing becomes deprecated or becomes a warning. In the modules themselves. Yeah, that can become a challenge. The OpenStack modules are just one part of the plethora of modules in Ansible. I was going to say, I think Monty tries to make sure that doesn't happen. We also try to make sure it doesn't happen in Ansible, but there's more than, yeah. You can ignore deprecations, right? Why not?

Okay, someone must have something they love, or that they did recently that they thought was super cool? Because I know when I started using Ansible, I was like, oh my God, I just did something really super cool, and I was very excited and went and told all my coworkers about it. Who then said, neat, I'll look at that later when I have a use case for it. And slowly but surely... Oh, in my case, it was building ELK stack demos, like, constantly, because that's where I was working during that deep, dark part of my life. We're not going to talk about it right now. Yeah, so come on, someone tell me something awesome you did.
I'm going to keep walking around like a crazy person, which is normal. No? No. I'll give one. I mean, one of my favorite things we did, and it's not new at this point, was the multi-region Keystone upgrades, where we'd set it up to pull Ansible inventory from Puppet facts. Clayton did that work. So you'd query Puppet servers in both regions to pull the facts down. You'd figure out how many Keystone nodes there were; there should be six. And then perform all the database backups, stop some of the database nodes (it's sort of a hot spare), get Keystone down to one node, do the upgrade, do all the testing, and then roll it back all the way up to six nodes. And we were able to get that so that the Keystone major version upgrade was potentially undetectable, but possibly, if you're doing, like, a while-true give-me-a-token loop, you might miss one or two. And then we used that as a pattern for all our other upgrades. And Ansible, at the beginning it was cool to try, but it was definitely the "I want to orchestrate things across multiple boxes" need, and Puppet didn't really check that box for us.

It is possible that, you know, I can move to the... if you want to make a comment, then I will buy you a beer after this thing, because I know you all want a beer, or a tasty non-alcoholic beverage, immediately after this thing. I mean, there is that. Well, let's skip. Go ahead. Or I'll just have myself a beer. Oh, here? I've got to have the thing in my face now.

So this is the output from devstack-gate that is actually automatically produced by jobs in the gate. Or I guess this one's an OpenStack-Ansible job, but devstack-gate also does it. So this has actually analyzed the Ansible run of this job, which is pretty cool: what host it ran on, and you can see what changed, which is good for chasing down non-idempotency and things. You can see all the plays. This is very cool, actually.
And this is run as a plugin that spits out this format while you run Ansible, and then you can use this as your output for analyzing your runs of Ansible. ARA. Yeah, thanks. Did we ask for a show of hands on who's running it? Anybody running ARA? ARA users? Hands high? All right, go ask that guy. Say that again. You should upgrade Ansible.

I'm curious about this; I know this was added by John. So, is anyone using Molecule for testing? For some reason I only have his Twitter name in my head right now, which doesn't really help. It's like Johnny-something with weird characters. His name's... yeah, anyway, I'll look it up. He works at Cisco. I don't know all that much about it. Yeah, I don't either. I was just curious if people were using it. He wanted comments if anyone's using it. Go to the thing; I'm sure it probably says on the little page, hopefully. Yeah, it's for testing roles. Testing Ansible roles, basically. Leveraging a bunch of awesome tools to do stuff. And I know there are folks who use... oh, what is Will Thames's magical Ansible tester thing? ansible-lint. Yeah, he's got the linter, but there's another, newer one as well. I can't remember. I'm sorry? ansible-review. Thank you. That's the thing I was thinking of. That's also Will Thames's testing thing, which is not for roles, but yeah. So I don't know that I've met folks using Molecule, although it seems to have lots of happy users. It's not unused, for sure.

Okay, let's skip down. Do we want to talk about management or upgrades? Does anyone have a preference? I'd rather talk about upgrades, but what do you want to talk about? You know what, you're in charge, so if you want to talk about upgrades, do it. Okay, so we kind of hit on this earlier: people here are using Ansible for upgrades, and I know a bunch of you are. I'm curious how you're using it, or what problem it solved for you, specifically. So far it's very simple.
We have a bunch of Puppet code already, and we want to keep using it for now. So what we did is we wrote a bash script that runs Ansible. I mean, you stop the Puppet agents, you deploy the new Puppet master, and you force a Puppet run, and that's basically it. Just a matter of updating the DB first and then deploying the new code. That's it.

Are a lot of people here using Ansible with Puppet? We actually have Puppet masters, and Ansible serves for the management of all the things around them. What Mathieu didn't say is that he actually builds the Puppet master package and deploys it through Ansible. We also have several jobs on Jenkins that run, and our devs have ad hoc solutions for everything: they started with bash scripts, then they moved on to writing their own bash to push Puppet onto the nodes, run it there, and run checks by hand. And the last iteration of that is that they now build Ansible playbooks to deploy the code on remote nodes directly from Jenkins.

Folks who are using Puppet with Ansible, do you all know that there's a puppet module in Ansible, so that you can run Puppet stuff using Ansible? That was written by our friends in OpenStack Infra. Oh, really? I thought that was there back then. I mean, you guys have had that around for a while. Added in 2.0? Well, it's there for when you move on to 2.0, but yeah. All right, well, it's there. We know where to find the maintainer; his name's Monty.

All right, sorry. In fact, I wouldn't tell our devs about it, so they don't stop converting their deployment systems to Ansible directly, because they would maybe then try to create another intermediate step between their old system and the new system. It's better to force them through. But actually, what I like with Ansible is ramping up all our colleagues on using it. It's not easy. Most of them don't. Some are more Puppet people, so they prefer that system.
I try to write as many playbooks for them as I can, as handy tools, but it's not always working. What objection do they have, besides... is there another objection they have to Ansible, besides that they know Puppet? Yeah, culture. And I usually find, it took me weeks to learn Puppet, and I often find people are like, oh right, you just told me I should spend an afternoon learning this thing. They told me that about Puppet. Then they told me that about Chef. And now you're swearing that I should believe you? I'm not that dumb. I'm not going to believe you about it being easy. And I'm like, well, to be fair, I'm a super skeptical person, so I probably wouldn't believe me either. But yeah, they've spent a bunch of time invested in learning something, and, like, you know, why...

Although, one of the things that... I mean, you said you want to get people to do that. One of the things I often tell people is, you know, people are scared. They're like, God, I'm going to have to rewrite everything I already have. And it's like, well, I understand that you have this gigantic stack of, you know, crap your boss wants done, probably with half the number of people you had last year. You don't really have to. Like, if it continues to work, then by God, just leave it there, and, you know, start attacking the rest of your stuff where it can help you move further into the future, hopefully a little more easily than you were able to in the past.

Maybe the main problem, or the main issue they see, is that they have a centralized system with the Puppet master, where they know all the stuff is, and it's a philosophical thing. They know that they can poke that place and everything is there, and no other... I have a central repository for the... Yeah.
It's an easy way to solve a series of problems, and now that they're settled into it, they don't want to pull out just to be more rigorous on another system, because that's what it ends up being: they need to be more rigorous with the inventory, to make sure that everything they want to have is already there. In our case, currently, they spin up a VM and they install the Puppet agent, and by some kind of magic the information ends up appearing in the Puppet master. We had a bare-metal process where the node just went into Puppet as soon as it was done installing; Puppet did everything, and it was just magic. And if you have a system that works, it's hard to throw it all away. But Puppet didn't get us all the way there, which is why we used Ansible. And the Ansible was growing, right? The amount of Ansible we were using was growing; the tasks we were using it for were growing. Maybe my take is, currently we need both, because some of the stuff is easier for some of the devs to do through the recipes they already have. So why rewrite things that work? The best config management system is the one you have that works. Indeed.

I want to go back up, because people are adding some notes here on inventory management, and I'm curious about some of these notes. Who put the... yeah, who's the sort of gray-blue using custom Python scripts to fetch an inventory from OpenStack APIs? Okay. So what's your use case for that? I was actually trying to ask all of you where you are getting your inventory files, because the only option that ever worked for me was writing a Python script that connects to Google, AWS, or any other API and generates the inventory file. I don't say dynamically, just when I need it. I know that Ansible has some option for dynamic inventories, but this never worked for me, never, ever. We always use Python scripts. I don't know if someone has a different solution. So all of you just write scripts that fetch...
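For readers following along in the etherpad: a minimal sketch of the kind of homegrown inventory script described above. The host names, groups, and the `fetch_servers()` stub are all made up for illustration; a real version would call the cloud's API (OpenStack, AWS, etc.) instead of returning hardcoded data. The `--list`/`--host` protocol and the `_meta` block are what Ansible expects from a dynamic inventory script.

```python
#!/usr/bin/env python
"""Sketch of a homegrown Ansible dynamic-inventory script.

fetch_servers() stands in for a real cloud API call; all names,
groups, and addresses below are illustrative, not from the session.
"""
import json
import sys


def fetch_servers():
    # A real script would query the cloud API here; hardcoded for the sketch.
    return [
        {"name": "compute01", "ip": "10.0.0.11", "role": "compute"},
        {"name": "compute02", "ip": "10.0.0.12", "role": "compute"},
        {"name": "control01", "ip": "10.0.0.21", "role": "control"},
    ]


def build_inventory(servers):
    # Group hosts by role and stash per-host vars under _meta, so Ansible
    # doesn't have to call back with --host for every single machine.
    inventory = {"_meta": {"hostvars": {}}}
    for s in servers:
        group = inventory.setdefault(s["role"], {"hosts": []})
        group["hosts"].append(s["name"])
        inventory["_meta"]["hostvars"][s["name"]] = {"ansible_host": s["ip"]}
    return inventory


if __name__ == "__main__":
    # Ansible invokes the script with --list (whole inventory) or --host <name>.
    inv = build_inventory(fetch_servers())
    if len(sys.argv) > 2 and sys.argv[1] == "--host":
        print(json.dumps(inv["_meta"]["hostvars"].get(sys.argv[2], {})))
    else:
        print(json.dumps(inv, indent=2))
```

As the speaker notes, you don't have to run this "dynamically" at all; generating a static file from it when you need one works just as well.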
For us, we're on OpenStack, so for bare metal we don't really have any. But we use the shade inventory, I think. It works fine. So shade includes an OpenStack inventory that just uses your clouds.yaml and asks your cloud, what are all my hosts? Oh yeah, neat, so now it's included.

I will say that my preference is for static inventory, because there is a step between it appearing in my cloud and it being useful to my orchestration. So even though I have used dynamic inventory with success, I actually don't necessarily like it, because often it's like, oh, now everything is exploding. Oops, who added a system that's not ready? It's halfway through, it's in the wrong tenant. There are a bunch of problems that come from it. Generating might actually be better in some cases.

We had that same problem, especially when you're adding a lot of nodes and one is sort of halfway there and it reboots, and then the deploy process determines that's a failure. So we just, I mean, here we go back to Puppet again, but we just created a Puppet fact. You could probably just drop a file on the box and use it as a marker, but the fact was, like, this machine is deploy-ready, and we would only pull those into inventory. If a machine was not deploy-ready, we would say, we are skipping this box, because either it's out of service or whatever reason, and we would skip it and not abort the deploy. And, right, you would just print a warning in the inventory generation script, but not actually put it in the inventory. And that worked well; we would have nodes, you know, with a bad disk, might have a node offline for two weeks, and you'd want to remove it from inventory for a while.

I seem to recall that Bifrost has some interesting things it can do with inventory as well. I mean, it's still just, like, a flat inventory file, but it does some more, you know, takes things out of it and can apply them in interesting ways.
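A quick sketch of that deploy-ready marker idea: the inventory generation step warns about and drops hosts that aren't flagged ready, instead of aborting the whole deploy. The `deploy_ready` field name is invented for the sketch; the original used a Puppet fact, but as noted, a marker file on the box would work too.

```python
"""Sketch of filtering not-deploy-ready hosts during inventory generation.
The host dicts and the "deploy_ready" key are illustrative assumptions."""
import sys


def filter_deploy_ready(hosts):
    """Split hosts into (ready, skipped) based on their deploy-ready marker.

    Skipped hosts get a warning on stderr rather than failing the run,
    so a box with a bad disk can sit out of inventory for two weeks
    without anyone editing files by hand.
    """
    ready, skipped = [], []
    for h in hosts:
        if h.get("deploy_ready"):
            ready.append(h)
        else:
            skipped.append(h)
            print("WARNING: skipping %s (not deploy-ready)" % h["name"],
                  file=sys.stderr)
    return ready, skipped
```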
Sometimes you want to get some inventory and make an image for that machine because it has a certain kind of processor, stuff like that. Bifrost is...

So, who wrote this bit about doing it with zone transfers? Do you want to share that with us a little? Just tell us more about that. You're now volunteered. I'm not guilty of this script at our place, and I was just trying to work out where it comes from, because I didn't write it, so I assumed it was just someone Googling and finding it, but it looks to be something that he wrote. And looking at it now, well, 100 lines could probably be more dense, but yeah, it's pretty straightforward. We rely on DNS. All our sites have different zones, basically, whatever you call it in DNS, so we can identify hosts by a prefix or so. The script basically goes to the relevant DNS server and does the transfer, which we have opened up for it, and then we get all our infrastructure hosts as well as the OpenStack itself. So I don't know, maybe we can post it somewhere. Yeah, that'd be good.

So, Robin was mentioning Bifrost. Does anybody use Bifrost in here, and do you use the inventory for things outside of Bifrost, by any chance? So yeah, you have this sort of... first you add machines to Bifrost, either using the discovery mechanism or by manually adding them to Ironic. So you use Bifrost, which, for those who don't know, is a set of Ansible playbooks to drive Ironic without the rest of OpenStack around, although you can also drive, I think, Neutron now too, which is pretty cool. What's that? Yes, it is. But I talk to Julia a lot, so I think it's real. But Bifrost, you intake with it, and then there's this phase everyone goes through with real hardware, which is to burn in your hardware, and then you... it sounds like you're generating an inventory from Ironic's API to then feed into your Kolla-Ansible to deploy OpenStack. Makes sense.
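The zone-transfer approach above boils down to: do an AXFR against the site's DNS server and pick out hosts by a naming prefix. A minimal sketch of just the parsing half, so it can be exercised offline; in the real script the record lines would come from the zone transfer itself (e.g. piping `dig axfr zone @server` through this). The `os-` prefix and the example zone are invented for illustration.

```python
"""Sketch of prefix-based host extraction from DNS zone-transfer output.
The prefix and zone names are illustrative; a real script would obtain
the record lines by running the AXFR against the relevant DNS server."""


def hosts_from_axfr(axfr_lines, prefix):
    """Extract hostnames whose leftmost label starts with `prefix`
    from dig-axfr-style record lines, e.g.:

        os-compute01.example.com. 3600 IN A 10.0.0.11
    """
    hosts = []
    for line in axfr_lines:
        fields = line.split()
        # Only consider A records; skip SOA/NS/TXT and malformed lines.
        if len(fields) >= 5 and fields[3] == "A":
            name = fields[0].rstrip(".")
            if name.split(".")[0].startswith(prefix):
                hosts.append(name)
    return sorted(hosts)
```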
When we were deploying OpenStack using Ursula and Bifrost at IBM, we had two Bifrosts. We had the intake one, which was just, somebody just gave us a hundred machines on an L2: let's isolate it, discover it, work it in. And then, second, we would actually move the nodes over to the real Bifrost, which managed the production workloads and was much more locked down. Moral of the story: use Bifrost. It's cool. It is a magical rainbow cat.

So, one more quick topic, since Major Hayden is reminding us we're missing beer, so we'll go fast. What are people doing about Ansible speed? And I'll say it started to become an issue for us; our deployment got slow. It wasn't really Ansible's fault, but we tried to tweak what we could in Ansible. Parallelization is the obvious first thing. There are some other tweaks you can do, in terms of, like, some of the SSH configuration options in Ansible. So I was curious what else people were doing here. Version 2 is actually slower. See, there's a reason to stay on 1.9.

So I'll say, within OpenStack-Ansible, since we have LXC, we switched a lot of things to use lxc-attach rather than going through SSH; that saved a little bit of time. Some of the other stuff we found: building an /etc/hosts file using facts from individual nodes, for example, took a very long time if you had 500-plus nodes you were deploying to. So we started doing some things very carefully around that, to, like, generate all that ahead of time and then deploy it out to the nodes. I'd have to go back and look at some of the jobs. Trying to think what else. Not too much around SSH optimization or anything like that. Just try to limit plays as much as possible, to only go to the nodes they really need to go to, so you're not wasting time just kind of rolling along with, like, skipped tasks everywhere.
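The /etc/hosts speed fix above amounts to: render the file once from the collected name-to-IP mapping, then push the same file to every node, instead of having each node template it from every other node's facts on every run. A sketch, with made-up names and addresses; the header comment is an assumption, not the project's actual convention.

```python
"""Sketch of pre-generating /etc/hosts content once from gathered facts,
per the speed discussion above. Hostnames/IPs are illustrative."""


def render_etc_hosts(addresses, header="# Managed by Ansible"):
    """addresses: dict mapping hostname -> IP.

    Returns the full /etc/hosts content as a string, sorted by hostname
    so repeated runs produce identical output (no spurious 'changed').
    """
    lines = [header]
    for name, ip in sorted(addresses.items()):
        lines.append("%s\t%s" % (ip, name))
    return "\n".join(lines) + "\n"
```

The rendered string can then be copied out to all 500-plus nodes in a single play, which is one pass over the data instead of a per-host fact walk.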
So we tried to find a lot of ways... what is it, Ansible 2.2 or 2.1 or whatever, that lets you include a task file where the actual file name can be templated, so you can have a variable in it. So we used to do, say, like, hey, if we have yum, go to this `install_yum.yml`; if we have apt, go to `install_apt.yml`. But instead we just said `install_` plus whatever the Ansible package manager fact is, plus `.yml`, and then you skip a whole bunch of tasks and go straight through. So it saved us a tub-load of time. Anybody else have good tips?

We ran into one where the way we were referencing hostvars ended up with a Cartesian product, and so as we added nodes we were getting an exponential explosion of lookup time. So we were seeing runs go from 15 minutes to 25 minutes to 3 hours to 10 hours, adding 100 nodes at a time, because there is a way you can be referencing hostvars that are themselves references to hostvars, and then you end up basically diving through multiple dicts that are much bigger. Did you guys know you were doing that, or was it an accident? It was an accident. Once we stopped doing that and just saved the var once at the beginning, we no longer had ridiculous amounts of time spent basically spinning on the CPU. It's when you abuse hostvars. So, don't do that.

We had also started benchmarking it. Did you guys... Clayton, what was the tool we used to do the timing? The deployers would tell us it felt slow, but no one could ever really... you could see the time in Jenkins, but you couldn't really see where the time was being spent. We had a good guess, but we used that tool and were able to validate that we were making it faster. Similarly, Ursula, which was the Ansible setup we were using, adds some plugins that spit out timings, so we get, like, the top 100 worst tasks, and we just started knocking those down. Yep, sounds very similar. With those you can actually prove you've improved it, rather than it just feeling better.
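The timing plugins mentioned above reduce to something very simple once the per-task durations have been captured: sort and take the worst offenders, so you can prove a change helped instead of it just feeling faster. A sketch with invented task names; the real callbacks hook into Ansible's plugin API to record the durations in the first place.

```python
"""Sketch of the 'top N worst tasks' report produced by timing callback
plugins like the ones in Ursula. Task names and durations are made up;
a real plugin would record these while the playbook runs."""


def slowest_tasks(durations, top_n=100):
    """durations: dict mapping task name -> seconds spent.

    Returns the top_n slowest tasks as (name, seconds) pairs,
    worst first, so you know where to start knocking them down.
    """
    return sorted(durations.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```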
Right, it doesn't just feel better. I actually made the top task go from 9,000 seconds to four. Somebody did.

So, Robin, I think you ought to give a little wrap-up speech, and then we can all go have a beer, unless someone else has something. Well, I mean, you know, the nice thing about these events is that even though we try to gather people here to share stories, it's when you go out at night, and you go to the little parties or booth crawls or whatever, and you have a beer, then, oh boy, then all the things start flowing out. Anyway, is it about that time? I guess it is. I think we have until ten after, so we're just a few minutes early.

Oh, well, I guess in advance of that, not that I'm going to be running to the bar or anything after being in here all day, I did have a lot of fun today. Who came to other sessions today in here? Was it useful? So this is, like, the first time; we used to have a collaboration day, which was not really a full track. It was, like, people in a room collaborating, which I also find to be valuable, but they were like, would you like a track? Sure, like, we're going to have communities talk to other communities. That seems like such a normal thing to do that we don't often see it being encouraged at large events anymore. So I hope that if you guys thought it was useful, or I also think if you guys thought it sucked, you should definitely tell the Foundation. But if you thought it was useful, and I thought it was useful, sending them that feedback I think might help us get this again. Tell them, yeah, do that again. That was awesome for everybody.

Before you go, if you're still on your laptop, put your name in. If you want to. If you want to, and if someone thinks your idea was great, they might be able to find you a little easier if you have a name there. You don't have to. You can be unnamed if you want to. I have no name. Anyway, thank you guys for coming. This has been kind of fun. A BoF at the end of the day.
There's plenty of other forum sessions and stuff where feedback is useful. Oh, I guess I could type my name in there. Except I'm not typing right now. And I'm out of words. Does anyone else have comments, questions, hugs? Eggs they want to throw outside? Not at me?