Is this thing on? Okay, thank you. Can everybody hear me? All right. So, my name is Clint Savage. I've been a SCALE attendee for about ten years now, and I've spoken a few times, so I've been enjoying coming here. I want to talk a little bit about what I've been working on. I started at Red Hat recently; I've been there a year now, and during that time we've been working on a project called LinchPin. The idea behind it is that we want to be able to do continuous integration and provision anything you want, in terms of a cloud. So we're going to talk about some of the tools that go along with that, and I want to introduce it in a way that's maybe a little bit fun, but also throws the previous tooling under the bus.

So, inside Red Hat we had a provisioner, call it Provisioner 1.0. It was pretty powerful, pretty cumbersome, and pretty complex. It had some good stuff in it, and it made little clouds; it's a cloud factory, right? Those clouds actually had names like "CI factory" or "CI ops central" or whatever. But the thing about it is that it was hard to deal with, and I'm going to share that complexity with you, starting with the installation. We looked at this installation and said, "Oh, this looks fun," right? I'm throwing it under the bus because, well, I can. So let's have a look and see what's involved. First, the script makes some assumptions, and I don't want to make those assumptions anymore. It does things like checking the version: if it's this version, do this thing; if it's that version, do that thing. And I keep scrolling, and you keep seeing more of that. Then it installs more repositories, and does more testing.
Oh look, it can install the optional repo for you if you don't have it, and the extras repo, and this is all pretty much Red Hat specific. That was something we didn't want to do in the next version, so I wanted to point these things out; we wanted to get away from that when we decided to start the new thing. I don't even know what OSP4 is offhand; I think it's for some OpenStack stuff we do. We have the EPEL repo we build for you, and this is all just in an install script, so we did a whole bunch of testing just to make that work. Oh look, libxslt-python. Does anybody know what Beaker is? Okay, at least one person in the room does. Beaker is another type of provisioner: it does physical provisioning, and it's used for a lot of kernel testing inside Red Hat. The Fedora project uses it quite heavily too, and we're working to make that larger. The script checks the Red Hat release, sees whether these things are installed, and installs them for you if they're not. And I'm not done yet. This is the part that stinks: if you imagine trying to read this and figure it all out, you could, it's not that difficult. But where's my favorite one? Oh, here: now we're going to install or upgrade pip, and then install all this stuff with it. So it seems like a lot of work, right? Someone had to figure all of this out, and we didn't want to do that anymore.

And beyond that... oh, maybe I should reload my page, hold on one second. Of course, the internet is broken. Is everybody's broken? That's awesome, because I just reloaded the page and it killed my deck. I have local slides, so I can use those; I just didn't want to if I could help it. Okay, let me pull up my slides, then; I guess we'll go with that. That was really bad.
No, that's what I wanted, and I don't know why I did that. Don't look at my bills. Sorry about that. This is going to be an older version of what I just had, but it's pretty much the same thing. Now I'm trying to remember how to start the presentation view. F5, maybe? There we go.

So, in the old CLI it was like this: huge numbers of options, and you could use them all on the command line. It became cumbersome if you didn't know them all or had to look them up, and it was a bunch of Bash and a bunch of Python mixed together with some other scripts, and you had to figure out what to do with it. But it was really powerful, and it actually had some cool features. It mostly worked in OpenStack, it worked in Beaker, and it worked in AWS barely, but not really. We wanted to change that out. The other thing it wasn't was open source.

So, does anybody know who this person is? A couple of people might: this is my friend Ricky. She works at Red Hat as well, but she was in Australia for a couple of months, and she found this spider; there's a little story behind the spider. As she said, it was humongous, about this big around, I think. She went to tell her flatmate about it, saying, "This is huge, I don't want to touch it," and the flatmate went over and said, "Oh, that's a teeny one." Apparently they get about this big. So it's an interesting story to relate, but yeah, I'm not interested in that.
I'm kind of scared of things that aren't open source, and I really want to make this more flexible for you. So we started this project called LinchPin as if it were open source, because we wanted to make it open source. Right out of the box, we made happy little clouds. If you enjoy that little bit of art: I made all these transitions and they're all going to suck now, because I can only present from this machine, but at least I have the slides.

What we did is make a cleaner installation, made it easier to use, with simple topologies. A topology is something where we define all the clouds that we want to use, and one of the things that makes LinchPin nice is that you can do multiple clouds at the same time. So if you wanted to do AWS and, excuse me, Google Cloud at the same time, you could. As a simple example, you might have quotas on certain accounts that you don't want to go over, so you might use two nodes over here and four over there, or something like that. We're working on some tools where the security groups will also work together across clouds, but that's not there yet.

We made it simple, and that's one of our goals: simple is best. So we made a very simple command line: a single command called linchpin, with an up and a down and things like that, very Vagrant-like. If you're familiar with Vagrant, you should be pretty comfortable with this; there are a few adjustments, but mostly it's like that. Simple teardown, simple provisioning, of course, and it's really easy to extend. If you know Ansible at all, you can take one of the Ansible cloud modules, drop it in, make a schema for your topology, and turn it on. So it shouldn't take too long if you're a developer who's comfortable with Ansible. We're trying to make that even simpler; it's currently a little bit complex, but we're hoping to simplify it in the next release. One of the things
that it does that the other provisioner didn't do very well is creating complete, solid inventories for you to use. Once you spin up your hosts, it gives you all the IPs and information you need about them. You can also provide variables and assign those to the hosts (I'll give examples of how to do this), and it will provision them and give you the inventory, which you can then apply when you run your Ansible playbooks or other tools. And it's a lot more than that, too.

So, enter LinchPin. It's a very, very powerful tool; it's the kind of thing Deadpool loves. My Deadpool memes are going to be dead now, as dank as all get out. We chose Ansible for a lot of reasons. I updated all these slides, and when I give them to the SCALE folks they'll be better; I apologize that they're not as nice as they were. What's that? Okay, let's try reloading, because maybe I can do my cool slides; it'd be funner that way. Oh look, it switched to slow. Good thing I don't have six hours of content, or I'd be worried.

While we're waiting for that, I'll continue. One of the things we wanted to do was base LinchPin on something everyone would know or be able to use pretty easily. Ansible is a really big project that has grown really fast recently, and it's really, really useful. If you don't understand what Ansible can do: it's basically a configuration management tool, but it also provides what's called orchestration, and that's what we're using it for, to orchestrate your clouds.
We're using all of their modules; we don't write modules unless they don't have them. We've got a couple that we wrote, and we're hoping to get them into the Ansible contributed modules as well, so that's something we're working on. Let's see if this loaded yet... it went back to fast. If it doesn't work after this, I'll just move on.

So let's talk about the flow of LinchPin for a minute, because it's useful. What you provide is a topology file; I'll show you an example in a few minutes and how it works. Essentially, the topology is the piece you need to tell LinchPin what you want. Say you want three AWS machines on EC2: you tell it that. If you want an os_server for OpenStack, you can tell it that. You can specify any type of machine that we support, and they are plentiful; hopefully I have a slide later that shows the list. Then you run linchpin rise, which is the command we're currently using (we're changing it to linchpin up in the 1.0 version), and it will spin up however many instances you asked for; I've got an example of three instances here. After it provisions them, it gathers that data through the cloud API, pulls it back down, and generates an output file. I'm still having issues with the network; let me see if it's fixed, but it wasn't a second ago.
Yeah, it's still down; I tried both networks and neither works, so I'm just using my local slides. I'm good.

So, we get this output file, a unique output file generated from your information. It's something you can keep, and it actually has all of the data from whatever cloud you're on, so if you need to dig into it, you can see the output; we have logs for all of this, and in a minute I'll do a demo and show you how it works. From those outputs we can apply what's called a layout file. The layout file and the output go together, and they can generate an Ansible inventory based upon what you've done, which I mentioned earlier; this is just the flow.

We put in maximum effort here, and the main thing is that we wanted this to be simple, so you could use it and it would be really easy. You really only think about two files, and those two files are simple enough: they're written in YAML, and they don't really have anything else to go with them. You can then use that inventory like I mentioned: you take the output inventory and the playbooks you have (and down the road we're going to work on things like inventories for other types of configuration management tools, maybe Salt, maybe Puppet, so you could do it there too), and after you execute your commands, it builds what you have. In this example I'm building an OpenShift cluster. If you know what Red Hat does, we build clustered tools, and OpenShift is one way to do that. So I have my three nodes in this cluster, and I can actually configure it; that's the example we'll be using today. But if you're not comfortable with that, you can do it with anything you like. Any questions so far? Everybody happy with this? It's pretty simple so far.

So, I mentioned earlier that installation was hard for the old tool. We don't have an RPM yet, and we don't have a Debian
package, but we do have pip install, so you can actually just type pip install linchpin and it works. If you want it in a virtualenv, it also works in a virtualenv. It's literally one command right now, and I'll show how that works in a second.

So here's a topology file. In CentOS's CI we have a provisioner called Duffy. Duffy is similar to Beaker in that it uses the SeaMicro systems we have lying around: it kickstarts those boxes, spins them up, and puts them in a pool, and then I can ask Duffy for them; it's an API that knows a bunch of things. So we wrote an Ansible module to do this. What the topology says here is: I've got this thing called a Duffy three-node cluster, and I'm going to be using the duffy resource type; you can see that about halfway down (I'll move my mouse there, which might be easier than trying to point it out). Right here I've got a resource type called duffy, and that's actually what defines the module that's going to be used. You can give the resources any name you want, and from those names it generates a list of instances, so it would be something like duffy_nodes_0, _1, and _2, because the count is set to three here. So I can spin up three nodes and they come right up. That's how this works, and then it generates the outputs. If there are credentials, you can also provide those, associated based upon how the cloud works. A cloud like Duffy just has a simple configuration file, so I point LinchPin at where that is, and it says okay,
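As a rough illustration of the shape of a topology file like the Duffy one described above, here is a hypothetical sketch; the field names approximate the 0.9-era schema rather than quote it, so check the LinchPin documentation for the exact keys:

```yaml
---
# Hypothetical topology: a three-node Duffy cluster.
# Field names are illustrative, not guaranteed to match the real schema.
topology_name: "duffy_3node_cluster"
resource_groups:
  - resource_group_name: "duffy_nodes"
    res_group_type: "duffy"
    res_defs:
      - res_name: "duffy_nodes"    # instances come out as duffy_nodes_0, _1, _2
        res_type: "duffy"
        count: 3
    credentials: "duffy.cfg"       # Duffy just uses a simple configuration file
```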
I know where that is, and looks it up. With the other clouds, clouds.yaml works for OpenStack, or you can use the rc files for Amazon, and they should work pretty straightforwardly. You shouldn't have to learn that stuff again; again, we're trying to be simple, using what you already know, so you shouldn't have to learn anything new.

On the other side of that, we have the inventory layout, and I think this is actually where more of LinchPin's power is. We have these general vars at the very top, then a list of groups that we want to create, and it generates those based upon the IP information we gathered in the outputs, combining the two. In this case I've got three hosts. If I have more in the layout, it will not generate that inventory, because it bases the inventory on how many hosts I provisioned; it would only generate a three-node inventory, and would not generate a fourth node in this example. So I assign the first provisioned host to the groups masters, nodes, and OSEv3. This is top-down: as we go, we look at it and say, okay, I've got a master, I've got a node, and I've got a repo host, because I'm building a from-source OpenShift cluster with this. Make sense so far? It's pretty simple, and the hope is that it stays simple. Down below you can kind of see (on my regular slides this would scroll) a bunch of things like group variables and children that you can also assign; that's part of what Ansible does with its default inventory, so we're mapping the inventory file here to your hosts. That's pretty much what that does.

The last thing that comes with it, and this is the thing that brings everything together, is our simple PinFile. A PinFile basically looks like this: it has a target, and then under the target it has your topology and your
layout. The layout isn't required, but you can use it as well. Then you just say linchpin up (currently linchpin rise), and it will look at that topology and turn on the nodes that you need. Oh, is the Wi-Fi working? Sweet, that'd be awesome; hopefully I can get back to my good slides. You can kind of see how the topology and layout fit together here at the bottom; it's pretty straightforward. I apologize for keeping at this... of course, the Wi-Fi is only acting like it works. I've got to reload it again; we'll come back to that later, and there are a bunch of new slides I'll add at the end.

So here's the CLI. This is pretty much all you get, and if you want more, tell me what you want, because I'm really looking for people who are interested in using it; we've got it on GitHub right now, so I want to see what's needed. We've got linchpin init, which generates the PinFile for you in an example format that you can obviously modify. Then you have linchpin rise and linchpin drop, which will become up and down, or up and halt, or something like that; they'll probably match fairly closely what Vagrant does right now. We also have a configuration generator: there's a file called linchpin_config, and if you want to change where certain variables point while you're developing, you might run linchpin config, generate your own, and modify it. We also have the ability to store topologies offline, say in your own git repository or on a web server somewhere, and download them so that other people can use them. Another thing we're going to be able to do is pull those from a repository, so you don't have to know everything; you just say, hey, the topology is up there, I want to use this one.

Question: how about dynamic inventories?
So, we generate a dynamic inventory for you. It comes out as a static file, but it's essentially dynamic, based upon what you spun up. Was that your question? Yeah. So, he's asking whether, when it spins things up, it looks at what's currently running on the cloud. It doesn't quite do it like that; we don't assume. What we're trying to do is: if you've already defined a topology and spun it up, it won't spin the nodes up again. It will look at the nodes, say, oh, we already have those running, not try to turn them on again, and then generate proper outputs and an inventory from them. If one of those nodes is dead, it'll try to turn it on again; it depends on the cloud, but in OpenStack I think it does spin up a new instance if it can, and if it fails, it fails fast. That's another thing that's important to us: we want to make it simple and fail quickly so that you know what's going on. Our error messages are not as good as I'd like them to be; we're targeting 1.0 for that, and they should be improved. We're currently at 0.9. Did that answer your question? Okay. All right.

So let's do a little bit of a demonstration and show you how LinchPin might work. This is a simple example, not too hard; it's only got one set of clouds and runs on OpenStack, but it gives you enough information to feel comfortable, I think, with the whole process. So let's do that. I use a tool called playitagainsam; if you haven't heard of it, it's really cool, it just makes things easy to show. I actually ran all of these scripts beforehand, not here, because the internet sucks, so it works flexibly for us. When I run the first one, I'll show you the demo of the install. It's pretty straightforward.
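In shell terms, the install demo boils down to just a couple of commands; this sketch assumes a virtualenv, which is optional, and the environment name here is made up:

```shell
# Create and activate an isolated environment (optional)
virtualenv lp-env
source lp-env/bin/activate

# Install LinchPin and its dependencies from PyPI -- literally one command
pip install linchpin
```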
So I just mash keys on the keyboard and it looks like I'm typing things. We install it into my virtualenv: there's pip install linchpin (there's a version pin in the recording, but if you just use it without the ==, it'll work just fine), and you can see it's grabbing it from PyPI and installing it with all of its dependencies and everything. Sorry, yes, it runs on Python 2.7; that was the question he was asking. So that's basically now installed.

If I want to do more, I can run linchpin init, and that will generate things for me. So I create a directory and run the init, and you can see our UI needs some help; there are some spelling errors in there and bad explanations. We have fixed this already in our develop tree, and we're working on putting that into 1.0 as well. And yeah, playitagainsam is just a little recording tool, a little Python package, that records what I did and replays my keystrokes. So when I run linchpin init, it generates a LinchPin configuration file, and it also generates your layout for you, so you don't have to do anything.
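For a sense of what the scaffolded PinFile looks like, here is a hypothetical sketch; the target and file names are made up, and the exact format may differ in your version:

```yaml
---
# Hypothetical PinFile: each top-level key is a target.
duffy-3node:
  topology: duffy-3node-cluster.yml
  layout: openshift-3node-layout.yml   # layout is optional

os-3node:
  topology: os-3node-cluster.yml
  layout: openshift-3node-layout.yml
```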
It just creates this. And so, after that, I can look at the PinFile, and you can kind of see what it looks like. I have some targets here across two clouds, in case I wanted to use two different clouds and things like that, and I also have that Duffy three-node cluster I mentioned earlier. I'm at the end of that one.

So then I want to do my own, because I want to create my own PinFile; I don't really want to use the example they gave me. So I created a directory called lp_demo2 and just copied my PinFile in there, along with all of my topologies and layouts. If you have those, you could do it this way, or you could do a linchpin get and pull down your topologies, or you could just have them in a git repository and have your Jenkins or your GoCD pull them in; Travis does the same thing. Once I've copied them over, I can look at that PinFile, and it's much simpler: it just has a cluster on OpenStack, and I'll show you the topology and the layout for my three-node OpenShift cluster. (You can see I'm just mashing keys, right?)

So the topology looks like this. This is our OpenStack example; I've given it a name, and it's running the os_server type. It's going to spin up three m1.small instances, and you can assign a keypair if you have one inside OpenStack; a lot of the features that are already there, you can just use, you don't have to do anything special. Then we are using that specific network and giving each node a floating IP, and this is the pool that we're getting it from. Does that make sense to everybody so far? If you're not comfortable with this, let me know and we can talk about it anytime.
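Backing up a step, the OpenStack topology just described might look roughly like this; every value here (names, image, network, pool) is a placeholder, and the field names approximate the 0.9-era schema rather than quote it:

```yaml
---
# Hypothetical OpenStack topology: three m1.small nodes with floating IPs.
topology_name: "os_3node_cluster"
resource_groups:
  - resource_group_name: "os_server_test"
    res_group_type: "openstack"
    res_defs:
      - res_name: "test"
        res_type: "os_server"
        flavor: "m1.small"
        image: "centos-7-cloud"
        count: 3
        keypair: "ci-factory"      # reuse an existing OpenStack keypair
        networks:
          - "ci-network"
        fip_pool: "external"       # pool the floating IPs come from
    credentials: "clouds.yml"
```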
Now, the layout I mentioned earlier, which I couldn't scroll before but can here. We've got the vars at the top, and you can see that dunder IP variable: it takes the IP of the host as it was given and assigns it to the host itself, so now we can actually map the IP to a particular host, and as we go along it maps them properly down the tree. What that's doing is, for each of the hosts, assigning openshift_hostname to the IP address or the hostname, depending on the cloud; some clouds don't give you IPs, they just give you the hostname, and Duffy doesn't give me a hostname, so the IP becomes the hostname. Then I can assign the groups and the counts, and for each of the different groups at the bottom, these vars are specific to that group, so I can set up all the things I need for the particular project I'm working on. If I want to map the OSEv3 group, I have some vars that I apply, and at the bottom you can see it has children as well; that's something OpenShift uses to make sure that all the variables get assigned to its children, and sometimes they're the same, sometimes they're not. Here the children of OSEv3 are masters and nodes, but not the repo host.

So let's go ahead and do a little bit more with this and turn it on. Simply put, I just run linchpin rise, and you can see that it does a bunch of stuff with Ansible and then provisions the machines. What it's doing right here is contacting the OpenStack server at Red Hat, which we can't reach from here; if you had your own, you can tell it where that is in your configuration file, and it takes care of your authentication and so on, generates the inventory and the outputs right here, and it's done. So it's pretty fast, right? Wasn't too slow.

Yes: you can run a playbook against this, and I'll show you how that works in just a second, actually.
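For reference, the layout walked through above might be sketched like this; the keys are approximated from the talk, and `__IP__` stands in for the dunder IP variable he mentions (check the LinchPin docs for the exact names):

```yaml
---
# Hypothetical layout: maps three provisioned hosts into OpenShift groups.
inventory_layout:
  vars:
    hostname: __IP__                   # use each host's IP as its inventory name
    openshift_hostname: __IP__
    openshift_public_hostname: __IP__
  hosts:
    master:
      count: 1
      host_groups: [masters, OSEv3]
    node:
      count: 1
      host_groups: [nodes, OSEv3]
    repo_host:
      count: 1
      host_groups: [repohost]
  # Group vars and children (as in Ansible's default inventory) would follow,
  # e.g. OSEv3 with children masters and nodes, but not repohost.
```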
Yeah, it's agnostic for any cloud. So the first question was: this just runs, you spin up your nodes, and then you run Ansible against them? The answer is yes (and I forgot to repeat her question, so let me do that; can you repeat yours again?). The second question was: it's basically agnostic, and you're leveraging Ansible through LinchPin to spin up whatever cloud you like? That's close enough, and the answer is yes, definitely. We're basically focusing on letting Ansible be the power, and we're just giving you a simple interface to do that. So you could take your own cloud tooling, drop it in there as an Ansible module, and spin up whatever you'd like, and multiple of those different types, too.

He's asking about security groups and firewalls, whether we have modules for that. In fact we do, and we're working on more: we support the OpenStack security groups and the AWS security groups; we just put those into 0.9, and there will be more as we go along. We're just looking for more contributors to help us write those; there are only two of us currently writing the program, plus some part-time developers who have helped, like the person who wrote the Beaker one, and somebody else who did the AWS security groups for us already. So hopefully there will be more of that.

Did you have a question back there? Okay, your arm was up and I wasn't sure I saw you move it. Yes, the documentation on that is horrible, but we do have some information about what you need to generate when a module comes back. Basically, it's a dictionary with the name of the resource and then a list of your IPs or other information you want to provide; you could provide it all, or provide very little. I actually did that with libvirt, which will be one of our core modules. libvirt only gives you back IPs right now, but it could give you more.
I just haven't done much more with that right now; I just wanted to get a proof of concept in place. Yes, sir. So the question was: are you able to run updates against the resources after you've provisioned them? Yeah, absolutely, but that would actually be something separate from LinchPin. We give you the inventory file, and you can use it to do whatever you want; that would usually be Ansible, but you could also just open that inventory file, look at it, and SSH to the boxes.

Oh, you want to update something like a security group? Okay, I see what you're saying. If you use the same resource name, it will associate that and modify it for you. Is that closer to what you're after? Currently, no. So the question is: does it delete security groups or other resources beyond the hosts? The answer is currently no, but we have been talking about a feature specifically for that, where you can turn things on and off. It really depends on how you set up the schema, because we could have it tear those down; it's just a matter of deciding how to do that, and that's a development decision. One of the things we want to do is set up a set of policies for each resource; that way you could have it tear down and spin up whatever you'd like at any time, and modify it in any way. If you modify the resource you're managing, it will automatically update it, but it doesn't tear resources down right now. For libvirt specifically, we have network and storage nodes: we provide the storage pool, and that doesn't get torn down at all, and it probably never will; that's a decision we made intentionally. But I think it would be cool to have a policy that does that, so it's a good feature that we don't have yet but are talking about trying to figure out. Any other questions?

So, someone asked about Terraform. I think Terraform does something
slightly different, but I think it's just as powerful and just as cool. I think the main difference is that we are trying to make LinchPin as simple as possible; Terraform is pretty complex, but it's also pretty powerful. So we're trying to be simple: make it spin something up and tear it down in a CI environment. We're really not thinking about production environments, necessarily, though it does work that way, and I think it's pretty powerful. We also don't assume things about the systems we're trying to turn on; if it doesn't work, we tell you, that's it. I think Terraform can actually look at the existing state and evaluate what it's going to do first. We are working on a topology generator, something I'll talk about later, where you can introspect the cloud you're working on and get a list of all the nodes you're trying to spin up, and things like that. That's more our goal, I think: give you functionality where you can see what's there, use that as information to generate your topology, and then spin up whatever else on another cloud. When I was in Brno presenting this at DevConf, one of the internal departments at Red Hat came to me and said they'd like that, because then they could pretty much tear down their entire OpenStack infrastructure and rebuild it, including all their nodes, security groups, and users. We're not there yet, but that's our long-term plan, probably somewhere in the range of 2.0, which is probably late this year. A long explanation for a simple question; okay, great.

So let's look at that inventory we generated, since we're back to that. You can kind of see that it's been mapping things for each of the different hosts.
They're in the groups they're supposed to be in. Here are my general vars; scroll a little bit, and these are the vars for that group, with the children above it. (I don't know why it formats it in this order; I need to work on fixing the ordering and making it cleaner, but it does work pretty well.) Those variables we saw earlier, like openshift_public_hostname: it assigned the IP to that host, and that's the name of the host that was in there; if you had a hostname, it would map that as well. OpenShift uses those variables when it's creating the master, for instance, and the node pieces; it's what the openshift-ansible installation uses, which is what we're using to do this.

So then I can take and run an Ansible playbook against this particular inventory, spin up those nodes, and actually have a little bit of proof that it worked. There's nova list at the bottom, and it shows the three nodes with their names as they're listed there; as I mentioned earlier, they're numbered 0, 1, and 2, based upon what's in the topology. If you have the resource group name and the resource name in the topology, it generates them just like that, so they're predictable and should come out in that same order every time.

So we can ping that box, of course. And if anybody was paying attention earlier, I can SSH to that box; you might have noticed I had a key I associated with it, my CI factory key, so I can SSH in (we have it set up so that root allows us in). I can check how long it's been up, and check things like that. If I wanted to run an Ansible playbook against it, it would just be ansible-playbook -i, passing this inventory file, and I predictably know where that's going to be, too: always in that inventories directory. And then I think that's the end of this one.
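Pulling the demo together, the whole loop is roughly the following; the inventory and playbook file names are placeholders:

```shell
# Provision everything the PinFile describes ("linchpin up" as of 1.0)
linchpin rise

# Configure the freshly provisioned nodes using the generated inventory
ansible-playbook -i inventories/os-3node.inventory playbook.yml

# Tear it all back down ("linchpin down"/"destroy" as of 1.0)
linchpin drop
```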
So, any questions about any of that? It's pretty straightforward, I think, and I've answered a lot of questions about the good stuff too. So of course we can turn it off; let's do that. That's just linchpin drop; it'll probably become linchpin up and linchpin down, or destroy, because we're actually talking about being able to halt on certain clouds. That'll be a feature based on your cloud: some clouds will do it, some clouds won't. It tears those nodes down and removes them from the OpenStack infrastructure you're in. The nice thing about this is that we've just been able to build something up based on a specific image, add all of our features to it, run the things we want, and tear it all back down in a short period of time, and it's only two real commands plus your Ansible playbook.

Yes, sir. So the question was: can I leverage this for pushing out container packages? When you say that, do you mean you want to provision Docker itself, or do you want to provision a bunch of nodes and then put Docker on them? Yeah, so it depends on how you want to approach it, and the answer to that question is yes. We could definitely make a provisioner for something like Docker. One of the things we do with this is Atomic OpenShift, which is essentially a Docker cloud, like Docker Swarm. Atomic does that exact thing: Atomic hosts can spin up a bunch of nodes, put your Docker instances on them, turn them on and off however you want, and scale up and down. It's totally doable. You could do that basically right now by spinning up three nodes and installing the things you want from your playbook, or we could write a module that does it with the Docker module that's in Ansible right now.
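The "works right now" path from that answer, provisioning nodes with LinchPin and then laying Docker on them with your own playbook, could look something like this. The playbook below is my own sketch, not something shipped with LinchPin, and the package and service names assume a Fedora/CentOS-style image:

```yaml
# Hypothetical playbook: run against the LinchPin-generated inventory to
# turn freshly provisioned nodes into Docker hosts. Package and service
# names are assumptions for a Fedora/CentOS-style image.
- hosts: all
  become: true
  tasks:
    - name: Install Docker
      package:
        name: docker
        state: present

    - name: Start and enable the Docker daemon
      service:
        name: docker
        state: started
        enabled: true
```

You would point this at the generated inventory with ansible-playbook -i, the same as any other playbook.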
We don't have that Docker piece in LinchPin yet, but it's a feature we're talking about, so it's in the future for sure. It depends on how you want to do it: if you want to spin up your own nodes and install things with a playbook, that works right now. So it just depends on how you want to approach it.

Yes, sir: how do you persist your data? Right. So what you're basically asking me is: say, for instance, I've got a MySQL database, I want it on this specific node, and I want to make sure it stays there with the data that I have. Well, that's up to you as the CI person; all we're really doing is providing you the nodes. In terms of CI, we always really want a clean environment to start with, so you would want to build up your MySQL database again. Or you could provide an image that has it on there the way you want it, and we could use that image inside OpenStack to spin it up, as an example, and it would turn on the right things. So again, we're not assuming things; we're making it simple. You can say, "I want to make a MySQL image on OpenStack or AWS or whatever," pre-configure all of that, and then just turn on that particular node, and you define that in the topology.

Yeah, very likely, and we do have AWS S3 supported as well. That was a good question he asked: "I would want a database that's two terabytes; how do I do that? I need S3 or something." Well, we actually have an S3 topology that you can apply, and it's similar in form to what we just saw a minute ago with the OpenStack server. Hopefully that answers another question.

So I can show that it tore it down; there won't be any output, and that's the end of that one. So then, I think that's all my demos for now. Let's see if I can show the other stuff that I had.
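So the whole demo loop boils down to two LinchPin commands wrapped around your own playbook. The subcommand names were still in flux at the time of this talk (drop versus up/down/destroy), so treat this outline as an assumption rather than the final CLI:

```
linchpin up                                          # provision the topology, write the inventory
ansible-playbook -i inventories/<target>.inventory site.yml
linchpin destroy                                     # tear the nodes back down
```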
We'll skip through some of this. OK, so the core cloud modules, our target for the future, look like this. We're going to have OpenStack, Google Cloud, Amazon Web Services, and libvirt as our core, and then some pieces that will be add-ons you can install separately. So we're looking at Beaker; we have Duffy and Rackspace currently, which is essentially OpenStack but specific to Rackspace. As mentioned earlier, a lot of the AWS stuff is already in place, like security groups and S3; that's actually already there, I just lumped it together, but it would be in the core. Then we're looking at things like DigitalOcean, Azure, Linode. Oh, and Rackspace is there twice, which I'm going to fix because that's an error.

And then OpenShift, as you mentioned, sir, about Docker nodes and things like that: openshift-ansible does that same sort of thing, as well as Atomic Host or Docker Swarm, where it's kind of already there, but you have to get another playbook to run against it, which is already provided by another provider. We didn't necessarily want to do that, but we could make an OpenShift module in Ansible and just use it. So it really depends on how we want to approach it, and there's a lot of flexibility here, which is why these are additional modules, more of a contributed-module structure where you can make your own, drop them into the schema, and it just works. Or at least that's the goal.

Skip that. All right, so we mentioned a little bit of what's going on here: I've got a simple CLI, and we've got packages that are available now on PyPI.
We also do something I didn't mention earlier, which is asynchronous provisioning. If you want to turn these all on at once and not wait, you can do that. So say, for instance, you have a cloud on AWS, you have a cloud on Google, and you have one on Rackspace, and you want to turn them all on at the same time: it can go off and do that, and then it sits there, waits for them to provision, and collects the output at the end. They're doing it all at the same time rather than in serial, which is really nice. Then it generates outputs for those, generates inventories as provided, and you can use them in the same way, however you'd like. It actually generates one for each cloud, or an "all" version, depending on how you want it; at the moment it's doing all of it, so you can collect those and use whichever you'd like.

One of the things I noted here is our performance. We've actually improved our performance quite a bit; it was originally pretty slow. You can kind of compare: we actually are slower in some cases, but we're pretty comparable, and it's fast enough. Google Cloud, if you look at the data I actually have for it (which is not up here), is horrible, because it uses libcloud, and libcloud thinks that it knows everything. So we're probably going to work on making a module and submitting it to Ansible for Google Cloud directly with their API, but I don't know how we're going to approach that quite yet. So that's something we're thinking about as well.

AWS security groups are already available. Is that what your question was? Yeah, so if you spin up an AWS instance (I think we call it aws_ec2; I don't know why we called it that, but that's an EC2 instance), you can also apply a security group to that instance already. It'll generate the security group first, and then you can associate them. Yeah, it's already there. Yep.
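As an illustration of the asynchronous multi-cloud idea (and the aws_ec2 resource type with a security group), a topology with two resource groups might be sketched like this. The field names here are my assumptions about the schema, not LinchPin's exact format:

```yaml
# Hypothetical multi-cloud topology sketch; field names are assumptions,
# not LinchPin's exact schema. Resource groups like these would be
# provisioned asynchronously, producing one inventory per cloud or a
# combined "all" inventory.
topology_name: multi-cloud-ci
resource_groups:
  - resource_group_name: os-nodes
    res_group_type: openstack
    res_defs:
      - res_name: e2e-test
        res_type: os_server
        count: 3
  - resource_group_name: aws-nodes
    res_group_type: aws
    res_defs:
      - res_name: builder
        res_type: aws_ec2             # the EC2 resource type mentioned above
        count: 2
        security_group: ci-ssh-only   # generated first, then associated
```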
I lumped them together, as I was mentioning earlier: AWS as a whole is pretty much complete. Not everything is there, but a good portion of it. And one of the things we mentioned earlier that we're trying to do is have OpenStack security groups that work together with AWS security groups (I apologize for that ringing; hopefully that helps), so you can basically have them talk to each other across clouds, which would be really cool. It's not there yet, but it's something we're looking into for sure.

So hopefully this helps give a clue as to what we're doing now. In fact, you can see security group provisioning is on the top right, or bottom right, of that second list. And then here's what we're kind of targeting, some things that I'm thinking about, some things that we'd like to see. A Vagrant plugin: so why not just give you Vagrant too? We're going to try to write something that works in Vagrant, and it'll call LinchPin and provision the things that you want as well.

One thing I actually didn't write up here: how many of you guys have heard of Zuul before? Anybody? So Zuul is actually an OpenStack project that does, I think they said, something like 1.2 thousand jobs an hour, which is a lot of jobs. It's CI tooling that they're working on, because they have something like six projects that they work on and they want them all to be interdependent, and it's a gating CI that works really, really well. So they have a tool inside it called Nodepool that currently spins up only OpenStack; LinchPin is going to be the piece that provides all the other clouds in the future.
That's our plan, anyway; hopefully it will come to fruition this year. Another thing we're looking at is hooks. So think about what you might want to do before you provision, after you provision, before you tear down, and after you tear down. For instance, a simple example: say you want to spin up a Jenkins master and slaves, and you've got this master running now and you want to spin up three more slaves. When you're done with them, if you don't disconnect them, the master thinks they're still there and tries to use them. So you could write a little script in Ansible, put it in as a hook, and before you tear down it would run a pre-teardown hook that disconnects the slave from Jenkins and then tears it down for you, so you don't have to write extra things to make it work. We would probably provide most of those, so you don't have to worry about things like that, but if you want to add stuff like that yourself, it's pretty easy to do as well.

Cloud bursting: somebody mentioned earlier they might want to scale up. This is the idea where you can say, "I want to run ten of them instead of three of them," and it'll provision seven more and turn them on for you. We can do that now.
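For the Jenkins example above, a hook declaration might be sketched along these lines. Hooks were still a planned feature at the time of this talk, so both the structure and every name here are assumptions:

```yaml
# Purely illustrative: hooks were not implemented yet, so this whole
# structure is an assumption about what the feature could look like.
hooks:
  post_up:
    - name: register-jenkins-slaves
      type: ansible
      actions:
        - playbook: register_slaves.yml     # attach new slaves to the master
  pre_down:
    - name: disconnect-jenkins-slaves
      type: ansible
      actions:
        - playbook: disconnect_slaves.yml   # tell the master to drop them
                                            # before the nodes disappear
```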
Cloud bursting down we can't do yet, so we're working on that. Cross-cloud clustering, which I mentioned earlier, we're working on too. Topology generation, which I also mentioned, is something I think will be really, really cool and really, really powerful; that'll probably give you predictability like Terraform does, and CloudForms, and things like that: CloudFormation.

One of the things Jenkins provides is this thing called a workspace. If you don't know what Jenkins does with it, it's basically where the environment lands when you do git checkouts and things like that. So we're going to provide a similar concept, so that you can just use Jenkins with this, or other tools that use the workspace concept. Like I mentioned, a cleaner process. And then we have a GitHub link; let me click on it and see if it'll go there now, and I'll show you that.

Yeah, so right here is our future stuff on GitHub. So if you're ever interested, it's on GitHub under linchpin, right there, and that should be helpful as well. It's the CentOS PaaS SIG, that's the name of it, and we work with the CentOS team; that's actually where this all lands. And I've been working on that for about eight months. Oops, let's not look at Reddit right now.

So what else do you guys want to see? That's one of the questions I had for you guys. Anybody have thoughts? Yes? A little bit, but I think we're actually really close to production-ready. We're using it internally already, and some of our CI ops teams are using it directly.
We have CentOS using it, Fedora is planning on using it in the future, and we'd like to get more people involved; OpenStack is going to be using it on their OpenStack infra. So it's solid enough to work, and it works pretty well. We want lots more, and we want some people to help us contribute; we definitely need more contributors. We've been doing this with three or four people, and I'd like to get a good team. I think it's a really good project, and it's fun, and we'll see how it goes. Does that answer your question? OK, anything else you guys want to see? Yeah? Cloud Foundry: you want to have Cloud Foundry in LinchPin? Oh, a migration path. Yeah, actually, a migration path is something that we have talked about. That's a really great idea: being able to translate between things like Cloud Foundry and, what was the other one mentioned earlier? My brain just lost it. Terraform, and other ones like that, would be really great. So that's actually a feature I need to write down; I wasn't thinking about it from that perspective, but that's great, thank you. Anything else you guys can think of that you might want or be interested in?

OK, so that's pretty much the end of it. If you're interested in the project, there's the link to it, and these slides will be provided by SCALE; they're already there, I think. The LinchPin Read the Docs is available, and we've actually been updating that recently. And if you ever need to get hold of anybody on it, I'm on Twitter as herlo, and I go by herlo online pretty much anywhere. So if you're looking for me: I'm herlo on GitHub, herlo on IRC, herlo on Twitter, pretty much anywhere, or add a 1 or a 0 at the end if it needs to be six characters, like Gmail. Any other questions or comments or thoughts? Great. Well, thank you guys, I appreciate it.
Sorry for the delay. We were debating on cancelling because we didn't make the guidebook, but we've got some folks here now, so we'll get started.

So first, thanks for coming. I was going to mention, though most of you probably already know, that somebody on my team right next door is giving a talk that's a little more low-level, actually talking tools, probably showing code snippets, Docker, all that kind of stuff. So if that's what you're looking for, that's the place you should be. This is more of a higher-level, make-the-business-case talk: how we think about the cloud journey overall in an enterprise. So if that's more your jam, you're in the right place. We couldn't get on the Wi-Fi, so we had to do a computer swaparoo; if things go horribly wrong, that's what happened. So, I'm Justin Dean.
I'm the SVP of Platform and Technical Operations at Ticketmaster. I've been there a couple of years now. Me personally, I'm passionate about building high-performance organizations, and I'm super nerdy about doing things like automating beer and barbecue. So if you're into that kind of stuff, check out pitmaster pi on GitHub, and you can also hit me up on Twitter, Justin M Dean.

So, yeah, the main topic I'll be covering is the big picture of our journey, our cloud-native transformation as a company: giving some insight into a large enterprise, how we view the world of going into a public cloud, the challenges related to an enterprise, and how we're solving those challenges for us. Overall, that may give you some hints, or a blueprint, or a business case for your own enterprise, because you probably have similar challenges. And along the way, we've been betting heavy on Kubernetes and CoreOS, so we'll talk a little bit about those details and partnerships and whatnot.

So, first and foremost, I always like to show our speeds-and-feeds slide, because most people don't realize; you probably don't put Ticketmaster in the classic enterprise space. Most people think of Ticketmaster as the website where you may or may not have gotten those tickets you wanted, and they don't realize the full breadth and ecosystem around Live Nation, a large live-entertainment enterprise combined with our web-scale challenges, right?
But if you look at some of the numbers: we own buildings, we own venues, we own stuff, physical assets, that add to the complexity of an enterprise. One of the stats I was thinking about was that we probably spend more on fencing and porta-potty toilets than most shops spend on their IT infrastructure and ecosystem, when you think about the level of festivals and all that stuff, and what it takes to put on a Lollapalooza in the middle of the desert, those kinds of things. So all of the substrate to run that stuff is classic enterprise stuff. And in our Ticketmaster business we do 25 billion in transactions on the platforms that we put out there, and we're 41 years old, almost 41. So you can imagine how that adds to our complexity and our software stack.

At our root, we're a live-entertainment company, and this is by far my favorite slide; I try to show it off everywhere I go. Just the simple sentence in the middle: somewhere in the world, every 20 minutes, there is a Live Nation event. Just think about the scale of that: every 20 minutes. The software, the products, the platforms, the ecosystems, the workforce that we have to be able to put that on across the planet is just pretty amazing.

How many people went to Friday night's party that we hosted at the castle? If you didn't, you missed a pretty good time. We were at capacity, and the reason why I bring it up is because there were a lot of people fanatically dancing at this thing, and our tagline as a company has been: we power unforgettable moments of joy. One thing I saw on Friday night, while the band was playing, was a guy; he was huge.
He'd be seven feet, and I've seen him around the conference. He got up on stage and he tore his shirt off. He was wearing a GNOME Linux shirt, and he just ripped it off, shredded it in half, and jumped off the stage. And I was thinking: that is an unforgettable moment of joy that we powered. At no time in human history has that ever happened, somebody ripping a GNOME Linux shirt in half. So I thought I would share that.

But anyway, in order to do all that, we have a fair amount of tech complexity to deal with, as I'm sure most of your companies have. We've been around 40-plus years. We have what we refer to internally as a tech museum of sorts; we have software from every era. From the first ticketing engine that we built inside of a VAX, we still have a lot of that. Currently, our primary ticketing engine, the one that sells our main stuff like Adele or Bruno Mars, still traverses our VAX system. We've emulated it; we've made it work in a cloud ecosystem. But it's still assembly, and we still have teams of people who work on this stuff. And then we've got everything from 40 years ago to now, including what used to be a state-of-the-art, modern, private on-prem cloud eight to ten years ago, which essentially runs the lion's share of most things from the last decade. That's your classic large Xen hypervisors running VMs. We also have a massive addiction to NFS as a company; we use NFS wildly, which, as we go more and more down a cloud-native, immutable path, is a bad thing. It used to be clever ten years ago: hey, this is a really sweet solution to have /software and have access to everything.
It's really bad when you try to get off of that as a company; it becomes hard. So all of this tech that we have makes it really, really challenging for us to move fast, and to do things like, hey, tomorrow let's all be inside of a public cloud in immutable containers inside of the latest and greatest. On top of that, we've got the business complexity of running these products. We have 21 full-blown ticketing systems and hundreds of products that run on top of this ecosystem: hundreds and hundreds, fourteen-hundred-ish product and tech folks, tens of thousands of VMs. And we have some unique issues, like the 15,000 network endpoints, that probably aren't super common to most enterprises out there. Given the fact that when you buy a ticket somewhere and you go to, say, a hockey game and scan the ticket, that scan has to go and traverse our ecosystem some way, over circuits back to home, to validate your ticket, make sure it's unique, make sure it wasn't scalped, and all that kind of stuff. So there's a lot of complexity to deal with, and on top of that, it's growing so fast it's hard to keep up.

So one of the unique challenges that we deal with is our on-sales, and if you've seen any talk from us, somebody always talks about it, because it's a part of our culture and a part of our DNA. We have to deal with selling these mega-concerts that have huge demand. Like Bruno Mars, which recently just crushed our systems with record-breaking sales and all that kind of stuff, because everybody wants to go to it and there's very limited supply.
The amount of seats is the amount of seats that the artist decides and the venues have, and it's unique inventory. It's not like Amazon books, where they just make more. And the market price is wildly different from the price that they go on sale for, so we create this phenomenon where essentially there's enough arbitrage in the game for all of the computers on the internet to come and crush our site to try to buy those tickets, so they can make more money on resale in a resale marketplace. I'm sure you guys have been there, seen that. So there's an entire marketplace wrapped into that, but what it means for us is that we have a self-inflicted DDoS hitting us as a business function, which our systems and the platforms that we build have to be able to weather.

So another huge factor, and a real driver for this, is the competition. And I'm sure it's the same in your companies: if you make money in some way, shape, or form, there are other people who would like to get that money from you. So it's a unique position that we have, where we're the market leader in the space, and we have just a huge ecosystem, everything from our live-entertainment stuff, putting on productions and shows and all that kind of stuff, all the way through the ticketing setups and the new marketplaces emerging. What we have, though, is a huge surface area of business opportunities. And in modern-day companies, each one of these little fish on the slide is the smaller competition. That could be two people; it could be a college project in a dorm, where they literally could start picking away at one of the little lines of business and market that we occupy and dominate. And for us it's a huge problem.
We're looking to grow market share and grow our position in the market, not have it erode away. And in a large, 40-year-old enterprise, it's extremely hard to pivot on a dime. Some new thing hits us: hey, there's money going away over here, some new way of arbitrage or whatever, and we can't turn the company around, thousands of people and what they're working on, overnight. Which is a huge problem.

So let's recap that part of the story: we're a public company with market pressure, and we're highly competitive. We have a fair amount of legacy technology, some of it not ready for containers, if you can imagine that. We have huge technical debt, and we're paying high interest rates on that debt in the form of operations, in terms of the money and mindshare it takes to operate the company. We have huge scale and complexity challenges, and we have Black Friday every day.

So the way we think about that, and where we need to go with it: all roads have led us to "we must get faster as a company." Speed is the name of the game, and velocity is key. The way we structured the business case, to help get the whole company aligned on what we're doing and to change us into a fast-moving software company (because you don't just wake up and have everybody say, "hey, this is a great idea, Justin, we should do all the cloud-native stuff you're talking about"), is essentially this: if we want more revenue and more market share, we need better products and better features, and we need more of them.
So in order to do that, we have to deliver products faster. And in order to do that in our company, with the dependencies and all the things that I just described in the last few slides, all roads have led us to: we need autonomous product teams. And in order for that to be a reality for us, we need to simplify our platform. Our platform was too complicated for product teams to actually be able to own and operate their software.

So here's the journey that we've been on; I'll talk through it a little bit. It's a multi-year journey, and it really started in 2013. I'll walk through this in a second, but if you look at the timeline, we're obviously in 2017, and we are somewhere in the "public cloud and get everything into Kubernetes" phase. I'll show you how we got there.

So the first, most important thing that we did was, as a company, as a senior leadership team, at the board level, come to the conclusion that you either disrupt yourself or you're getting disrupted. Every industry, every company: that's the state of reality. So you can pretend like it doesn't exist, and then an Uber is going to wake up with your business, or, well, I don't want to mention competitors, but I'm sure you've heard of them. So that's real. From the CEO of Live Nation on down, everyone is 100% on board with this, and they're the drivers of it. So it's a pretty unique stance; most companies are actually more grassroots.
Ours was all of that, so it gave us sort of a perfect storm. So the first thing we ended up doing as a company is we went through what we're calling the lean transformation. Really, what that means is we did a lot of reorganizing, changing from functional, silo-based teams to delivery teams: cross-functional teams where you have business, tech, and product all together on one mission. In any classical organization, typically, the way those groups talk to each other is they don't, or if they do, it's through Jira tickets or something like that, and there's no real connection to what they're delivering. Same here. So we created a ton of these, 65-plus delivery teams, and focused them on delivering some value. Instead of saying "you're on this product team," it's "I'm on the team that's opening up our APIs for public consumption," and the whole team was rallied around "you're here to deliver a business outcome." That's why I highlighted "outcomes are greater than output": in every company, everybody's really busy, and it doesn't matter. I don't care if you're busy; are you delivering outcomes?

So, really getting laser-focused on delivering these outcomes, what we found is that we got really good at writing software. We didn't get all that much better at delivering the software, like actually making money in prod. There's a huge difference between "it works on my laptop," "it works in non-prod," wherever, and "it's making money in market." And what we found is that most of the time, all roads led to "I need ops." Blocked by ops. And we had around 225 people in ops at this time, and this is straight-up tech ops, my team, right?
So we could either ask, "what do we need, 500 people? What's an appropriate amount of ops people?" or we have to solve this problem. So this led us to really focusing on: we need to take these teams and make them autonomous, so that not all roads lead to ops. This was effectively our DevOps 2.0 era. We had DevOps Days just two days ago, and a lot of people are still talking about the challenges. There's no way around it: it's hard. It's hard as an enterprise; it's a cultural shift. You don't just wake up and have everybody DevOpsing. So we took some fairly clear, calculated actions to force it along, along with the cultural stuff. We moved a lot of our application support teams out of the tech ops organization, the platform organization, and embedded them in those delivery teams: you are now part of that team; your mission in life is not just the operational end of it, it's to help open up those APIs, or to help with whatever the team's mission is. Then we also did the same for our systems engineers: be a part of those teams and help them deliver faster. If at any point the product team that's responsible for delivering a product has to put in a Jira ticket, that's a fail. Jira tickets are evil, and one means something has broken, whether it's people, process, procedure, or access. So we started to get to the point where we don't celebrate tickets. You know: "hey, we closed a hundred tickets." I'm like, oh, that's a hundred times
I'm like, oh, that's a hundred times We failed So, you know part of that was building a lot of self-service tooling And really trying to you know getting conviction about how we need to simplify everything at the end of this But the goal of this phase of our journey was really to create autonomous self-sufficient Micro-businesses these teams are micro business you build it you own it you run it you operate it you deploy it You manage it you monitor it you pay your duty it you answer the pay-your-duty quickly, right? And and you monetize it and you figure out how to have a good P&L like that's It's a holistic micro business and if you look at the the little fish thing It's the same one from a few slides ago. That's our competition So that was kind of the goal is how do we build these teams enough to where they look like our competition and they can outpace them? So this helped a lot we got way better at Delivering software to market, but it didn't solve everything right. We still had a lot of complexity in our ecosystem So just some stats That that even gave us more conviction to keep doing our journey right was 30 to 50% of our time is wasted Moving bits around the system, you know just you know and you see it anywhere any shop you talk to right people are like Well, you know it's got I've got to get it to prod and I've got to do this thing and non-proud and there's just a lot of Talk about moving stuff around right. It's not easy. We have you know hundreds and hundreds of products We had over 150 custom-built ways to Release products Everything from the you know some teams are all you know container-based Docker-based immutable in the cloud Whatever whatever they thought was the best thing to do at the time Some of them are you know, can you move this tarpaul around for us and explode it and all that like so That was not good. 
We should have one way, not 150. And the other thing was: 50% of our outages were humanly preventable, and — because of points one and two — probably related to "when we touched it, we broke it." So the first thing we did was capture this. In any large company, if you went to ten different people and asked "how are you guys doing the DevOps?" — imagine the answers. Even our own community here can't align on what the actual definition means; it's no different inside a company, and when you remove people ten layers deep from it, they really have no idea what DevOps means. So we spent a fair amount of time working out what it actually means and how to quantify it, with the number one purpose of providing a clear assessment, a score, and a roadmap for teams to improve. We figured if teams knew what they should be doing, they'd probably do it; and if we quantified it to the point where they could talk about it in a manner that product managers and the business understand, they'd probably start doing the right things. I'll show you it in a second. The other thing is that we recently open sourced this. Internally we have a website with fancy dashboards and a web app; we open sourced the data — a spreadsheet and all the information about how to use this maturity model we built. What it is, essentially, is the set of capabilities we value — how well you code, build, release, operate, optimize — about 40 of them, each scored between a one and a four for how well you do it, written in pretty plain English.
I'll show you an example in a second, but it's a really great way to have your team self-assess. Depending on your culture, you can make it a scoreboard or a board of shame — our goal was to create a culture where people wanted to increase their score, but not as a shaming mechanism. We wanted people to be very open with their scores: if you're a one, you're a one; that's what it is. But even if you're a one and you want to be a three, you can embed things into your next iteration or sprint to get you there. So here's an example of what it looks like. I know this is an eye chart, and I don't think it's legible in this format, but on the right you have all these bar charts — this is one product specifically, one app team — and then there's a trend line, really hard to see, in green: that's where the rest of the company is. So you can see how you stack up: "oh, everyone else is doing the DevOps better than we are," or whatever. To give you an example — I can't read it either — on the capability around DevOps practices, you have everything from a one, which says something like "ops does everything for me; I put in Jira tickets and they deploy it," all the way to a four, which is probably "environments in production are fully controlled and owned by the contributors building the product, including alerts and issue escalations." We want everybody trending toward a four on that one. That one we actually got pretty high rankings on across the company.
It's the highest score of all the capabilities, probably because, as the earlier slide showed, we reinvested heavily in forcing a DevOps culture. If you go and actually download it — the link is tech.ticketmaster.com, and it may be the second blog post down — you'll get a spreadsheet that gives you all of the capabilities along with how to score them. An interesting thing we added was a minimum cloud-readiness score. If you've ever had the joy of taking software that may not be ready for the cloud and stuffing it into a cloud, or having teams migrate it, you realize you have to do some things differently. You may not have — whatever the case may be — high-performance block storage with an infinite amount of IOPS and bare-metal boxes with redundancy across the board that never fail. You have to be able to handle your VM or your EC2 instance disappearing: is your app auto-healing immediately? Depending on when it was built, it probably is not. So we graded it — we made a minimum score for each of the 40 capabilities that you have to meet before you start deploying anything into a cloud world. It basically gave the company homework: before you start Dockerizing, make sure you're here. And it quantified it so you don't have to have that conversation verbally a thousand times. So I definitely recommend checking it out. It's weird — I don't know how you open source a spreadsheet — but if you've got questions or comments, definitely hit us up, because we're interested in continuing to iterate on it.
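To make the idea concrete, here's a minimal sketch of what that "minimum cloud-readiness score" gate might look like in code. This is my own illustration, not Ticketmaster's actual tooling; the capability names and thresholds are invented:

```python
# Hypothetical per-capability minimums a team must meet (scores run 1-4)
# before deploying to the cloud. Names and values are illustrative only.
MINIMUM_CLOUD_READY = {
    "devops_practices": 3,
    "telemetry": 3,
    "deployability": 2,
}

def cloud_ready(team_scores, minimums=MINIMUM_CLOUD_READY):
    """Return (ready, gaps), where gaps maps each capability still below
    its minimum to (current score, required score)."""
    gaps = {cap: (team_scores.get(cap, 1), needed)
            for cap, needed in minimums.items()
            if team_scores.get(cap, 1) < needed}
    return (not gaps), gaps

# A team that self-assessed low on telemetry gets homework, not a deploy.
ready, gaps = cloud_ready({"devops_practices": 4, "telemetry": 2, "deployability": 3})
```

The point of quantifying it this way is exactly what the talk describes: the gate becomes a checkable fact rather than a conversation you have a thousand times.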
So that took us to our public cloud journey. As a company, we needed a catalyst to force ourselves to do everything required to make our products better and higher velocity, and we chose the public cloud as that catalyst. Really — me personally — I couldn't care less who we pay rent to, whether it's HP or Dell on-prem in our data centers, or whatever cloud provider. That's not the secret sauce. The secret sauce is that it forced us: we're viewing it as this giant carbon filter for our company. The value we get out of going cloud-native and pushing into a public cloud is that it forces deep introspection of every single product, every single piece of software — making them cloud-native to start with ("fix" is the wrong word) — addressing the -ilities: scalability, reliability, deployability, monitorability, all that stuff in the spreadsheet. Actually do that and actually invest in it, because you can't live in a public cloud ecosystem without it. It's a big, giant ocean out there, and you're going to get hit in the face — like when S3 stops being S3. So this is the huge carbon filter forcing an up-leveling across the company. And to get there, we took a slightly different approach than most. Instead of having a centralized cloud migration team that figures out how we're going to migrate everything, takes the bull by the horns, and moves the company to the cloud, we created a cloud enablement team. Remember, what we want is autonomous teams that own and operate their software, and the best way to do that is to have them do it themselves. This team was essentially a group that figured out: what's the right end-state architecture? How do we cloud-natify and make it practical? What are the tools needed?
Knowing our ecosystem, it meant coming up with the best end-state architecture and then figuring out how to translate that architecture to the whole company at large so they can actually get there. It's a lot of pattern matching and pattern building. For example: "I use NFS to share an index between my search head nodes. I've got a hundred servers; as soon as an index pops up, it's just magically there and they all get it." Well — spoiler alert — we chose containers, not VMs, and we're not bringing NFS with us. So that's a pattern they had to solve: how would you do this, and what's the right way we think you should do it? I believe they solved that one with a piece of Lambda code that lets everybody know really quickly, "hey, there's a change," and then the nodes go pull it from S3. The key point is that this team was all about building the frameworks, the tooling, the models and patterns, and then getting them out to the rest of the company so the product teams can deliver themselves to the cloud. [Audience question] Sorry, I can't see where you are — oh, yeah. So the question was: did the cloud enablement team infuse themselves into the rest of the company? Essentially, yes. We're still trying to scale this across hundreds and hundreds of developers. What we're doing right now is what we're calling dojos: we'll go to a location — we own a lot of buildings across the world — and spend a couple of weeks on site, combined with a lot of self-learning: here's the toolkit, here's the documentation, here's how you do it. We've even got repos set up where the code is already ready to go — here's the base of it.
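A rough sketch of that index-distribution pattern — a Lambda fanning out "the index changed" notifications from S3 events instead of sharing an NFS mount — might look something like this. To be clear, this is my reconstruction of the pattern described in the talk, not Ticketmaster's actual code; the event shape is the standard S3 notification format, and the publish step is injectable so the real thing would likely be an SNS topic the search nodes subscribe to:

```python
import json

def handler(event, context=None, publish=print):
    """Hypothetical S3-triggered Lambda: broadcast "a new index landed"
    so every search node fetches it from S3, instead of all hundred
    nodes relying on a shared NFS mount."""
    notices = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        notice = {
            "event": "index-updated",
            "bucket": s3["bucket"]["name"],
            "key": s3["object"]["key"],
        }
        # In production this would be e.g. an SNS publish that the
        # search fleet subscribes to; here it is injectable for testing.
        publish(json.dumps(notice))
        notices.append(notice)
    return notices
```

Each node, on receiving the notice, downloads the new index object from S3 directly — shared-nothing, which matters again later in the talk.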
Just insert your stuff. We're trying to provide everything you need out of this, but what a lot of people really need is somebody to just show up and say, "It's time. Turn email off, open up a terminal, and let's do it." So we're doing that with these dojos: we'll have all these product teams — tens or twenties or hundreds of people in a room — and for some of these products it takes one day to move them over, get them containerized and moving; some take a lot longer. That's the way it's being done now. And like I mentioned, the systems engineering teams, instead of sitting back in the ops home base, have been deployed out to the product teams, so once they understand how all the tooling works, they become the in-team ambassador. So essentially what we came up with for our cloud enablement method is seven simple steps. I say "simple" because they seem simple until they're not. Number one: containerize your app — and we've chosen CoreOS as our container operating system of choice. Seems pretty simple. Two: Terraform your infrastructure, so it's all in code. Three: instrument everything with rich telemetry. And this is by far my favorite: no SSH or RDP. We made the decision early on that everything needs to be immutable, and it needs to come from software and a deployment — not from a human who logged into something and did something by hand. That forces a high degree of maturity: you have to bring everything you need inside your app with you, including the telemetry. How do you know what's going on? Are you logging correctly? What happens when one fails — can you just redeploy it on the fly? Does it redeploy itself? Do you have enough synthetic monitoring?
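One small, concrete piece of "bring your telemetry with you" is structured logging plus a health endpoint baked into the app itself, so the platform can probe and replace an instance without anyone SSHing in. A minimal sketch — the endpoint path, field names, and port are my own choices for illustration, not anything prescribed in the talk:

```python
import json
import logging
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

def log_event(event, **fields):
    """Emit one JSON object per line: easy to ship and query centrally,
    so nobody needs a shell on the box to see what happened."""
    line = json.dumps({"ts": time.time(), "event": event, **fields})
    logging.info(line)
    return line

class Health(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            # An orchestrator probes this and replaces the instance on
            # failure, instead of a human logging in to investigate.
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

# In production you'd run the probe server alongside the app, e.g.:
#   HTTPServer(("", 8080), Health).serve_forever()
```

The design point is that the app carries its own observability and its own "am I alive?" answer, which is what makes the no-SSH rule survivable.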
If you do that right, you effectively make it easier to operate, and it costs less to operate in terms of humans. Built this way, it's usually automatable, so a catastrophic failure doesn't feel catastrophic if it just magically is running again — versus trying to get the right person on call and hand-fixing something. So if you have the choice, if you can eliminate SSH and RDP from your cloud world, I would highly suggest doing it. It's also the most painful step, and the one where people will have the pitchforks out — "this is ridiculous." Then, obviously, make sure you're paying attention to security. A big one of our tenets was design for shared-nothing architecture: because of our huge NFS addiction, we didn't want to bring that along and propagate it into a cloud world, especially because it would probably get a lot more expensive or just disappear — imagine, during the S3 outage, those kinds of dependencies; that would have been bad. And build for availability. All that to say: we had this team, we had this process, we had the methods for getting to the architecture, and we had over a thousand pages of documentation and patterns — everything you could possibly ask for to go full bore. So we were like, just open up the gates and push the whole Ticketmaster through. It didn't go exactly like that. We learned a few things with this approach. One: in the public cloud ecosystem — we're with Amazon; I almost forgot our contractually obligated way of phrasing it, our "preferred cloud vendor," Amazon — they have a rich set of primitives and APIs to use. And we've got hundreds of developers out there from different walks of life.
They've been at Ticketmaster through different eras, different tech stacks — everything from "I write assembly" to mod_perl to Java to PowerShell. What we were effectively asking them was: we need all of you, whose business is delivering business functionality, to learn these APIs. Learn the primitives of the cloud, learn how to build infrastructure, learn how to make it resilient — and Terraform was our engine of choice to deploy this infrastructure, so learn Terraform and learn how to deliver that infrastructure using it. What we started to learn quickly is that Amazon has over 65,000 permutations and combinations of how to deploy something. Instance sizes alone — there are like 30 or 40 of them — combined with all the other flavors of stuff we were delivering: some S3 in there, some Lambda, some disks, whatever. So we had 64,999 ways to get it wrong, and we were exercising all of them. Where I started to realize we were doing it wrong — beyond the velocity of the whole thing — was when I started to view Terraform as a programmatic checkout engine: a way for distributed development teams and individual developers to add to cart and purchase stuff on the company's behalf, with no form of validation, no central purchasing mechanism. It was basically the programmatic way to buy servers at scale. And for me — I care about the checkbook — that's kind of a problem, because then they'd stuff that into a code repo somewhere and it's as if the purchase never happened. So overall, we started learning: we were getting stuff there.
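One way to put a "purchasing check" in front of that programmatic checkout is to scan the JSON form of a Terraform plan (`terraform show -json <planfile>` emits a `resource_changes` array) for resource types and instance sizes nobody has signed off on. This is an illustrative sketch of the idea, not anything Ticketmaster said they ran; the allowlist is made up:

```python
import json

# Hypothetical allowlist: instance types a central team has priced and approved.
APPROVED_INSTANCE_TYPES = {"t2.micro", "t2.small", "m4.large"}

def flag_unapproved(plan_json):
    """Scan `terraform show -json` output and flag EC2 instances being
    created with instance types that aren't on the approved list."""
    plan = json.loads(plan_json)
    flagged = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change") or {}
        after = change.get("after") or {}
        if (rc.get("type") == "aws_instance"
                and "create" in change.get("actions", [])
                and after.get("instance_type") not in APPROVED_INSTANCE_TYPES):
            flagged.append((rc.get("address"), after.get("instance_type")))
    return flagged

# A toy plan: one oversized instance, one approved one.
sample = json.dumps({"resource_changes": [
    {"address": "aws_instance.web", "type": "aws_instance",
     "change": {"actions": ["create"], "after": {"instance_type": "x1e.32xlarge"}}},
    {"address": "aws_instance.api", "type": "aws_instance",
     "change": {"actions": ["create"], "after": {"instance_type": "t2.micro"}}},
]})
```

Run in CI before `terraform apply`, a check like this turns "distributed purchasing with no validation" back into something a checkbook owner can reason about.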
We were getting stuff into the cloud. But there's a huge learning curve for the development and product teams — those delivery teams — and we don't want them spending time on that; we want them delivering business products. It was hard for them to manage distributed systems at scale: anything that needed a lot of resiliency, global resiliency, that kind of thing — they're not experts at that. We were asking the wrong people to optimize infrastructure, and we were asking hundreds of teams of the wrong people to do it. And we were baking purchasing decisions into distributed Terraform code, which we've learned is a bad thing to do. So, big picture, we realized we were spending too much time writing software to just deploy software, instead of writing software to make money. That led us to our container orchestration phase. Since we were already going Docker, going container, we knew that at some point we had to wrangle this in, and we needed orchestration. As everyone probably knows, it was early times in this space, so it was hard to know which option was production ready. But we definitely knew we needed to orchestrate our container world and abstract some of the complexity away from the development teams. We were already doing a lot of POCs, looking at all the different options out there, and we ultimately chose Kubernetes — for a few reasons on this amazing amount of text on the slide. We started to see that, one, it was organically popping up all over the company.
That's a good sign to start with: people like it, and it feels like it's solving problems already. We felt it was further ahead than the other options. One of the things that's super noticeable about the Kubernetes community is just how vibrant it is — it has this hockey-stick growth. And it feels like you're taking something that came out of a large-scale, bulletproof, battle-tested Google nucleus, now becoming mainstream in public — it felt like we really couldn't go wrong there. The team really liked the API and the primitives it had, and specifically, the self-healing stuff just worked, and worked really well. What sealed the deal for us, I think, was that we organically ended up using it to solve a pretty major problem. We use OpenTSDB — time-series data — pretty heavily for visibility and telemetry into how our systems are performing. When we need it most is during on-sales: we're selling Bruno Mars, and we're getting crushed with an internet load of traffic. Unfortunately, a lot of our supporting systems — like our monitoring systems — are also getting crushed under that load. And we don't think about them the way we think about the website: 90% of the mindshare goes into "how do I scale the website and make it ready for Bruno Mars?" No one's sitting around thinking "how do I scale OpenTSDB?" But when it goes away, it's a huge problem: we lose so much telemetry. Even in a non-sale condition, sub-minute decisions are being made by the business: how many tickets are left? Should we open more shows or not? So we can't have our telemetry just explode and disappear during our Super Bowl. And OpenTSDB is a hard one to scale correctly.
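For context on what that telemetry pipeline carries: applications push datapoints to OpenTSDB over a simple HTTP API (`POST /api/put` with a JSON body in OpenTSDB 2.x). A sketch of building one such write — the host, metric name, and tags here are hypothetical, not Ticketmaster's:

```python
import json
import time

def make_datapoint(metric, value, **tags):
    """Build one OpenTSDB /api/put datapoint: metric name, Unix
    timestamp, numeric value, and a dict of identifying tags."""
    return {
        "metric": metric,
        "timestamp": int(time.time()),
        "value": value,
        "tags": tags,
    }

point = make_datapoint("tickets.sold.rate", 1234.5, host="web-01", market="us")
body = json.dumps(point)
# Shipping it would be something like:
#   requests.post("http://opentsdb.example:4242/api/put", data=body,
#                 headers={"Content-Type": "application/json"})
```

During an on-sale, writes like this arrive at enormous volume from every host at once — which is exactly why the ingest tier itself gets crushed alongside the website.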
We have a whole team that's been working on it; it's always challenging. The team fixed it — and I knew it was fixed because I stopped getting nasty emails. They went from five or so a week to "yeah, it seems to be stable." They fixed it by deploying it under Kubernetes and relying on its self-healing properties. It didn't necessarily fix the stability problems within the system; it made the stability problems go away — it temporarily solved it. But for us, the takeaway was: hey, this stuff works, and the team closest to becoming our cluster team was already using it to solve their own problems. That gave us a lot of conviction. Now, simplification with Kubernetes. That Terraform setup I showed you — while we thought it was a pretty elegant solution, it was still kind of complex. So our overarching ecosystem now looks like this: we have public cloud primitives that we rely on, and we deploy Kubernetes on top of them. We have a dedicated cluster ops team now — we're relying on experts to optimize, deploy, and manage Kubernetes on behalf of the company, and they're the ones in charge of optimizing and buying infrastructure, instead of asking everybody else to figure out "do I need a t2.micro, or am I spending too much on this?" That lets us abstract the complexity and make simple primitives and APIs available, so our developers all look the same from a systemic perspective: this is how you deploy your application. The goal is that Kubernetes lets us abstract complexity. So we fired up a team and got serious about it a while back. We have a fully distributed team of six engineers who work on essentially nothing but the Kubernetes ecosystem. And we have a logo.
I don't know if you can see it, but that means it's real. So we put together this team, and, given the state of the container world, a lot is still not figured out at an enterprise level. This team is responsible for building the overarching, end-state, officially supported Kubernetes clusters for the company. It's not just "deploy whatever's vanilla and deal with whatever deficiencies it might have"; their job is to figure out what we need to make it robust for the enterprise, and then, if we don't have it, build it — or if the community doesn't have it, figure out how to make it happen. I'm sure the words on this screen come up any time anybody wants to deploy this stuff at scale: Do we use flannel? Do we use Calico? How many etcd nodes? linkerd, anybody? All these words come up — more questions than answers. How we handled this as an enterprise: we went out to the community, which is the most important part of any of these open-source worlds you're going to deploy at enterprise scale. The community is awesome. We spent as much time as we could with the experts — with CoreOS, with Kelsey Hightower, with Joseph Jacks at Apprenda — and really started mapping out our architecture, what we're looking for, comparing and contrasting, making sense of the right architecture to go after. We make sure our teams attend conferences — we're here doing this one — and are really part of the community, joining the SIGs. We recently joined the CNCF. We're trying to ensure we're helping represent enterprise needs and push them forward where we can for this whole ecosystem. All that to say, we learned a lot; it took months and months. And then, the way we work at Ticketmaster, we do these things
called "inceptions." We did these two-day, full-on, everyone-plus-stakeholders sessions to map out all the milestones: what are we going to work on, what are those six people and all the stakeholders going to work on, how do we deploy this? We came up with all these milestones, which I won't go through, and then they started a bunch of work — and then the questions just kept coming. You dive a little deeper and there's another question, especially with the enterprise stuff: what's the best way to handle auth? Ingress is a big problem for us — traffic and whatnot; I'll talk about it in a second. And then we needed to accelerate, because we also have things like contracts, space and power, renewals, and gear — there's money out there that couldn't care less about where you are on a journey. It forces timelines. So, to accelerate, we strategically partnered with CoreOS, and I'll talk through why. There were some conscious decisions here: rather than spend a lot of our own effort building the things the enterprise needs on top of Kubernetes, let's go to the experts — especially ones like CoreOS, who have Tectonic, which is essentially Kubernetes for the enterprise, already done. Then we can spend our time battle-proving some of it with them, adding things that are gaps or deltas, and hopefully spend most of our time focusing on Ticketmaster value. So we did a partnership deal and chose Tectonic as our implementation of choice. The number one thing that was important to me: it's pretty much vanilla upstream Kubernetes.
We absolutely don't want to be locked into some weird version of something that we're stuck with forever, on the hook with a vendor for eternity. They're hardcore open source — use upstream — so we felt pretty confident there. But the most important reason was that it gave us immediate enterprise-level confidence. We're in the business of providing live entertainment and ticketing, not in the business of having a whole staff of people who can operate at kernel-debug level on Kubernetes. I didn't want to staff or hire for that, and I didn't want to wait for us to get there before we could hit prod running. Partnering with a company that can provide that was really helpful, so we could avoid wasting time and accelerate. The other thing that made sense was partnering with people who have the expertise and wisdom to help ensure we're following the right best practices for deploying this stuff, because the ecosystem is pretty large. You saw that slide with the red words — linkerd and all that. Those were actually questions we sent over on day one; all that text in the background is just the millions of questions in the ecosystem. Partnering with somebody who can say "use this one, don't use that one" is super helpful. The other big part is that they are very, very hot on the self-driving and self-hosted stuff, which is the keyword for automatic updates. If you've been around a while, you've probably deployed stuff — or your company has — and whatever version it went out with, of every piece of software, every library, is probably what it still is right now. Maybe, right?
I mean, it's really hard to keep things updated; it requires a large amount of work and outages and all that to stay current, and we're no different. The fact that their framework pushes automatic updates on CoreOS and automatic updates on the Tectonic Kubernetes management plane feels like a really good strategic move: we don't have to paint the Golden Gate Bridge all the time — "we're upgrading database server 409 of 499 today" — what a waste of time. The other big reason we went for a partnership deal — and if you have the resources, I'd advocate a partnership over a plain we-bought-stuff vendor relationship — is so we have a lot of influence on the roadmaps, and in particular so the problems we're solving together make it into the upstream vanilla stuff. CoreOS is a large contributor in the Kubernetes ecosystem, so if they're working with us and we solve a problem, there's a good chance it ends up upstream. We really, really didn't want to end up on some Ticketmaster-only fork of something that we then have to staff and support for eternity. The more vanilla and upstream we can make things, the bigger the benefit for us. An example of this is our ingress problem. If you don't know what ingress is, it's essentially how you get traffic into your Kubernetes clusters. For smaller implementations it's not a big deal — there's native stuff to handle it. It becomes a big deal when you have a massive amount of traffic, like our on-sales, internet-load-worthy traffic. How do we get that traffic from the internet
into AWS, into Kubernetes clusters, and fan it out to where it needs to go without it being a huge mess? The ingress space leaves something to be desired there, so this is one of the areas where we've been jointly writing software to help solve it for us — with a keen eye on "don't solve it for Ticketmaster; solve it for enterprise, web-scale use, and feed it upstream." We've done a fair amount of work to really integrate the ingress stuff with AWS: talking to AWS services natively, handling workload-separation issues, and handling TLS at scale. Hopefully soon we'll be pushing this up as upstream pull requests, and hopefully it becomes the offering in vanilla Kubernetes at some point, so we don't have to maintain Ticketmaster-only ingress stuff. It's great to have a partner you're actually working with on that. [Audience question] Yeah — so, when we put Kubernetes on top of Amazon, Amazon is completely blind to it. They don't have an offering — or they might have one, but not at the scale we need. We're installing the Kubernetes ecosystem on top of their platform, and their platform has no idea it's happening. The real problem with ingress for us is that, pre-this-ecosystem, we'd have a fleet of load balancers doing the traffic balancing and splitting at the load-balancer level; that doesn't really exist here, and we're missing a lot of the hooks from the Kubernetes ingress controllers that say "use ALBs correctly, use them in this manner." Those are the pieces we're trying to fix. Ultimately, if this makes it all the way, it'll be like, okay:
Kubernetes now works pretty decently with Amazon at scale. — Oh wow, do we have a time issue? I'm sorry, do I have to stop right now? I think we wasted too much time at the beginning, so I could probably wrap this up in five minutes if you guys want to hang on. Okay, sorry. Other things we're working on: pod-to-pod networking with deep ACLs — things that look like they would in a traditional, non-Kubernetes world. We're working on implementing flannel and Calico to use things like Kubernetes labels and CIDR ranges to effectively give us what would look like ACLs. Another area where we're diving deeper with the CoreOS team is self-hosted Kubernetes itself. Like I was saying, it's a little bit scary to think that in prod your Kubernetes management plane might just update itself on the fly. The piece that scares me is the line that says "creating disaster recovery scenarios and runbooks." Next time we talk, I don't want to say "runbooks"; I want to say it's automated and works without humans. But that's the state of some of this stuff — it's a little cutting-edge for large enterprises. Then there's tons and tons of work we're continuing; I'll just highlight a couple of items that interest me most. Managing AWS services from Kubernetes natively, so we don't have to have this other way of doing things: the more stuff we can pull out of Terraform and into Kubernetes-native, the easier it becomes. Detailed billing and chargeback models inside Kubernetes — I may be the only one who cares about this, but other people with large cloud bills to pay are going to care. Because we went from a model where the teams could buy resources on their own.
There, it's a little easier to bill them back, right? When we abstract it all, and all the resources just become the resources my team builds for the company, we're back to: how do you bill that out? And Kubernetes doesn't have an effective billing model to do that. That's literally on the CoreOS roadmap with us, so maybe in two or three sprints we'll have something, and maybe it'll make it upstream. The other area we're looking at is Kubernetes as our VM management plane, for things that may take a while to go down the full container route. We would love to get out of the business of having multiple different cloud management planes to worry about, and we don't do a lot of crazy advanced cloud things like dealing with storage arrays or any of that stuff, so it's perfectly feasible. We've got some small working prototypes of things like being able to suck out a VM, put it in the Kubernetes management plane, and trick the networking just enough that the VM gets a real address and can talk on the network. That's a TBD thing. I'll share this story quickly with you guys; it's a pretty big win in terms of a team that has embraced Kubernetes. They're a modern team: the team that's building our new website platform. It's a large undertaking, a multi-year project with a couple hundred people working on it, and it's slowly eating components of our website away. So it's not that we cut over to the new website; functionality just magically starts working under it. They were already inside of Docker, inside of AWS, and sort of autonomous; they started from a modern era, so they had an advantage over a lot of the other teams. But they had cobbled together their deployment methods, right?
It was Python Boto scripts, Amazon stuff, some CloudFormation, all kinds of things, and it was still 20 minutes to deploy, and sometimes it would fail. Well, 20 minutes isn't the end of the world, but when you have a couple hundred people, it's obviously a time waster, and it's a time waster across tons of people, and they had low confidence in it. What we found is that this team went over and started using Kubernetes, and the deployment time went to 60 seconds or less, and it works every single time: a high degree of confidence. What it did for that team was change the culture dynamic. They stopped talking about deployments; they stopped talking about pipelines; that's not a conversation anymore. They focused all their attention on delivering whatever features they were going to deliver, and they started creating what we're calling a daily delivery culture. So there's a stand-up in the morning: what are you guys going to build? What am I going to build? I'm going to do this feature, I think there's some money there we can make, I'm going to do an A/B test. And then there's a stand-up in the evening where they talk about the feature they deployed into our ecosystem and the result, right?
So it allowed that to happen without the conversation about whether you're going to deploy something that day. You know, say you deploy it, and then you've got to deploy it again, or three times, to make sure you did it right: you no longer have an excuse about how you couldn't get it to prod. So that really is a perfect use case for us; it showed us the value, and it continues to give us commitment for where we're going. Ultimately, the big thing we're trying to do is this: we have an amazing company, a bunch of people who are really great at creating things, innovating, making things, visionaries. We're using our cloud-native journey and the things that led us into the Kubernetes space to give them the time and the headspace to focus on creating, not moving bits around. Ultimately we're trying to let our makers make; that's our biggest asset as a company. So, a quick recap, since I know we've run out of time. We're using Kubernetes to abstract the complexities of our infrastructure. We have a small, high-performance cluster team who's really good at cluster stuff so that everyone else doesn't have to be. We're trying to stop wasting effort writing software to deploy software, and we're trying to give time to our makers so that they can make. Obviously we are hiring; we're trying to scale the team. So find somebody who's wearing one of these shirts if you're interested, or hit us up: I'm Justin, and this is Dean, on Twitter, and we're always available. Thank you very much. I'm glad we ended up doing this talk even though we were concerned whether anyone wanted to hear it. Any questions, comments, concerns? I believe the slides will be available as soon as we email them over. Any more questions? Yeah, good.
I would say no. I've never heard of any case where a large enterprise that has been around for a while hasn't had to invest pretty heavy mindshare and be very forceful and strategic about "we're going to DevOps." What we don't want, and what I think can happen if you let it, is for DevOps to become a tool talk: "oh, we just need the right tools." We have all the tools, any tool imaginable; it doesn't change a thing if we don't change the culture. And it takes a while to change the culture. You can't just say "everyone needs to DevOps now, here's all the stuff, start doing it" when you're fundamentally changing the way their job works, especially if you have teams that have been doing it a certain way for years, or in some cases decades. We have a team that's been writing assembly together for 41 years, right? With a lot of this, they can easily say, "I know we're not doing that, because of blah blah blah; this is a newfangled thing." So it takes a little while. What we really found was that when people see it, and they see the benefit of it themselves, in whatever way or whatever meeting, that's when it clicks, and that's when it happens and takes shape. What doesn't work is for me to preach, or for any executive in the company to say "we're going to do the DevOps in the cloud." They're like, whatever, right? That's just human nature. But the first time they see their co-worker deploy something via Slack in one second and not even look at anything, because it's just working ("I'll know if it doesn't work, because I'll just get a page"), versus "well, I've got to go file the Jira ticket and it's going to be a week," the first time that happens, it's like: damn, why don't I do that?
Hopefully that answered the question. All right, well, thank you very much. Awesome, awesome, thank you guys. Thank you, and thank you for coming to this presentation on a not-so-sunny Sunday afternoon to share your passion for automation. Today I'll talk about StackStorm: what it is, I'll venture a demo, and we'll talk more about where this kind of automation applies. Sounds good? Let us go. When I usually talk about StackStorm, I say, well, this is an open-source, event-driven automation platform for DevOps. But if I do it here with you today, on this Sunday, not-so-sunny afternoon, you'll pull out your bullshit-bingo card on me. So instead I'll tell you what StackStorm is like. It is like IFTTT, If This Then That. Do you guys know this service? Yeah, this is the service my mom can use to build automation: when something triggers, run this action. StackStorm is like this, but for DevOps, where everything is code: a rule is a YAML file, and it has a trigger, a criteria, and an action. We of course have a nice UI, but it just hammers home the same point: when an event triggers, you run an action. I think we're done. Any questions? Where is the demo, right?
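The trigger/criteria/action shape of a StackStorm rule can be sketched like this. The real artifact is a YAML file, so the Python dict below is just a stand-in, and the pack, trigger, and action names are hypothetical:

```python
# A StackStorm rule is a YAML file with three parts: a trigger to listen for,
# criteria to filter on, and an action to run. This dict mirrors that shape;
# the names here are illustrative, not real pack contents.
rule = {
    "name": "low_disk_cleanup",
    "trigger": {"type": "monitoring.event_handler"},
    "criteria": {
        "trigger.check": {"type": "equals", "pattern": "disk_space"},
        "trigger.status": {"type": "equals", "pattern": "critical"},
    },
    "action": {
        "ref": "demo.disk_cleanup",
        # Jinja-style templating pulls values out of the trigger payload
        "parameters": {"host": "{{ trigger.host }}"},
    },
}

def rule_matches(rule, payload):
    """IFTTT-style evaluation: every criterion must match the trigger payload.
    This toy only implements the 'equals' matcher."""
    for key, crit in rule["criteria"].items():
        field = key.split(".", 1)[1]  # strip the "trigger." prefix
        if payload.get(field) != crit["pattern"]:
            return False
    return True

print(rule_matches(rule, {"check": "disk_space", "status": "critical"}))  # True
print(rule_matches(rule, {"check": "cpu_load", "status": "critical"}))    # False
```

When the criteria match, the engine would run the referenced action with the templated parameters; that is the whole if-this-then-that contract.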
Okay, there are more details. Let's talk in more detail, figure out the ingredients, and then jump into the demo. StackStorm sits on top of your existing infrastructure: clouds, applications, tools, maybe even business processes. It provides higher-level integration and orchestration. That means there are sensors, which are responsible for inbound integration. A sensor is essentially a piece of Python code that you write that goes to the southbound API and figures out if an event happened. We support polling sensors and push sensors, and the responsibility of the sensor is to figure out whether the event happened; it might be a Nagios or Zabbix event, or it might be the temperature in this room. When it happens, the sensor forms and dispatches a trigger. Actions, on the other hand, are responsible for outgoing integrations, and again, you can write an action in Python, but unlike sensors, an action can be anything. For Python, the platform gives a little more convenience tooling for developers, but if you have a bash script, turning it into an action is just a matter of using the metadata to specify: here are the positional arguments, here are the key-value arguments, and here are their descriptions. Then you register the action, and suddenly this action has help, the CLI, the API; it's even visible in the UI, and you can even use it in ChatOps, and I'll get to this. There are many actions, probably 2,500 actions already written for us. Anything you can think of: running a local command, running an SSH command, integrating with many of the tools we picture down there. They're readily available, and you can write your own. For some actions you don't even need to write any code, if all you need is to run a command.
It's just metadata. Say this is an SSH action, and I want to run this particular command over SSH: StackStorm will take care of running it on the remote hosts, including carrying some information over there and cleaning up after itself. Or, if you already have Salt or Ansible handling remote execution, you can integrate with them. Workflows are used to wire atomic actions together; they provide the logic, and they provide the data passing. The data carried from one domain to another is JSON. And rules, as we saw, wire a trigger to an action. So let's take a look at a simple example, and that will be an auto-remediation example. Say we have a web service and it's running out of disk space; that actually causes downtime more often than we all care to admit. We get the monitoring system, say via a sensor, saying: here is a low-disk-space event on the web service. It goes to the automation, and the automation fires a set of rules which check whether this is something that is known. If it is known, it runs the workflow to clean up. Say log rotation is broken and I need to clean up the logs; this is an old, known thing, and I don't need to wake you up. And if it is something we cannot know, then we page VictorOps or OpsGenie, the friends we had here, and call them in: buddy, you need to wake up and take a real look at what's going on. Obviously, you won't be waking up these guys as often the more automation you do. Let's switch gears and show some stuff for real, and then we'll return to talking. How am I doing so far? Okay?
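The flow just described (a sensor polls, dispatches a trigger, and the rules decide between a known cleanup and paging a human) can be sketched end to end in a few lines. This is a self-contained toy, not StackStorm's real sensor or rule API; every name in it is illustrative:

```python
class DiskSpaceSensor:
    """Toy polling sensor: asks a (stubbed) southbound API for disk usage
    and dispatches a trigger when it crosses the threshold."""

    def __init__(self, check_fn, threshold=90):
        self.check_fn = check_fn      # stand-in for a real monitoring API call
        self.threshold = threshold
        self.dispatched = []          # real StackStorm puts these on a bus

    def poll(self):
        used = self.check_fn()
        if used >= self.threshold:
            self.dispatched.append({"check": "disk_space", "percent_used": used})

# Known events map to a cleanup workflow; anything else pages a human.
KNOWN_REMEDIATIONS = {"disk_space": "demo.disk_cleanup"}

def handle_event(event, run_workflow, page_oncall):
    """Route a monitoring event: remediate if known, escalate otherwise."""
    workflow = KNOWN_REMEDIATIONS.get(event["check"])
    if workflow:
        run_workflow(workflow, event)
        return "remediated"
    page_oncall(event)                # VictorOps / OpsGenie in the talk
    return "escalated"

ran, paged = [], []
sensor = DiskSpaceSensor(check_fn=lambda: 95)  # disk is 95% full
sensor.poll()
for event in sensor.dispatched:
    handle_event(event,
                 run_workflow=lambda wf, ev: ran.append(wf),
                 page_oncall=paged.append)
print(ran, paged)  # ['demo.disk_cleanup'] []
```

An unknown check name would fall through to the paging branch instead, which is exactly the known-versus-wake-someone-up split described above.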
All right, you're not leaving. Okay, so, yes, please. [Question from the audience] Sensors come from multiple devices, but we are not the monitoring system, and we're not one of the complex systems that do correlation. Rules are deliberately simple: if this happens, then do that. We don't do rules like "if this happens and this happens within 10 seconds, then do that"; for those, I have some recommendations of systems that do it. So again, we're not the monitoring tool; we're the tool that wires things together. Let's leave the questions for the talking part. Let's pray to the demo gods that things work. The main CLI to StackStorm is st2, and it does many things. Say we have the monitoring system here, Sensu in this case, which watches the web-301 service, and so far it's doing all right. With st2 trigger list we already have a bunch of triggers set up here: some of the default triggers like webhooks or cron timers, some of them triggers that fire on action completion, but among them we have this Sensu event handler. That means we can use it in rules. st2 rule list: let's list the rules, and indeed we have the rule that uses this disk remediation and cleans up the disk on the critical monitoring event. That's a good time to switch to the UI and see the rules there. So, not now... you know, there are some default rules; not that interesting, not that interesting.
Aha, here is the rule, and it says: when the Sensu event handler fires, run this action. We can see the nice UI, but we can also see it as code. And what's important, and I will be returning to this point very often: this is the code on the disk in a particular location, and this is the same code on GitHub. Everything, everything that we do is written as code, and the UI and the API are just nice wrappers on top of it. So the rule says that when the disk, when we're running out of disk space, we'll launch the action. You can see that it uses the payload in the criteria, and then it uses Jinja templates to pass the parameters of the payload into the action as action input parameters. The action itself is this disk-space remediation. So there is an action, and this action happens to be a workflow, and you can see this workflow; but to make it more fun, we'll use the nice visual representation, the workflow designer. I'm not using the conference internet, I'm tethering from my phone, but somehow it doesn't like that much either. So here is what we do. We first silence the check, then check the directory size; and you can see that, in fact, this is the code. As always, the code is the master, and everything else is just eye candy on top of it. Then we remove the files, then we validate that the files are properly removed, and we post success to Slack; or, if anything goes wrong, we'll wake you up. To start that, there is an action here that actually resets the demo: it creates a large file on the disk. I can run it from here, or I can run it from the command line, but I want to make it more fun. How many of you are familiar with ChatOps?
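The payload-into-parameters step mentioned above, where Jinja expressions like {{ trigger.host }} are filled in from the event payload, can be approximated with stdlib Python. StackStorm actually uses Jinja2; the regex substitution below is only a stand-in, and the parameter names are invented:

```python
import re

def render_params(params, payload):
    """Fill '{{ trigger.key }}' placeholders from the trigger payload.
    A stdlib stand-in for the Jinja templating StackStorm really uses."""
    def sub(template):
        return re.sub(
            r"\{\{\s*trigger\.(\w+)\s*\}\}",
            lambda m: str(payload[m.group(1)]),
            template,
        )
    # Only string values can carry placeholders; others pass through as-is.
    return {k: sub(v) if isinstance(v, str) else v for k, v in params.items()}

params = {"host": "{{ trigger.host }}", "directory": "/var/log", "timeout": 60}
payload = {"host": "web-301", "status": "critical"}
print(render_params(params, payload))
# {'host': 'web-301', 'directory': '/var/log', 'timeout': 60}
```

This is how one rule can stay generic: the payload names the failing host, and the templating routes the cleanup action at that host.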
All right, now we're talking. For those of you who are not familiar with ChatOps, I'll just show that this is a way to interact with your data center or infrastructure through a bot. You are not talking to the infrastructure directly; you are telling the bot to do things on your behalf. And what can the bot do for us? Okay, quite a few things. I'm sorry if it is not very readable; let me increase the font. Better? All right. One of the many things it can do is "demo reset disk space," so let's just ask the bot to do that for us: demo reset disk space on web-301. Did I get it right? All right, it says okay, everything's done for you, and if I click on this, I go and see that there is an action going on right now, and by now the demo has been reset. We're good. Oh, you know, I wanted to show that, but as we can see, the auto-remediation has already started and already finished. I meant to start in the monitoring dashboard, to see that the demo disk space went red, and then come here and watch it; but by the time we got here, it had already been cleaned up, and the next time Sensu checks, the event will be gone. So it ran just as it was supposed to: first we silence the check, we check the dir size, remove the files, re-check the dir size, and post on Slack. And when I go to Slack, it says: all right, there was a critical alert, and the demo bot has pruned it for you; you can click on this and enjoy the results. If anything had gone wrong, it would have paged VictorOps or OpsGenie or someone else and woken somebody up. You might have seen that in our rules we had... what is this? These rules look interesting. Ah, the internet is killing me. So, this rule says that when there is a tweet that matches "stackstorm," we drop it into our internal channel. And mind you, if you tweet about StackStorm right now, we have a special Twitter channel here, and something will show up right here in the history as well.
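The ChatOps step, where a chat line like "demo reset disk space on web-301" turns into an action execution, boils down to matching the message against an action alias and extracting parameters. Here's a minimal sketch; the alias format is invented for illustration, not StackStorm's real alias syntax:

```python
import re

# Toy action aliases: each maps a chat phrase to an action ref,
# capturing named parameters from the message. Action refs are made up.
ALIASES = [
    (re.compile(r"demo reset disk space on (?P<host>\S+)"), "demo.reset_disk"),
    (re.compile(r"run (?P<cmd>.+) on (?P<host>\S+)"), "core.remote"),
]

def parse_chat(message):
    """Return (action_ref, params) for the first alias the message matches,
    or (None, {}) when the bot shouldn't act."""
    for pattern, action in ALIASES:
        m = pattern.search(message)
        if m:
            return action, m.groupdict()
    return None, {}

print(parse_chat("demo reset disk space on web-301"))
# ('demo.reset_disk', {'host': 'web-301'})
```

The bot then submits that (action, parameters) pair to the platform on the user's behalf, which is the "telling the bot to do things for you" model described above.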
So please do, and entertain yourselves. I'm switching to the talking part now. Any questions around the demo before I switch? All right. So, with that type of automation, what can be automated? Well, the easier answer is what cannot be. Here are just a few things that have currently been automated with StackStorm. Security response and security orchestration: there are a number of customers of ours who run a virus scan and shut down the networking ports, or remove the machine from the network altogether when something doesn't fit, or reconfigure the firewall rules, or build relatively complex workflows to honeypot the VM in question. Complex deployments: when we built StackStorm, we thought it was a tool for auto-remediation; surprisingly, it began to be used where the deployment is so complex that the normal tools are probably too opinionated or too tight for that type of thing. Network remediation: the link goes down, Splunk or the system log tells you that; check if you have enough capacity, shut down the link, reboot the link, that type of thing. OpenStack: StackStorm is widely used in OpenStack for, say, remediation, or things like VM move or re-creation on hardware failure. Service remediation: I'll hit some of the examples of that more. And in terms of what we want to automate, the answer is: what's reasonable.
We don't want to automate everything, but there are certain things that are irritating, that happen often, and that don't require too much effort to automate; those are the targets for automation. So, there are a lot of places where StackStorm is currently being used; I'll just highlight a few that I find personally interesting. First is auto-remediation. All the hyperscale teams have figured out that they cannot deal with the amount of things that go wrong manually, and they have built solutions for that. The most notorious is Facebook. Three years ago at SREcon, these guys said they were saving 13,680 hours (that was the exact number, because they measure things) of equivalent manual labor at Facebook. And these guys recently presented at another meetup, an auto-remediation meetup in San Francisco, and they brought new numbers: they automatically handle 94% of events, and many of them are real remediations, and they say that the 6% that still hits them is still too much for their team. So the big ones cannot do this without automation. But then there are smaller ones too. Mirantis implemented an integration between Zabbix and StackStorm at Symantec to operate their open-source OpenStack clouds, and they presented that in Barcelona. The title of the presentation was funny: "Sleep Better at Night." Why is this funny? Because this is the title of the Netflix presentation as well; everyone cares about us sleeping better at night. So this is probably the most well-known implementation of StackStorm: Netflix runs Winston, which is auto-remediation as a service inside Netflix. StackStorm is the foundation of it, and they built a pretty awesome application on top of the platform, and they drive it home.
They say: look, it reduces the time to resolution substantially, even in the cases where we wake a person up, because we give them rich context about what needs to be remediated. Besides, we sometimes handle the incident in computer time, not human time; when it gets fixed that fast, it causes less downtime, and obviously we don't burn the team out. So, integration. This is a guy from the StackStorm community, Anthony Shaw. He is running pretty much everything at Dimension Data, and they have a zoo where they have every tool. When it comes to integration, does this look familiar? When you begin to do integration, every monitoring tool has an integration point with pretty much every action tool; you begin to put the actions there, and every other tool has the same kind of integration, and quickly enough, with a few more tools, your automation begins to look like spaghetti. In addition, there are some legacy applications. The problem is, when you ask "where exactly is my automation?", it's very difficult to say. It's everyone talking to everyone, versus a hub or bus architecture. And when things go wrong, people need to turn automation off, and when you have this spaghetti, you don't know where to start; it's really difficult to control. So this guy is actually using StackStorm, and they are big contributors to StackStorm. Another interesting integration story is how we ourselves use StackStorm: we do the evaluation version of StackStorm Enterprise, and it's an interesting interplay of multiple services.
So we have a website; we have ActiveCampaign, which sends emails and manages the mail campaign; we have packagecloud; we have StackStorm; we have internal tools. And we wire this all together into a simple task: generate me a license, and for this evaluation license generate me a key, send this key to the mail, invite the guy to Slack, and in 30 days please remove this key, remove it everywhere, and send the mail saying, you know, that was your license. We wired this into a very interesting application too. So that's general IT. You can think of StackStorm as duct tape for the data center, and just as we have creative applications for duct tape in normal life, StackStorm has been creatively applied to many interesting things. This is the fun stuff. Internet of Things is what you find when you begin to look at what people do. One guy started with just honking his Tesla's horn from automation, and he ended up doing pretty awesome stuff with it; remember ChatOps, and we integrate with that. And another guy, just three days ago, one of our community members, posted that he's doing Internet of Things stuff with his Jeep, integrating it into the chat; all this cool stuff. Serverless: now you go to a conference and it's no fun if we're not talking serverless. And the thing is, StackStorm (and this is my explanation of the day) is pretty much like AWS Lambda plus AWS Step Functions.
That's open source, and on your own infrastructure. Really, when you squint your eyes a little bit and look at it, the rules are like Lambda: something happens, run that. We have actions, and we have a very rich set of sensors, and the best part is that it's easy to write your own. And Step Functions, then, are the workflows. So you think: why would I bother? This is another presentation that I gave at a Docker meetup recently, and it's the result of a side project I do with a friend of mine, who is doing genomic computations. When we looked at his requirements, serverless fits perfectly: highly elastic workflows and batch-type workloads. However, there are a few other requirements that are, unfortunately, way beyond the current serverless platforms. These things run for hours, sometimes for weeks. They sometimes require up to half a terabyte of memory. None of that works well with the current limitations of the serverless platforms. And an interesting thing: it turned out to be really easy to implement. You pretty much build the container, you build the workflow, and with some integration with Swarm, we turned it into a pretty good serverless framework. I probably don't dare to call it a framework here yet, but this is a serverless-based solution, and it is built out of ready-to-go components. So, changing gears. Some of you might be sitting here thinking: but why? Why can't I just use scripts? Well, you can. And the workflow in fact implies that you use scripts; it just places them where they belong, which is: this is an action, this is one thing that is done well. When it comes to wiring things together, a workflow is actually not that much fun to write, and it is more complex to write, but it is much better in operations. Why? Because a workflow is easy to read.
It's easy to reason about, and it's really easy to watch: say I'm running this workflow, it consists of five steps, I'm on step three, and step three is failing. And then, it's resilient. In our case, we use Mistral, the OpenStack workflow engine, and we are contributors to it. It is resilient, meaning that if any component of the system goes down, the workflow will be picked up and will continue executing. And if something fails, for instance there is a network glitch, the workflow internally has the ability to restart the execution from the point of failure. When a script goes down, you are left to pick up the pieces. So then, maybe some of you are familiar with runbook automation, things like HP Operations Orchestration, Microsoft System Center Orchestrator, and a few others. Why don't we use them? And by the way, don't pull out your bullshit-bingo card just yet; I'm serious. Three years ago, I was here at SCALE comparing and contrasting VMware and OpenStack, and generally the vendor approach to building software versus the open-source approach to building software in our area. I called out three properties of software that mean something when you say it is built for DevOps. First of all, it is open source, and you all know that more and more companies have policies that don't let them run things which are not open source, for a very good reason: it's not the money, it is control. And then, infrastructure as code. What is it?
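The restart-from-the-point-of-failure property described above is what separates a workflow engine from a plain script. Here is a toy sketch of the idea, assuming nothing about Mistral's real internals: completed steps are recorded, so a re-run resumes at the failed step instead of starting over.

```python
def run_workflow(steps, state):
    """Run named steps in order, recording completions in `state` so that
    a re-run after a crash resumes at the failed step, not from scratch."""
    for name, fn in steps:
        if name in state["done"]:
            continue                  # already completed before the crash
        fn()                          # may raise, e.g. on a network glitch
        state["done"].append(name)

calls = []
flaky = {"fail_once": True}

def remove_files():
    # Simulate a transient failure on the first attempt only.
    if flaky.pop("fail_once", False):
        raise RuntimeError("network glitch")
    calls.append("remove_files")

# Step names loosely follow the disk-cleanup demo from earlier in the talk.
steps = [
    ("silence_check", lambda: calls.append("silence_check")),
    ("check_dir_size", lambda: calls.append("check_dir_size")),
    ("remove_files", remove_files),
    ("post_to_slack", lambda: calls.append("post_to_slack")),
]

state = {"done": []}
try:
    run_workflow(steps, state)        # first attempt dies at step three
except RuntimeError:
    pass
run_workflow(steps, state)            # resumes at remove_files, not step one
print(calls)
# ['silence_check', 'check_dir_size', 'remove_files', 'post_to_slack']
```

Note that each earlier step ran exactly once: the recorded state is what a bare script lacks when you are "left to pick up the pieces."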
It is the ability to rebuild everything, all your infrastructure, from code and artifacts. And applying infrastructure as code to the automation itself is super critical; I'll tell you a story. We have a user that uses StackStorm for their service catalog application. They have four data centers, and they provision onto a large mixed cluster of VMware and OpenStack, and they used a legacy automation framework: it's actually Cisco Process Orchestrator. And how did they do it? They go into the UI, and it's click, click, type, type, save, and then it gets applied to production, and they pray. It seems to work, and they go to the next data center: click, click, type, type, probably the same sequence, click save, and then on to the next, and the next. Now, what's interesting is that I tell this story to some people and they look at me like: and what's wrong with that? Well, as developers, we don't do this, and these guys couldn't keep doing it either. They were doing a DevOps transformation, and they wanted it to be reliable. So they ended up with this: the automation is built and tested on a development machine; once done, it's committed to the staging branch; the fleet of StackStorm servers in the test environment picks it up from the staging branch, applies it, and runs it against the test environment; and when the tests are finished and things are ready, they promote it to the production branch, and they like to use Puppet, so Puppet takes it from there and deploys it to the production StackStorm servers, and the automation runs there, as it was developed, every time. It's like the stories Gene Kim shares, and this is actually fascinating to me.
I'm a developer, and they say: if you're a developer and you develop your applications without source control, that's fine; but if you're in operations and your operational artifacts are not under source control, that's actually a big problem, because this is a top leading indicator of performance and of success in automation. That's the information from Puppet Labs' annual State of DevOps report. Again, there is more to infrastructure as code, as you saw. We don't just do this kind of integration; there are operational patterns, like how you remediate, for instance, MySQL, or how you remediate Cassandra. Whatever you need, this is something that can be shared, and it can be shared as code. Let me illustrate that with an example. Cassandra: anyone here run Cassandra? All right. So, Cassandra is resilient: when you lose a node in the Cassandra ring, it keeps on going. But as an operator, you still need to go ahead and replace this node, and apparently it is not that easy. There are some seven steps, written in plain English on a wiki page, which explain what needs to be done, and some of it is pretty hairy. For most of it there are APIs, but you actually need to jump on the box and run some local commands, because at this point Cassandra doesn't expose those commands as an API. And you do that manually, every time. So you can automate that, but you can do better than that. Say we automate it for a client of ours; we wrote a blog about it, we did a video on it. That's normal, right? But then we posted the automation recipe to GitHub, and now I can actually go and pack-install the Cassandra remediation pack, and I get that on my instance, and I'll massage it and make sure it works for me. This is an opportunity, an opportunity to capture and share operational patterns as code. And that gets us to collaboration.
So obviously, an open-source platform is all about collaboration. Actually, I was stunned to learn that we have almost 300 contributors; I didn't even know that. This is from Open Hub, and Open Hub doesn't update very often, so that's November's number; we're growing. But the point is, contributing to the platform is just one way. More interestingly, the power of an integration platform is in the integration points you have there, and we recently launched StackStorm Exchange. This is our home for the integration packs, and I'll just let it sink in for a second: we have a lot. By the last count, we had probably 70 integrations, with almost 3,000 units of integration, meaning individual sensors and individual actions. It's fun to write sensors, and it's fun to write actions, but sometimes, say I want to just launch something on AWS: you pick the AWS pack, and everything that you have in Boto is available there as StackStorm actions. The same applies to OpenStack. Now, that drives me to the point that it's too much for us to do all of this, and that makes me ask you guys to contribute. Because we are a tiny team that is fortunate enough to be backed by a large organization, but when we want to integrate everything, that's a monumental task. So, if there is one takeaway from this talk, it is: contribute. And you can contribute in very different ways. First of all, use it. I mean, go get yourself a box, run the installer, run the command, and you'll get StackStorm up and running in your environment in a few minutes. In fact, we have the st2vagrant repository that gets it all wired up for you. Read the docs, take a look, judge for yourself.
Decide whether it's good for you or not, and you can contribute by liking it, or by not liking it, and probably even more by not liking it, because we want to know why it's not good for you, why it doesn't solve the problem that you have. So bring up a box, jump on the community, give us a hard time. That is a contribution.

Contribute by taking part in integrations. There are a few people here who represent interesting integration points. For instance, there is not a single week where some of my users don't ask me: do we have Zabbix integration? And shamefully we don't, because, well, we have Nagios, we have Sensu and New Relic and so on; there are 140 integration tools out there and I cannot keep up with all of them. But we spoke with the Zabbix guys and they will be looking into integrating StackStorm from their side. Same with OpsGenie and PagerDuty: those we have, but, you know, it's always interesting to have more.

Interestingly, you don't have to own the domain to be the contributor to that domain. For instance, Kubernetes: our friends at Pearson, a big Kubernetes shop, maintain the Kubernetes pack for the community. Another group of friends maintains the VMware integration because they care about it; they started it, and so on and so forth. In the end it's not us but friends of the project who are integrating these things. So if you really care about a project, bring its integration to StackStorm, and we are there to help and guide you.

And look, if StackStorm is not exactly your cup of tea, or you don't know how to contribute to it, everything counts. You can tweet about it, you can blog about it, you can actually go and give it a GitHub star. On that point, you know, I rate my talks by the number of GitHub stars I receive after the presentation.
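A pack in the sense described above, a named bundle of integration points, can be sketched in a few lines. The `Pack` class and the `ec2_run_instances` action here are invented stand-ins to show the shape of the idea; real StackStorm packs are directories of YAML metadata plus Python and are not implemented this way.

```python
# Toy sketch of the "pack" idea: a named bundle of actions you can look up
# and run. The class and the action below are hypothetical illustrations.

class Pack:
    def __init__(self, name):
        self.name = name
        self.actions = {}

    def action(self, name):
        """Decorator that registers a callable as a named action."""
        def wrap(fn):
            self.actions[name] = fn
            return fn
        return wrap

aws = Pack("aws")

@aws.action("ec2_run_instances")
def run_instances(count=1):
    # Pretend to launch instances and return their ids.
    return ["i-%04d" % i for i in range(count)]

# Launching something becomes "pick the pack, call the action":
print(aws.actions["ec2_run_instances"](2))  # → ['i-0000', 'i-0001']
```

The point of the Exchange is that contributors publish bundles like this for their own domain, so nobody has to write all 140 integrations alone.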
So when we're done here today, on this not-so-sunny Sunday afternoon, and you're standing in line at the airport, or you're stuck driving in a traffic jam, and you reach for your phone to fill the moment, bring StackStorm up and give it a star.

Any questions? Please.

Thank you. This is the reason Netflix actually zoomed in on StackStorm: unlike many of the DevOps tools, we build it as a set of microservices itself, so it is built to scale. Every component of StackStorm is a microservice, horizontally scalable, and they all sit on the message bus. I provocatively don't even draw the database here because it's almost irrelevant. Other than that, you know, we have a scalable architecture ourselves. Netflix specifically runs two instances of StackStorm in each availability zone, and although they haven't had any huge load just yet, they haven't had any major outage either, except the one they provoked themselves through human interaction. And we are working on improving that even more.

Yes, please. The best thing we've seen our users do here is use ChatOps, and the way it works very nicely with ChatOps is: the first part of the workflow prepares the action, then posts to ChatOps that it's ready for you. That's the decision point: proceed, yes or no. Then it picks the answer up from ChatOps and the continuation of the workflow fires from that point. We are improving this functionality this cycle by introducing what we call questions: in addition to just getting a binary proceed-or-stop answer, we also want to collect information, to give the user the ability to enter data that potentially adjusts the continuation of the workflow. So yes, there is some of this, and more is coming.

Yes, please. Well, this is a very real and very complex problem, and it is not just us. You know,
a similar problem applies to all the DevOps automation tools. Our solution here, well, we are working on some dry-run capabilities, but the simple answer is no, we put it on you. How it works, because this is code, is that there is a set of recommended practices, and following those practices will partially solve the problem, but there is no magic bullet here. The folks who run us at scale, like Netflix, which is an interesting case, built something on top of StackStorm that runs the automation locally against a particular set of stubs. They build that set of stubs, then they deploy to staging, then they test in staging, and they have pretty much automated the whole path from development, where they quickly check that an action is all right, into staging and into production. That is the only way, so it is more about the process around it: test it locally, test it against some environments, and when you're confident, go to production. What we do is support that process by making sure all our artifacts are code; once you commit them to GitHub and then pull them from a particular branch onto another StackStorm box, they run exactly the same.

You don't need to write an agent. Our own remote execution happens over SSH, much like Ansible does; that's for remote commands.
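The stub-based testing process described above can be sketched like this. The client class, the action, and the host names are all invented for illustration; the point is only the design choice of injecting the API client so the same action code runs against a stub, staging, or production.

```python
# Sketch of "test the automation locally against stubs": the action takes
# its API client as a parameter, so a stub can stand in for the real
# endpoint during development. All names here are hypothetical.

class StubClient:
    """Stands in for a real API endpoint during local testing."""
    def __init__(self):
        self.calls = []

    def restart_service(self, host):
        self.calls.append(host)  # record the call so tests can inspect it
        return {"host": host, "status": "restarted"}

def remediate_host(client, host):
    # The action only talks to `client`, never to a hard-coded endpoint,
    # so the same code runs unchanged against stub, staging, or production.
    return client.restart_service(host)

stub = StubClient()
print(remediate_host(stub, "web-07"))  # → {'host': 'web-07', 'status': 'restarted'}
```

Because the artifacts are code, the same file committed to GitHub and pulled onto a staging or production box behaves identically; only the injected client changes.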
Some people already have a SaltStack deployment and use Salt for that; some have a variety of other tools. But no, StackStorm is an agentless application; we don't need agents. Most of the time the actions are call-outs to API endpoints, and sometimes they are remote-execution actions that reach out over SSH and do something there. If you deep-dive into it, we have a concept of runners, and there are some interesting runners, for instance for network devices: we have an expect-style runner that just goes over SSH to the device, with a particular way of conveniently parsing, screen-scraping, the output. Part of the problem we are addressing is that some integration points simply refuse to integrate; legacy is a fact of life for many of us, and being able to integrate with things that were not built for it is yet another capability. So as soon as you can programmatically do something with an endpoint, you put that into an action, you run the action there, and you can do magic with it. But no, we are not using agents.

Yes, please. So, let me use this diagram, not the spaghetti one, this one. The rule engine is responsible for that. Architecturally, the sensors run in the sensor container; that's not a Docker container, it's just the platform component that picks up a bunch of Python sensors and runs them. When a sensor fires, it goes through the sensor container, which creates the trigger and puts it onto the message bus. Then one or multiple rule engines will pick it up and say: okay, I got the trigger, what do I do with it?
The engine evaluates the rule, checks the trigger against the criteria, and then schedules the action execution. The action-execution request goes onto the message bus in the scheduled state, and the next available runner picks it up and says: okay, I know how to execute actions; here is the action-execution acknowledgement. It runs the action execution, completes it, and sends the results back into the system. On top of that there are components, for instance for a workflow, that watch the execution as it progresses, so that we are not in the dark for as long as the action executes.

I'm here to take more questions and to show you around; as you can see, we can connect to it and get to any level of depth. Again, thank you. A reminder for those of you who liked the talk: give it a star. And to everyone else as well: again, thank you, and enjoy your Sunday in Pasadena.
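The trigger-to-runner pipeline described in that last answer can be sketched as a toy, single-threaded simulation. A deque stands in for the message bus, and the rule, trigger, and action names are invented; this is a walk-through of the flow, not StackStorm's actual implementation.

```python
# Toy simulation of the flow: sensor fires a trigger onto the bus, a rule
# engine matches it against criteria and schedules an action execution,
# and a runner picks that up and reports the result. All names invented.

from collections import deque

bus = deque()  # stands in for the message bus

rules = [
    # (criteria over the trigger payload, action to schedule)
    (lambda t: t["type"] == "cpu.high" and t["value"] > 90, "restart_service"),
]

def sensor_fires(trigger):
    # The sensor container creates the trigger and puts it on the bus.
    bus.append(("trigger", trigger))

def rule_engine():
    # A rule engine picks the trigger up and evaluates each rule's criteria.
    kind, trigger = bus.popleft()
    for criteria, action in rules:
        if criteria(trigger):
            bus.append(("execution.scheduled", {"action": action, "trigger": trigger}))

def runner():
    # The next available runner acknowledges, runs, and reports the result.
    kind, execution = bus.popleft()
    return {"action": execution["action"], "status": "succeeded"}

sensor_fires({"type": "cpu.high", "value": 97, "host": "db-01"})
rule_engine()
print(runner())  # → {'action': 'restart_service', 'status': 'succeeded'}
```

If the criteria don't match, the rule engine simply schedules nothing and the trigger dies on the bus, which mirrors how a non-matching rule produces no execution.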