So you're probably wondering why I've called you all here today. This is Making Infrastructure S'mores with Chef. So if you were looking for, you know, how to make apple pie with Puppet, that was the last session, I'm sorry. I promise it'll be funnier from here on out. It is the last one of the day.

As I said, the name of this session is Making Infrastructure S'mores with Chef. This is gonna be so hard for me, to stand in one place. I promise if you can't hear me, I'll fix it. And I will tell you this is a fairly tool-agnostic talk. I will give references that have to do with Chef, because Chef is what I know, but I'm hoping that you can take away these ideas and apply them to however you're doing your work.

My name is Matt Stratton. So who the hell is Matt Stratton? Oh, hey, surprise: I work at Chef. Up until recently I was a senior solution architect at Chef. I live in Chicago and cover the Midwest. I've now moved into our customer success organization, where I'm a customer architect. So I design customers, apparently. I've just moved along to now work with customers that are current customers, as opposed to selling them things. I am the creator and co-host of a podcast called Arrested DevOps, so if you like podcasts and you like DevOps, you should listen to the show. And I am one of the organizers of DevOpsDays Chicago, which is coming up August 30th and 31st. The call for papers is open through the rest of the month, so you should submit one of those, and you should come to our event, because it's rad.

And yeah, that is my license plate. My car does say DevOps, so I guess I buy into this stuff. You can find me on Twitter as @mattstratton. That's pretty much where I am everywhere else as well, but I'm mostly hanging out on Twitter. And up here on these black tables, excuse me,
I've got, you know, some contact stuff if you want to talk to me about something. I've got stickers from Chef and ADO and all that, so sticker up your laptops.

So let's talk a little bit about what Chef is and why we care. There are three key things about what we do with Chef. The first is that we're going to use Chef to define reusable resources and our infrastructure state as code. That's a really big statement that says a lot of things, so I'm gonna unpack it.

First, we're defining things, right? You may hear differences of opinion about imperative versus declarative configuration management. I've got no time for that. Right, you know, declarative is XML, and nobody wants to use XML. But here's what we mean when we say we're defining reusable resources. A resource is a thing we care about. It can be a file, it can be a service, it can be a package. It's a thing we care about. And we want it to be reusable because we don't want to repeat ourselves.

The most important thing here is that it's infrastructure state as code, and I want to deconstruct two key parts of that. State: it is about the state of our infrastructure. It's not about how you get there, it's about the desired state. That's also why the imaginative folks at Microsoft, in their tool that's like this, call it Desired State Configuration, as opposed to something cool like Puppet or Chef. Those are a lot more fun, but theirs explains what it is. And then "as code": I'm gonna get into that a lot more later, as to what it means to treat your infrastructure as code.

Second, Chef can be used for managing your deployment and your ongoing automation. So it's about day zero, and it's about day 300. You need to spin up some new machines, or you want to make sure that your existing machines don't suck: you're using a tool like Puppet or Chef. And third, there's community content available for all common automation tasks.
These tools have been around for a long time. Sorry, no matter what clever thing you think you came up with, someone's already had to solve this problem for you. This is the problem that I have: I sit down and say, wow, I'm gonna go write a rad cookbook to go do this thing, and then I go, oh, somebody already did that shit. Guess I'm not gonna be known for that. I did find one thing, though, and I wrote a cookbook for a tool called Guacamole, which is a session-sharing thing.

But what does it look like? The thing that I love about declarative configuration management tools like this is: this is legit Chef code here, for doing a very basic install of Apache. Chances are most of you sitting in this room don't know Chef from Puppet from, you know, Perl. Well, maybe you know Perl, I don't know. But you should be able to look at this and get an understanding, generally speaking, of what it's doing. We're saying the state we want it to be in. Think about it this way: you talk about what you want, not how to get there. We focus on the outcomes, not how the sausage gets made. That's just sort of the thing to think about with that.

And we're talking about full feature parity between Windows and Linux. In my examples I may be talking about Linux stuff, but the same goes when you do Windows things. We're living in this weird alternate universe, where Spock has the goatee and everything, where Linux runs on Azure and Canonical does stuff with Windows and everybody's friends, and it's weird but cool. And somewhere there's a really sucky alternate universe where everything's horrible still, right?
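The slide with the Apache recipe is not captured in this transcript. As a rough sketch of what a basic Chef recipe along those lines could look like, assuming a Debian or Ubuntu node where the package and the service are both named apache2 (the file path and page content are illustrative assumptions, not taken from the talk):

```ruby
# Hypothetical recipe sketch; package and service names assume Debian or Ubuntu.

# Desired state: the Apache package is installed.
package 'apache2'

# Desired state: the service is enabled at boot and running now.
service 'apache2' do
  action [:enable, :start]
end

# Desired state: the default page exists with exactly this content.
file '/var/www/html/index.html' do
  content '<h1>Hello, world</h1>'
end
```

Each block names a resource and the state it should be in. None of them spells out the commands needed to get there, which is the "what you want, not how to get there" point.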
So what does this mean? We're gonna talk about DevOps, and before I dig into this I'm gonna ask a couple of questions, just to do some level setting around what DevOps means, because obviously I have feels about this, given my license plate. How many people in the room would identify themselves, within your company today, as more of the development side of the organization? How many would identify more with the operational side of the organization? How many consider themselves more of the business part of the organization? All right, every single hand needed to go up for that last one. That right there is DevOps. There's no "the business." We're all the business. Unless you're a consultant, then you're your own business.

So we want to think about what DevOps means, though. It begins with automation, because automation gets a lot of crap out of the way so we can do smart things, using our big brains and our good resources. Automation means we need to treat our infrastructure as code. What do we get if we treat our infrastructure as code? Why do I want that?

Well, one thing is it lets us version our infrastructure and our runtime environments. I didn't really give my bio here, but I'm a grizzled old sysadmin. For most of my career I would raise my hand for the ops side of the house and try to hit the ceiling. I spent most of my time as a system administrator, as a sysop, and one of my trigger phrases is "it worked on my machine," or "it worked in dev." If we can simply understand what's different between our environments, or understand what version of our code and our infrastructure is working, that makes our life a lot easier.

It also lets us create our infrastructure and runtime environments consistently. I love this screenshot I did, where it's got a spelling error, right?
The squigglies are under "ENVS," so you can tell that I got this by screenshotting out of another PowerPoint. But what's important here: think about the loveliness that is promised to us by containers. Why everybody loves containers so much is there's this myth (I shouldn't say myth; I just revealed an awful lot about how I feel about containers), there's this belief that, oh, it's great, I can define this one way and then ship it everywhere. Guess what, we've been saying this since Java, if not before. Write once, run everywhere, all this crap. The idea, though, is that if I'm creating my infrastructure and my runtimes consistently, this is gonna help me have predictability. Again, this all works back against "it worked fine in dev."

And then it also lets me test my infrastructure automatically. It lets me have predictability around what these changes are gonna do.

Oh, that's awesome. That's actually super ironic right there. I feel so, so much about them that... "remind me later" there, Docker whale, that I'm running the beta. Man. Oh, and this is recorded. That's beautiful. Awesome. Jason's gonna splice out just that one bit and put it on a Vine, or as an animated GIF on Twitter.

Okay. So the key of that, the big reveal, until the Docker whale stole my thunder, is that this is just like what we talk about with applications, right?
Our apps, we treat them the same way. Whether we're talking about our infrastructure, our compliance, whatever the hell it is: insert Mad Lib noun "as code." So the key of... I'm really mad at Docker now. Okay.

The great quote about infrastructure as code, by Jesse Robbins, is that when you treat your infrastructure as code, this should enable the reconstruction of your business from nothing but a source code repository, an application data backup, and compute resources. That means if all hell breaks loose, as long as I've got my source code, I've got a backup of my data, and I've got compute somewhere, I can get my company back online.

I'll very quickly digress to a story of two infrastructures. One of them was a company called Code Spaces. Anybody remember these folks from a few years ago? Yeah. You can see where this story is going. Code Spaces was a hosted source code provider, like GitHub, but it wasn't just Git: they supported Mercurial and SVN and all sorts of stuff. They ran in AWS, because the cloud, right? Dave's starting to remember them a little bit. And they got slammed. It's actually a really interesting story to read, about how they were hacked, but it was one of those looping IAM-role AWS attacks: someone got hold of their creds, and every time they tried to reset them, they got reset back from underneath them, in circles and circles. Now bear in mind that they had backups. Just remember this: they were dutifully cron-backing-up their machines and everything. And after I think it was about six hours of this, they determined the cost of rebuilding their systems was more than the company was worth, and they went out of business.

Then on the other side we have a company called CustomInk. If you've worn cool conference T-shirts, you've probably worn a CustomInk shirt. They make lots of T-shirts for things like that.
So CustomInk: they were hosted in a colo, and they happened to be a Chef customer, but the important part is they were using infrastructure as code. They got hit by a DDoS attack, a distributed denial-of-service attack. Because they treated their infrastructure as code, they were able to rebuild everything in AWS in the span of about four hours, be back online, and route the traffic over there.

You can see a similar story with Facebook's acquisition of Instagram. Instagram and Facebook both use Chef. When Facebook bought Instagram, Instagram was hosted in AWS. Facebook said, "What? Oh, AWS, that's cute. We're gonna move you into our data center now." And they were able to very quickly and easily migrate that stuff. Now, the 6.2 billion photos is a whole other story, and there's a cool Wired article about that.

The whole idea is you want to be compute-agnostic. I want to have the same thing whether I'm running a VM on my MacBook, whether I'm running dev in AWS, whether I'm running production in Azure or on metal in my data center.

So what does it mean to treat your infrastructure as code? It means three main things, as far as I'm concerned. Yeah, here the memes come. First of all, it's versioned. My infrastructure is versioned, and versioning does not mean .bak at the end of a shell script name, or even the more elaborate date stamp, you know, 2016 or whatnot. It's versioned in that
it's a version of the state. It's modularized, which means I'm thinking about it in terms of what these different components do. Or you could say it's componentized: I've got little built-for-purpose parts of it. And it's tested. That is what we're going to talk about for almost the entire rest of this talk: how important testing is when it comes to infra code and to your environments.

One of the things that's super great about treating our infrastructure as code is that it provides us with executable documentation. This is the part where I make my joke that all wikis are write-only. But that kind of conflicts with my real point, which is: think back a little bit to the olden days of ten years ago, or for some people maybe ten weeks ago. I'm going to build a system, or make a change to my system, and then I'm going to go to the wiki, dutifully, like a good sysadmin, and update it and say, this is what I did. Now NTP is listening on this particular port. I've moved the file system; data is now in /data instead of where it was before. And that wiki is now the obituary of exactly what my system looked like at exactly that time.

The thing is, if you remember back to that Chef code I had in the beginning, you didn't know Chef, but you could understand what it did. So if your configuration management is running regularly, that code is documentation that stays true. This is the other thing: don't do this thing a lot of people do, where they go, "Oh, well, we can't make changes that much, so we're going to have Puppet run like once a month." That's scary as hell, to have your config management run once a month, because holy crap, is that a lot of stuff to change.

How do they know when it's needed?
I mean, yeah. We're going to get to a slide later entitled "Beware of hubris." This reminds me of when I worked for an e-commerce company and we were doing A/B testing. We said, okay, during the test there can be no changes. And they're like, "Oh, but we can change these things, because we know that won't affect that." I'm like, well, if you're so smart and psychic, why do we have to test at all, if you know what will affect what the user does?

So the thing is, we want it continually running, and the reason is that it's managing configuration drift. Configuration drift happens when your desired state and your current state don't match. That can happen for two reasons. One, desired state changes: you want something different than you wanted yesterday. Two, current state changes: someone went and screwed up your shit. Either way, you want to have Chef or Puppet or whatever (I'm gonna just keep saying Chef, and you can mentally search and replace Chef with Puppet) go ahead and just take care of it for you.

So here's the reality. Think about domain expertise. This is the part where I say there are about 16 full-stack engineers in the whole world, and they all work for Netflix, so stop thinking you're gonna hire them, okay? Systems are really complicated today. There was a time when someone could understand everything in your environment. Depending upon the type of company you're in and what your software stack looks like, it is possible to understand most of it. But, again depending upon your organization, the larger organizations have more complex systems, and that's the direction all of our systems are going: complexity is increasing.
It's not decreasing. Expecting somebody to understand all of them is crazy. Nobody can know everything about the stack. What you want to do is let your domain experts contribute their portion directly.

As someone who's a system administrator, there's stuff I know really well. I have my domain expertise, and I have a passing knowledge of, you know, maybe a higher-level language or microservices or something like that. Similarly, as someone who mostly writes code, I can really know my language, I can know application architecture super well, and I have a passing understanding of networking. That's cool. That's a great person to be. But to expect me to be a CCIE at the same time as someone who can write deep libraries in C, and to be able to do every single thing there is to do? These are mythical people, and you certainly can't afford them even if they exist, because they're all consultants, because they can make more money working for themselves.

The reason this comes into play is that when we talk about our configuration management and our automation in general, a common play in a lot of organizations is to have an automation team. That team can be made up of one person or a dozen, depending upon the size. No matter what it is, it's wrong. I don't usually say there's a wrong way to do things. This is one of those things. This is the wrong way to do it, because all you're doing is moving your silo around.

Say I've got my complex applications, and I spin off two people and say, guess what, you are now the Chef people. We're gonna automate the hell out of everything, and you are gonna write the Chef automation for everything: for the infrastructure, for the middleware, for the applications, for all this. And you go, "Oh my God, now I have to understand all that stuff," right?
So that's a thing that really sucks, and this is why we want to beware of hubris. This is where we bring in my friend Gilfoyle from Silicon Valley, who all good system administrators aspire to be more like.

A lot of times what we do, and this is what makes that distributed work hard, is we lock down systems to protect them. We're in a regulated industry. We have compliance requirements. And if I sound like I'm being sarcastic, it's because I am. Everybody has these problems, whether it's a formal or an informal regime. We have things to protect, and we often think that the simplest way is to reduce the surface: if we reduce the number of people who can touch a thing, that's going to reduce the attack surface.

The problem is that protecting against mistakes based upon job title is one of the craziest fallacies in our industry today. I can assure you that these very same sysadmins who insist that only they, by virtue of nothing more than job title, should be trusted to touch systems will happily tell me war stories over a couple of glasses of whiskey about when they totally effed up everything in production because they accidentally typed the wrong command. We are all people. That's the problem with separation of concerns based upon, "Okay, 'cause I'm a sysadmin, I can be trusted." Truth is, chances are I know my stuff, but I don't know your stuff.

This is the way everybody responds at this point, usually. Especially the sysadmins in the room are going, "Wait a minute. You're telling me anyone can do anything?" And that's not what I'm telling you. Don't worry, it's gonna be okay. We've got the magic DevOps pixie dust to help us out here. This is how we're gonna be able to work together and not have this world of anybody can do everything, everybody's got root, YOLO mode. DevOps does not equal YOLO.

So the old way of doing stuff is communicating via tickets, right?
So I'm a middleware guy, I'm an app guy, or whatever person, and I need something changed at the system level. What am I gonna do? I'm gonna put in a ticket to the sysadmins. It's gonna say, "I need the heap size adjusted on Tomcat for me." This sucks, because I'm explaining things in English or another human language, and there can be all kinds of room for misinterpretation there. And maybe I'm gonna give you a script. That's advanced communication: I've attached a script to the ticket, "please run this script." Still, as the person who's taking it, I don't necessarily know what's going on.

One of the most frightening things I've ever experienced with this was at a certain e-commerce company I worked for that was in the multi-family rental business. They have commercials starring Jeff Goldblum now; they rhyme with "smart men's calm." We had the rule that any database changes had to be run by the DBA. Cool, rad. Okay, so that's your separation of concerns: Mr. DBA person is gonna run the changes. Well, of course, who knows what change has to be made? The app person, right? The developer's like, "Hey, we have to change the schema, because my thing needs a thing." So what do they do? In the ticket, they attach a SQL script, and the DBA happily just goes, "Okay, I'll run it." Doesn't look at it, doesn't understand what it does. But the button got pressed by the DBA, therefore auditors are happy. Oh my God, this is crazy. No good.

So, hence: old and busted, throw it out. This is the new way: we communicate via code. Because ultimately, what is a version control system but basically a communication tool? It's a way for developers to communicate with each other about changes
they've made, sometimes through typing in actual messages, and oftentimes, and better, through the things you don't have to remember to communicate because the system does it for you.

So bear this in mind: people make mistakes, and this doesn't scale. You aren't going to fix the humans, so we're gonna fix the system. If you are in an organization where people are punished for making mistakes, do you think that is going to make people make fewer mistakes? Rhetorical question, but you all silently, psychically answered it correctly to me, which is, of course, no. What you have now done is build an organization full of subject matter experts on hiding mistakes, and now you are well and truly screwed, because you have no idea what is going on in production, because everyone's hiding everything. As my colleague and friend Sasha Bates is fond of saying, if you treat developers like children, they will act like children. That's true of anybody, but especially developers, apparently.

So how do I make sure that nobody messes stuff up? This is a new slide for this talk. I don't know if you guys know about this one. This was that dog who got in the toner. The people left for like four hours, they came back, and it's beautiful, because it's hard to see, but there is only one paw print on the bed. It's like the dog realized, "Oh wait, I'm not supposed to be on the bed."

But so, you know, you're like, "Man, I've heard about this DevOps thing. I've listened to some podcasts. It's all hippie-dippy culture stuff, right? Isn't it all about trust? Can't we just have people do whatever they want?" Well, no, we can't. Again, we're trusting, but it's not about trusting that someone won't do something malicious.

Testing is the key. And this is a book that everybody in this industry should read: Sidney Dekker's The Field Guide to Understanding Human Error.
It is the most important book for us, meaning people in IT, to read. It's very accessible, and almost none of it is about IT. But guess what: IT is made of people. You're working with people. It's like Soylent Green.

So the thing is, we're not worried about people doing something malicious. I'm not saying we shouldn't be worried about that, but more often than not, when someone screws something up, it's not to be an evil malicious hacker who's gonna steal half a cent off every transaction. It's 'cause they didn't know what they were doing. It came out of ignorance or out of a mistake, because we're humans.

So think back to this idea of communicating through code. If you have people who care about things, they should be part of the code review process. This is the part when I'm explaining infra code to a room full of ops folks and they're like, "Oh, this is scary, man. I'm not a developer. I don't know any of this developing stuff." I'm like, you absolutely do. We just call it different things. We don't call it coding, we call it scripting. We don't call it code review, we call it change control, or we call it peer review. We don't call it testing, we call it debugging. But they're all the same skills.

In the older way, and one might even call it the ITIL way (again, that would be a whole other two-hour talk that I've already given), you'd have change management. How many people currently work, or have worked, in an organization that has formalized change management? I'm sorry. It's gonna be better eventually. But let's think about that, right?
Here's the way you can replace that. This is the concern a lot of times; people are like, "Oh, well, someone's just gonna go and change the Chef code, and then there was no change control ticket. What the hell?" We're like, but it's code. It doesn't get deployed unless it goes through change review and a code review. And then who should be looking at that? The people who actually care and will understand what will happen. So we're gonna talk about how we can do that.

But let's think back a little bit. This is me tying it all back into the title of the talk, because that's important. Think about the example I gave before, where we've got our automation team. Even if it's not an automation team, a lot of times people will say, "Okay, well, the sysadmins, they're gonna be in charge of Puppet, because that's a sysadmin tool, right? They'll do all the Puppetizing of the stuff." Well, if you have one team that does everything, your s'mores kind of look like this, because all that team knows is the graham cracker part.

What happens in reality, again, when you have an infra team, a server team, that writes all of that, is they are gonna config-manage the hell out of all this low-level system stuff. That's great and everything. And then it's gonna be like, "Okay, now dump your code in this folder and go do your thing, because I don't know how to do your stuff," right?
Then this is what happens if the teams don't collaborate. This happens more often than you might realize, or maybe you're not surprised. You have the app teams and the system teams and the middleware teams, and they all go and build their own pipelines and use their own tools. You're like, great: the system team is gonna use Puppet and Jenkins, the app team is writing their stuff in Chef and going through Chef Delivery, and the middleware team's doing this other thing. Well, this is what your s'mores look like now. There are beautiful marshmallows that are delightful, and chocolate and everything, but they aren't integrated in any way, shape, or form.

And then one of the possibly worst things you can do is have a group that knows nothing about you, because they're completely separate and don't actually have any domain knowledge of your systems. Maybe, I don't know, an outside consulting group that doesn't get to know you. And then your s'mores look like this. This is also why you don't hire somebody to come and write your automation for you. That being said, you can certainly hire somebody to come in and help you with it. But the last thing you want to do is pay an outside contractor to write you a bunch of Puppet modules or Chef cookbooks and then leave, unless you'd like to keep paying those people. By the way, if that's anybody's job in the room, I'm sorry for screwing up your whole, you know, snake oil thing.

So, yeah, how do we solve this? We kind of understand the problem. What are some of the things we can do to make life better?
This is where I might get a little more Chef-specific, but there are similar tools, and Dan and I were actually just talking during the break, so I'll talk about how you might do this with Puppet.

One of the big things, and this is just one of the many testing approaches you could have, is using a tool called Foodcritic. Foodcritic is like a linter. I always explain to people that it's more like a style guide for Chef. The community has kind of come together and said, okay, this is what good Chef code looks like.

The first thing is it's gonna help you with things that would make your cookbook fail, period. Foodcritic rule 10 is just saying, if your search syntax is bad, it's gonna say, "Yo, this is bad, get out." Then there's also style and convention that's been adopted by the community. Foodcritic rule 4 is saying, okay, here's a better way to do it. Without getting too deep into the Chef-isms of it, in Chef I can say, you know, run "service nginx start." Foodcritic's gonna look at that and be like, did you know... It's almost like Clippy: "It looks like you're trying to start a service. Did you know there's a better way to do that?"

Because it runs against the static code (and I know the core developers in the room are mad at me for calling it static code, but I mean just the files), it's very fast. Running Foodcritic against your whole cookbook will take like a second and a half. And puppet-lint is very similar; same idea behind it.

Now the thing that's really powerful about this. You're like, okay, that's cool, that's at least making sure I'm not writing total crap. But how do I make sure that people don't do things that I don't want them to do in general? Forget about what the community thinks. What about what I think?
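As a hedged illustration of the kind of thing rule 4 (FC004, "Use a service resource to start and stop services") is about, a recipe might contain either of the two forms below. The resource names are made up for the example and are not from the talk's slides:

```ruby
# Flagged style: shelling out to start a service by hand.
execute 'start-nginx' do
  command 'service nginx start'
end

# Preferred style: declare the desired state with Chef's built-in service resource.
service 'nginx' do
  action :start
end
```

The second form states the outcome (the service is running) and leaves the platform-specific mechanics to Chef, which is why the community convention prefers it.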
So you can also write custom Foodcritic rules, and this is an example of one. For example, let's say it's really important in our organization that no one is allowed to mount disk volumes. We hate that for some reason. By writing this rule, which is relatively simple, if I run Foodcritic against any Chef code that would result in a disk volume being mounted, it will fail.

The reason this is cool is that it happens early in the process, and it's fast. As developer-type people, we should all know that the closer to the injection of a defect we detect it, the cheaper and easier it is to fix. If I discover five seconds after I write some crap that it's crap, it's way easier for me to fix than to discover it six months later and have to go back and be like, "What the hell was I thinking when I wrote that?"

The other difference is that without something like this, it could get through. It would go out, and then six months later someone's looking and going, "Holy crap, why has my front-end server got this external disk volume mounted?" I have to go back and figure out who did it, and the person who did it can say, "I didn't know I wasn't supposed to do that." And then you go, "But look, here's the giant tome of things you shall not do. What do you mean you didn't read this whole thing?" So we're making it interactive, is the point. And that's kind of what it would look like. Again, this is much better to see as an output than, you know, some nastygram from the security group six months later.

This is the real key idea: use a pipeline. I'm going to talk a little bit about this. Your infrastructure doesn't live in a vacuum. Things depend on each other constantly, right?
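Returning to the custom rule described above: the slide itself is not in the transcript. A minimal sketch of what a "no mounted volumes" rule could look like in Foodcritic's rule DSL, assuming a hypothetical rule ID (ORG001) and a hypothetical rules file name:

```ruby
# rules/no_mounts.rb (hypothetical file name and rule ID, not from the talk)
rule 'ORG001', 'Recipes must not mount disk volumes' do
  tags %w[organization mounts]
  recipe do |ast|
    # Report every use of the built-in mount resource found in the recipe.
    find_resources(ast, type: 'mount').map { |resource| match(resource) }
  end
end
```

Foodcritic loads extra rule files through its include option, so a run might look like `foodcritic -I rules/no_mounts.rb cookbooks/my_cookbook` (the cookbook path is illustrative). This sketch only catches the mount resource itself; a volume mounted through a shell command would need its own check.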
Your front ends depend on your back ends, which depend upon where your code is, which depends upon where your load balancers are. What that means is it doesn't really do me enough good to just test and say, well, I wrote some infra code that configures a web server, sure looks good, fire it out there. I want to make sure that it doesn't go ahead and break something else. I want to make sure that I understand where those pieces and parts come in, and that's why the ideas of continuous delivery apply themselves really nicely to infrastructure code.
Why does this matter? Okay. The first thing you want to do is ensure that, whatever system you're using, whether it's Puppet, Chef, whatever the thing is, the only way that your infra code can be deployed is via Git, via source control. That's your interface. You don't get to go and say "knife cookbook upload this" and fire it off, or run an Ansible command, or do whatever. If you want to make a change, you change it by going into source control, and then that kicks off your pipeline, which is going to make sure that it has the appropriate automated gates and the appropriate human gates.
A couple of things to think about in terms of the difference between continuous integration, continuous delivery, and continuous deployment, because they matter even though they're confusing and "continuous" is hard to spell. I think a lot of people are super familiar with continuous integration. We've theoretically been doing it for a super long time. With continuous integration we're saying check in early, check in often. We're saying every time I make a change, commit it, make sure it builds. Shit ain't on fire.
Yo, that's continuous integration right there. And don't check in to master at four o'clock on Friday, you know, if you'd like your friends.
Continuous delivery is kind of the natural progression of that. We're saying, great, continuous integration tested to make sure the build didn't break. But guess what? We're doing more than building our software. We actually have to deploy it. We have to release it. So think a little bit about what businesses in general want, and what scares the bejesus out of sysadmins and people who carry pagers. Think about this: if you're someone for whom every software release in your organization means you spend the weekend in the data center, and then they come to you and say, "We'd like to do this more frequently," wow, would that suck. So one of the key principles of continuous delivery is that releases should be boring. They are dull, dull, dull, dull. No excitement. In fact, I'm not even going to buy you pizza. It's that boring.
The way we do that is that we're continually practicing. And in order to enable us to do continuous delivery, we have to automate as much as possible, which means the stages that our pipeline goes through have to be exactly the same as production except for scale. If there's any way you can do that manually, hurrah to you, because you found some people who are willing to work for next to nothing but are really smart.
The problem before we do automation is tasks like these. The way I used to release software is that development, or whomever, would provide me with this document called a PMI. It wasn't private mortgage insurance. I learned about that a different way.
It was production migration instructions. It was this giant Word document that said, these are all the things you should do, except for the 20 things we forgot to tell you that you need to do because we did those a while ago. Waterfall. And the problem is, I'm going to sit there and do those instructions, and they worked in QA and that's fine. Then I'm going to go try to release it into production, and it didn't work because things were slightly different there. So the whole thing is, everything needs to look like production except for scale.
But all of those items I was telling you about that were being done manually, they were super boring and super mundane, yet they were required to be done by someone with a high level of technical skill. To quote Jez Humble, co-author of the book Continuous Delivery, just so you don't think this is my clever statement: asking highly skilled people to do repetitive and mundane tasks is the surest way to introduce error, short of sleep deprivation or inebriation. You will actually have more mistakes. So we need to automate the hell out of this stuff, all the way through, so that it's boring, so it's predictable.
The thing that makes me feel good as an ops person is if I know what's going to happen when we push the button. When we release the software becomes a business decision. The idea should be that the change is ready to go at any time. We know our stuff's good. If it's not, we pull that crap out and it doesn't make it past. Then whenever it's the appropriate business decision (okay, we need to release this feature today, the one that's been ready for two weeks, or whatever), now we can push that button. So let's move towards that. It's what makes us all feel good about it.
So think about this now. We want to use some type of automated compliance as our final test in our pipeline. A lot of times when we talk about testing in a pipeline,
we're probably thinking about two kinds of tests. We're thinking about functional and regression, right? Or, you know, smoke. With regression, we want to make sure that we didn't break something that wasn't broken before, and we want to make sure that the new thing does the thing we expected it to do. That's testing as far as our entire industry is concerned, except for those poor people called security people, who have a whole other way of doing things, and we never invite them to meetings or anything because we hate them. Frankly, the worst job in the world is working in infosec in an enterprise, because you are the Chief No Officer. So we need to make that better, and this is how we're going to help make it better.
So think about this. Audit of some kind is something we're likely running on our production machines anyway. Almost everybody is subject to formal or informal compliance regimes. Some people are subject to formal ones: HIPAA, PCI, SOX, all those things. I'm sorry. But almost everybody has informal regimes. You have standards in your organization that say, hey, we don't have an empty root password on our servers. We disable password logins over SSH. Things like that. So we're likely running these audits.
Now, most of the time when we do compliance auditing... let me talk a little bit about what I call compliance theater. I've been under a lot of different types of audits in my day in different organizations, and compliance in an organization from a technical perspective usually looks like this. Here's what happens. Okay, I'm doing real work. Oh, well, we've got an audit coming up. Everyone's going to pay attention. We're going to fix all the stuff that's broken. Cool, we passed. Okay, now let's go do our work for a little while, and then we slowly have some rot because we're doing our other job. No one's paying attention to anything.
Oh, oh, time for that quarterly audit. Okay, let's pay attention again. So that sucks. And you're investing a huge amount. This part here is a huge amount of effort, and you're doing nothing but that during that time.
So what if we think about making our audit testing, or compliance testing if you will, automated? I'm not talking about static code analysis that's reaching into your variables and your methods to make sure you're not doing something hacky. Think about audit as acceptance testing for your infrastructure as a whole, using something like InSpec or Serverspec or various other testing frameworks. Because think about this: Chef and Puppet are good at doing what you tell them to do. Again, I'm sorry, I know I said there was magic DevOps pixie dust. Chef ain't magic. Computers do what we tell them to do. So Chef will put things in the state I declare. I will say, I do declare that there should be a service named nginx on there. However, what I really care about is: is my damn website running? Is something listening on port 80? So my spec testing is going to test for that, and that's a type of compliance. What's the state when everything has been applied?
Since we're going to be running these tests in production against the real live stuff, we should be running them against our code before we release it. Because something that really sucks is: I sit there, happily chomping away at my infra code. That's cool.
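The "is something listening on port 80?" check above is the kind of thing InSpec and Serverspec express with their own port matchers. As a framework-free illustration, here is a minimal sketch in plain Ruby using only the standard socket library. The helper name listening? and the choice of 127.0.0.1 are assumptions for the example, not part of either tool.

```ruby
require 'socket'

# Returns true if something accepts TCP connections on host:port, false otherwise.
# This checks the outcome we care about (a listener exists), not how it got there.
def listening?(host, port, timeout: 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Connection refused, timed out, or host could not be resolved.
  false
end

# Example: check the local web server port (the result depends on the machine).
status = listening?('127.0.0.1', 80) ? 'listening' : 'not listening'
puts "port 80 on 127.0.0.1: #{status}"
```

Run against a freshly converged test node in the pipeline, a false result here would stop the change before it ever reaches production.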
Let me release that. An hour later, security scan: "Hey, guess what, Matt? You just made an insecure system." And now I've got to figure out how to back it out, and that all sucks. Wouldn't it be great if I could have known that before I released it? So we want to test our infra code against our standards before it's deployed. We don't want to find out about it six months later during a pen test.
Julian Dunn at Chef is fond of saying that security and compliance are just another aspect of quality. So think about the way a lot of folks tend to work. Let's say we're doing an eight-sprint project. Okay, we're going to chug away. We're not really going to give too much of a crap about security or compliance. Then we get to that last sprint, and now we're going to have a hardening sprint where we run all of our security and compliance tests. And guess what? None of them are going to pass, because we haven't thought about it. So now we've got a problem. We can either not release our software, and that's not going to fly, because the salespeople went and sold all this shit already.
So we've got to release it. Or we can just throw it in the backlog and be like, well, we'll get to it when we can, and we all know what happens to non-feature backlog items, all those wonderful tech debt things. And hey, maybe we'll get an exception from security that says it's okay to have this giant gaping hole in your security. Well, guess what? The bad guys on the internet don't really care that you have a note from your mother that says it's okay. Sometimes I say "signed, Epstein's mother," but not everybody remembers Welcome Back, Kotter.
So the problem is we don't want to tack this stuff on at the end. Imagine if I said that for quality. What if I said, let's do a project, and for the first seven sprints we're not going to test crap, and then we're going to do all our QA testing in the last sprint? If that's the case, you probably worked at Apartments.com. But if not... so again, treat it like quality. Security and compliance are first-class citizens.
This is the only slide where you'll see Etsy and Ronald Reagan on the same screen at the exact same time. I'm proud of that. Ronald Reagan is famous for saying "trust, but verify," and they have a similar approach at Etsy. Etsy has a very high-trust culture. Almost everybody at Etsy releases code into production on their first day. Not even just developers, like everybody. I don't know, the person who waters the plants, I guess, releases code into production. But having a high-trust culture doesn't mean you don't test things, that you don't verify. I don't even trust myself to test my own shit half the time, right? You've got to have it in there.
Think about backups. Backups are no good if you have to actually run them, right? This is why things like Time Machine with a Time Capsule are rad: my stuff gets backed up because I don't think about it. You know how often I back stuff up when it requires me to plug in a drive and run a command? Never, right?
How often do I test things when I have to remember to run the tests? Half the time. But if the only way I can make a change to my infrastructure is by putting a change into Git, and then it gets picked up by a pipeline and the tests run, then, well, 90% of the time it works 100% of the time. It'll get tested 100% of the time.
We also want to think about this idea of separation of concerns. I know earlier I kind of made some hay out of making fun of sysadmins who think they're smarter because they've got root, but this is definitely a thing. As I say: a.k.a. "my tests are failing, so I'll change the tests," a.k.a. how I write code. What this means, though, is that the person in charge of your compliance code should not be the same as the person writing the infra code. That's how development works, right? So if I'm using an automated compliance-as-code type approach, I'm not going to simply say, okay, in my infra code or in my pipeline I have a place where I define the tests as well, because then they can be changed. I want to make sure that, whatever solution I'm using, I can separate those things off. I can say, no, you know what, the person who wrote the new infra code can't go in and just turn off the tests that they didn't like. So that's a really important piece of this puzzle.
So we'll review a couple of things, then we'll take some questions, and then, you know, I guess everybody can get some drinks. Things to bear in mind: trust but verify your domain experts. Trust that people want to do the right thing. Trust that people can do good work. Trust that people will make mistakes, so let's put some guardrails around them. Share the cooking; don't try to do it all yourself. Nobody in this room works for Netflix. I have friends who work at Netflix. They don't know everything either, by the way. So don't, you know, think anything is that.
Leverage some type of style or lint analysis tool like Foodcritic, something that's helping you enforce your standards. Use whatever your production audit processes are in your pipeline. If you care enough about it to be running it in production, you sure as hell should be running it in test. And that just goes back to testing and monitoring in general. All that monitoring is, is testing with a time dimension. So if you care enough to monitor something in production, you damn well better have tests for it, because otherwise you're going to have a bad day. And by the way, did I mention that testing is important?
So, questions. What questions can I answer for you?
"Pretend you know nothing." Easily done. Okay. So the question, for those of you who are watching the recording, was: what's a good starter resource for Chef? Hey, look, it's on this slide. I'll show you all these. So these are some of the resources. learn.chef.io has some good tutorials to walk you through the first few. You don't even have to install anything. You can do it through some AWS stuff, whatever the magic is behind the scenes. For full disclosure and fairness of sharing, I don't remember: what's the URL for Puppet? What's your learning site? learn.puppet something? Okay, well, I tried. But I do know one thing I remember from when I was first learning Puppet, because I learned Puppet before Chef. One thing I thought was really rad that Puppet at least used to have was that you could download a learning VM that had all the stuff, because we all try to make it so you don't have to install shit to give it a try.
Any other questions? Comments? Problems of your life?
So the question was: where's the distinction between what goes in your monitoring, what goes in Nagios, and what goes in Serverspec, right?
So if you have long-lived infrastructure, if your pipeline is very long-lived, then you could possibly just say, okay, I'm going to throw Nagios agents on dev and QA and watch for that. The problem I have with that is: how do you tie that into whether it's able to be released? That's sort of the thing, because again, it's with a time dimension. So I guess here's the difference. Monitoring is testing with a time dimension. It's testing over time. It's continually going. Testing is: run a test, give me the results of a thing that happened now.
As much as possible, and this goes back to the example I gave with compliance, but you could think about it for overall acceptance too, try to use the same kind of stuff. The more disparate your testing is from your monitoring, the more you start to have this drift, where you're like, well, I defined it over here. So have a common language. And again, not trying to be the product-pushy person, but this is the reason we created Chef Compliance and InSpec: to say we use the same language to define compliance in the real world as for testing compliance and function beforehand, and it's all in one place. The more disparate your tooling, the more likely the tools are going to disagree with each other.
I can hear you just fine, but that doesn't mean... Yeah. I mean, the first thing I'll tell you is the vast majority... so before I was "designing customers," I was working with prospective customers as a solution architect, and almost every one of my customers was legacy brownfield enterprise.
I mean, just to put things in perspective, I'm talking about companies like General Motors, Ford. That is as legacy as legacy gets. They ain't moving shit to the cloud anytime soon. So as for the practices to put into play, first of all, what I usually recommend is: don't take a thing that's already working and replace it with Chef and have nothing new come out of it. I've had people do that before. Just to put an example to it, let's say they're using Microsoft System Center Configuration Manager, which gets them halfway there, and they're a big Windows shop. They're doing that, and then they say, great, we love this Chef thing, this is awesome, we're going to Chef everything. So let's take all the stuff we're doing in SCCM and write Chef code for that, and we're going to spend six months doing it. At the end of six months, guess what they have? Nothing changed, except they have a new tool.
So what I do is say: look at the stuff that isn't working and shim that up first. In your existing stuff, you're like, okay, this is stuff we know works according to our current build process, or this stuff never drifts, this doesn't change. There are ways you can go about that. Depending on what your ticketing system is, or something like that, look at how you're spending your time. If there's something that sysadmins are continually having to fix, or things that you always do manually during a release, say: how can we go ahead and define this? Another approach is that every time you have to make a change, you make that change in your infra code, and then you never have to do it again, right?
But if you take this idea of "I'm going to try to draw up a big runbook of everything at once," you're going to have a project that never ends. And that's the thing: it never does end, nor should it.
The other thing I want to mention, which your question made me think of and which I get asked a lot, is: "Isn't there some tool that I could just point at a running system, and it'll create all the Chef code to make a system that looks like that?" And my answer is: how does it know what you care about? Is it going to define the 15,000 files on the file system and all of their ACLs and every running process and every single service and every single user? It's not magic. The hardest part of automation is knowing how to do the thing you want to do.
The final thing I think about with that is you need to be able to take a step back. If I was helping you, I'd sit down and say, okay, we're going to write a cookbook for your Apache servers. How do you build Apache servers today? With a lot of my customers, they hem and haw, and eventually it comes out: "Well, we do a VM clone of the one that's already there," and nobody actually knows how the systems are built. So you have to know that. You have to know the state you want, and that's how you approach it. You start by saying, what's the state I want it to be in? Then you break that down into resources. You say, the state I want it to be in is that it's running this, which means it needs these packages, these files, these templates, et cetera. And then just iterate down it, right? It's hard. Go.
Mm-hmm. I would still pin the hell out of stuff. You always pin in production. The way you would think about it, you're still going to end up deploying with the same mechanism. And I guess I'll just repeat for the recording what's happening in their organization.
They're doing knife upload. They're using knife to manually upload the versions of the cookbooks, and saying it's fine that we put a newer version of the infra code out there on the Chef server, because the production environment is pinned to say "only use this version." You have to go ahead and change it. That's not going to change. That's still really good practice.
What doing it in this manner does is make it so that it can't be done any other way, right? You can't get around it. So you're going to do something like, for example, excuse me, your pipeline is the only way. The only user that has the rights to even upload cookbooks to your Chef server is the Jenkins user, period. So first of all, if you want to go and do it by hand, you can't. Second of all, it's only going to do it by doing a berks upload, which will force it to freeze the version of the cookbook so you don't accidentally overwrite it with different code. You're controlling it so it's always done the same way.
And then when you want to get super slick, the way you do that promotion is this: your environment files are just that, they're files. Guess where files can go? In a version control system. They can be linted.
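Because environment files are plain files, the pinning described above can itself be checked by the pipeline. Here is a minimal sketch in plain Ruby, assuming a Chef-style environment JSON file with a cookbook_versions map. The cookbook names and versions are made up, and unpinned_cookbooks is a hypothetical helper, not part of Chef.

```ruby
require 'json'

# An exact pin looks like "= 1.4.2". Looser constraints such as
# ">= 1.0.0" or "~> 2.1" do not count as pinned.
EXACT_PIN = /\A=\s*\d+\.\d+(\.\d+)?\z/

# Returns the names of cookbooks whose constraint is not an exact version pin.
def unpinned_cookbooks(environment_json)
  environment = JSON.parse(environment_json)
  constraints = environment.fetch('cookbook_versions', {})
  constraints.reject { |_name, constraint| constraint.match?(EXACT_PIN) }.keys
end

# Sample production environment file contents (made up for illustration).
production = <<~ENVIRONMENT
  {
    "name": "production",
    "cookbook_versions": {
      "nginx": "= 2.7.6",
      "myapp": "= 1.4.2",
      "logging": ">= 0.3.0"
    }
  }
ENVIRONMENT

loose = unpinned_cookbooks(production)
if loose.empty?
  puts 'All cookbooks are pinned to exact versions.'
else
  puts "Not pinned to an exact version: #{loose.join(', ')}"
end
```

A check like this can run as an automated gate on every change to the environment file, with the human gate reserved for approving the version bump itself.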
They can be tested. So if I have a production.json that defines my production environment, it's maybe got some attributes in it or whatnot, but it also has, for this cookbook, version equals (the hard equality operator) this version. If I want to promote the new one, I have to do it through version control, and that can have the human gate for the right person, who can say, "Yes, it changed." I've had customers get all kinds of complex with that, where it ties into their ServiceNow and their change control and their ITIL and all this other stuff. So I think you have the right idea. Take what you're doing with knife and say, wouldn't it be cool if Chef Delivery or Jenkins did that instead, as a result of the successful tests?
Yes, on Learn Chef. In fact, we just released a new revamped tutorial, because setting up the infrastructure for Chef Delivery is a lot of work just to see if you even want to play with it. So it's all done with a CloudFormation template on AWS. As long as you have an AWS account, you go there, you point it at a CFT, and it spins it up. It'll take you half an hour to go through the tutorial. It'll cost you a couple of bucks, but then you'll see how it all works. And that's precisely what the tutorial is about: moving infra code, not just app code, but you could use app code as well.
And likewise, feel free to grab... I think I've got a card somewhere, if you want to chat about it. I'm happy to nerd out about it. And these slides are up on the event page thingy, so you can, I guess, leave comments for them and stuff. This is where you can find me. And don't forget, there are sprints on Friday.
I won't be there, but I'm sure they'll be fun anyway, so you should still go. And if you want to find my slides or leave comments, I made a bit.ly link, slash Drupal chef, that'll take you directly to the event page for this talk. You can talk about how awesome it was or how terrible it was, or that, you know, my fly was open the whole time and I didn't realize it and that's what y'all were really laughing at. Either way, thank you very much, everybody. This was a lot of fun.