Okay, so my name is James Gallick, and I wanted to start out today by honoring my personal favorite Festivus tradition. No, not the feats of strength, although that might be something fun we can do later. I wanted to start out today the same way that every good Festivus starts: with the airing of grievances. Now, I don't have any sound, but I had this video of Frank Costanza explaining how the airing of grievances works. At the Festivus dinner, you gather your family around and tell them all the ways they have disappointed you over the past year. But instead of talking about all the ways that my family has disappointed me over the last year, I'm going to be talking about all the ways that sysadmin work has disappointed me forever. So let's get started.

There's a good reason most programmers hate doing sysadmin work: it's fucking boring. As a programmer, the idea that this horribly manual process of somebody sitting in front of a terminal and typing commands into an SSH session could possibly result in something that's production-worthy on a consistent basis just seems completely insane. As a programmer, sysadmin work offends every one of my sensibilities. It's exactly the kind of process that should be automated by software, but for whatever reason, typically isn't.

And have you ever noticed that every Unix service in the universe has its own fucking configuration language? I mean, there's no standard for that. These are some services that we use in our cluster at work. Every single one of them has its own configuration language. And unlike a real programming language or something like JSON or YAML, there's no tooling support for these configuration languages, right? If I'm writing Ruby, or even if I'm writing JSON or whatever, I've got syntax coloring in my text editor. And typically, there's no way to even verify that your config file will parse other than starting the service.
So trying to work on these configuration files is extraordinarily painful. I'm amazed, personally, that anybody can get Nagios to work on a consistent basis. But even if you've got the configuration files figured out, maybe you're a veteran sysadmin, there is always an esoteric detail waiting around the corner to bite you in the ass. Let me tell you what I mean by that.

I don't know how many of you are familiar with HAProxy, but it's a great TCP and HTTP load balancer, and we use it as the public-facing load balancer in our cluster. The other day, we set up SSL on our site for logins and for a few other reasons. To make a long story short, in order to get HAProxy to load balance HTTPS traffic, I had to do not one but two things that took me a couple of hours of googling to find. The first was that I had to take it out of HTTP mode. This is because HAProxy doesn't actually handle the encryption, and since HTTP mode is protocol-aware, it gets confused by encrypted requests. The second, more obscure thing was that since HAProxy does health checks on all the back ends to make sure that it's not load balancing against dead servers, I had to set this option, SSL health check or something, because otherwise HAProxy would think that all of my back-end servers were down. This is easy to explain quickly, but when you're actually trying to configure this stuff, you know, it's not so easy. I think my friend Howard summed it up pretty nicely when he said: there must be a way, except it's obscure and non-obvious and you'll forget next time. That's the Unix way.

But the good news is things are getting better. It's possible to use configuration management systems like Chef to alleviate some of the pain that has traditionally been associated with sysadmin work.
So Chef is a configuration management system, and the briefest definition of that I could think of is that it's a place where you store information about how to configure your systems. But I think it'll become clearer pretty soon. I realize that at this point there are about a million configuration management solutions, particularly in the Ruby world. They seem to be popping up all the time, which I find somewhat reminiscent of the test framework boom of 2009. But I'm a big fan of Chef. It's got an awesome community. The configuration DSL is pure Ruby. So how many of you know Ruby? Right, pretty much everyone, right? So you basically already know Chef. You just kind of have to read the API docs and figure out how the system works. So that's a huge benefit. Some of the other configuration management systems, which will remain unnamed, force you to learn yet another configuration language that isn't a real programming language. And that just seems kind of backwards to me if part of the goal is to stop having to learn all of these configuration languages.

So let's take a look at how Chef can alleviate some of this sysadmin pain. Let's take a quick walk through Chef. The central unit of work in Chef is the recipe. Each one of these blocks here, these declarations, is declaring a resource, in Chef terms. I should say that this is, in case it's not obvious, a recipe for installing MySQL server, probably on a Debian-based system, just because of the package naming. So I'm going to go through each of these resources and explain a little bit about how they work.

The first resource says package mysql-server, action install. Pretty straightforward: install the mysql-server package. But this is a cross-platform representation of package management. So if you're on a Debian-based system, it's going to install it with apt-get. If you are on a Red Hat-based system, it's going to install it with yum.
If you're on Solaris, I assume it would probably install it with whatever the hell Solaris uses for package management. So right away we can see that, especially if you have a heterogeneous setup, some of the stupid details that you have to keep in your mind when you're doing sysadmin work are starting to fade away behind this Ruby DSL.

Okay, so the second resource is declaring a service, in Chef land. And again, this is a cross-platform representation of a service. If you're on a Debian-based system, it's just going to execute the script in /etc/init.d directly. If you're on a Red Hat-based system, it's going to run /sbin/service to execute the script. We tell Chef which commands this script supports, since init scripts all support varying sets of commands. And then we can tell Chef to enable that service, which means make sure that it starts at next boot. Again, this is the kind of thing that I always forget how to do, since I'm not really a sysadmin; it's just something I do a few hours a week. And so it's really convenient not to have to remember how to do that and Google it every time.

The third resource is a template resource, and templates are typically how we manage config files in Chef. We tell Chef about the source of the template in the cookbook repository, and we can set permissions and the owner and group of the config file. And then this last line starts to get into some of the more interesting functionality of Chef. It says: notify the MySQL service to restart if this template changes. In order to understand how this works, you have to understand that Chef resources, at least the ones that are written correctly, and certainly all the ones that are bundled with it, are idempotent. If you're not familiar with that term, it's a mathematical term for a function that, when applied twice, gives the same result as applying it once.
The canonical example of an idempotent function is absolute value. The absolute value of negative 2 is 2, the absolute value of that 2 is still 2, and so on. So what happens when this template resource runs is that Chef evaluates the template in memory and compares the evaluated template to whatever's on disk, in this case at /etc/mysql/my.cnf. If they're the same, it's a no-op, so nothing happens. But if they're different, Chef will back up the existing config file, write the new config file to disk, and call any callbacks that have been registered, in this case restarting MySQL. So again: change a tuning parameter in the MySQL config, and MySQL will get restarted. It's something you don't have to remember to do.

The templates are ERB. I'm assuming probably everybody in here is familiar with ERB. If you've ever done any web development with Ruby, I'm sure you've at least seen it. This is a snippet from our old HAProxy recipe. What you can see here is that we're iterating over an array of app server IP addresses and emitting a line of config for each app server to create this load balancer. So if we were on a cloud like, say, EC2, where we have API-based provisioning, it's not hard to imagine a system where we could run a command that would provision a new instance, run the Chef recipes on that instance, and then run the Chef recipes on the rest of the cluster. When they ran on the rest of the cluster, this array of app server IP addresses would be one element larger, which would emit one additional line of config, which would mean that the template would get written to disk and HAProxy would be gracefully reloaded, if you have your callback set up properly. And so all of a sudden this process of adding capacity to a cluster is a one-command affair instead of this horrible manual process involving a lot of human intervention and praying that your config files will parse correctly.
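To make the last two ideas concrete, here is a minimal sketch in plain Ruby of the flow described above: render an ERB template from a list of app servers, compare the result with what is on disk, and only back up, write, and fire a callback when something changed. This is not Chef's implementation. The names `HAPROXY_TEMPLATE`, `render`, and `converge_file`, the `10.0.0.x` addresses, and the contents of the template are all assumptions made up for illustration.

```ruby
require "erb"
require "fileutils"
require "tmpdir"

# Hypothetical template: one "server" line per app server IP.
HAPROXY_TEMPLATE = <<~TEMPLATE
  listen app
    bind 0.0.0.0:80
  <% app_servers.each_with_index do |ip, i| -%>
    server app<%= i %> <%= ip %>:80 check
  <% end -%>
TEMPLATE

# Evaluate the template in memory with the given list of app servers.
def render(template, app_servers)
  ERB.new(template, trim_mode: "-").result_with_hash(app_servers: app_servers)
end

# Idempotent write: do nothing if the file already has this content.
# Otherwise back up the old file, write the new one, and run the callback.
def converge_file(path, new_content, &on_change)
  return :unchanged if File.exist?(path) && File.read(path) == new_content

  FileUtils.cp(path, "#{path}.bak") if File.exist?(path)
  File.write(path, new_content)
  on_change.call if on_change
  :updated
end

# Demo in a throwaway directory so nothing real is touched.
Dir.mktmpdir do |dir|
  path = File.join(dir, "haproxy.cfg")
  reloads = 0
  reload = -> { reloads += 1 } # stand-in for "gracefully reload HAProxy"

  servers = ["10.0.0.70", "10.0.0.71"]
  puts converge_file(path, render(HAPROXY_TEMPLATE, servers), &reload)
  puts converge_file(path, render(HAPROXY_TEMPLATE, servers), &reload)

  servers << "10.0.0.72" # a newly provisioned instance
  puts converge_file(path, render(HAPROXY_TEMPLATE, servers), &reload)
  puts "reloads: #{reloads}"
end
```

Running the same inputs twice leaves the file alone the second time, and adding one address to the array produces one more `server` line, one rewrite, and one reload. That is the whole trick behind the one-command capacity increase.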
So the data that recipes consume are called attributes. And it's my opinion that attributes are the unsung hero of Chef. This line here, the list of recipes, basically just says install these recipes on the node. That's the only line in this whole JSON file that is actually specific to Chef. All of the rest of this is completely arbitrary data that you can write your recipes to consume. What this means is that if you write your recipes correctly, or what I would call correctly, you can implement generic instructions on how to configure a piece of software or a service. And then any specificity that pertains to your particular installation, for example MySQL tuning parameters, just goes in the attributes. So somebody trying to install or tune MySQL server, if your recipes are sufficiently complete, might never have to open a MySQL configuration file, and might not even have to be particularly aware of the syntax for it, depending on the service. You start to have these recipes which are an entire abstraction of configuring a service, and opening config files becomes a thing of the past. It's just a matter of, in our case, writing a Ruby hash and running our Chef recipes. But we'll see a little bit more about how that works in a few minutes.

All this stuff lives in the cookbook repository. This is just a basic cookbook repository layout. The interesting things here are that, under the recipes directory, you've got default.rb and server.rb. If you refer to mysql unqualified, in a recipes list or a recipe dependency, it'll run default.rb. If you refer to mysql::server, it'll run recipes/server.rb. So this is a really convenient way to organize your cookbooks.
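As a rough illustration of those two points, here is a small plain-Ruby sketch that splits a node's JSON into the recipe list and the free-form attributes, and maps recipe names to files using the rule just described (unqualified name means default.rb, `cookbook::name` means recipes/name.rb). The JSON contents, the `cookbooks/` directory prefix, and the `recipe_path` helper are hypothetical; this is not how Chef itself loads recipes.

```ruby
require "json"

# Hypothetical node file: only "recipes" is meaningful to Chef,
# everything else is arbitrary data for recipes to consume.
NODE_JSON = <<~NODE
  {
    "recipes": ["mysql::server", "haproxy"],
    "mysql": { "tunable": { "max_connections": 500 } }
  }
NODE

# "mysql"         -> cookbooks/mysql/recipes/default.rb
# "mysql::server" -> cookbooks/mysql/recipes/server.rb
def recipe_path(name)
  cookbook, recipe = name.split("::", 2)
  File.join("cookbooks", cookbook, "recipes", "#{recipe || 'default'}.rb")
end

node = JSON.parse(NODE_JSON)
run_list = node.fetch("recipes")
attributes = node.reject { |key, _value| key == "recipes" }

run_list.each { |name| puts "#{name} -> #{recipe_path(name)}" }
puts "attributes available to recipes: #{attributes.keys.join(', ')}"
```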
For example, with something like Nagios, where you've got a host machine and then a bunch of nodes that it does checks on and that you need to install services on, you can keep everything in the same cookbook, which is convenient. Under the templates directory, you can see there's a default directory, which is where, in this cookbook, the my.cnf.erb file resides. It's possible to specify alternate directories that are named after a particular platform. So, say you had a directory under here called ubuntu-8.04: if you're on Hardy, any templates in the ubuntu-8.04 directory will take higher precedence than the ones in the default directory. This is really convenient. A lot of people run FreeBSD as a firewall, for instance, and so if you've got multiple platforms in your cluster, this is just a really convenient way to organize things.

So, okay, let's look at some examples. HAProxy, I alluded to that earlier. Let's look at how we can solve that problem of it being a pain in the ass to remember how the hell to set up an SSL load balancer with HAProxy. The attributes for my HAProxy recipe look like this. Basically we're setting up two listen directives. The first is non-secure, listening on port 80 and balancing to 70 and 71 on port 81. My HAProxy recipe defaults to HTTP mode, so we don't have to specify that. The second listen directive is a little more interesting, because it says SSL equals true. It's on port 443 and balancing to the same two machines on port 443, but that's not the interesting part. The interesting part is this SSL equals true, because if you remember what I was talking about before, that's not a syntax in the HAProxy configuration file, right?
This is something that I coded into my recipe so that next time I have to do this, I won't have to spend like three hours googling around for what the hell's wrong with my setup when I'm getting all these cryptic errors out of HAProxy. The implementation of this is really simple. This is a template. I'm iterating over those listen directives, and all this is kind of esoteric, but the interesting part is this: I just say, if the options have SSL set, then set it to TCP mode and set this ssl-hello-chk option. It's really simple, but this is going to save me a lot of time one day. And since recipes are shareable, if they're sufficiently generic, I'm hopeful that in the future you'll be able to just grab a great open source recipe that does a lot of this stuff for you, and for 80% of cases you won't even have to worry about all these esoteric details associated with configuring Unix services.

So my second example is heartbeat. If you're not familiar with heartbeat, it's a service for managing virtual IP addresses in a cluster. It's usually used to create high-availability configurations. In our case, we have three public load balancers, but only one of them is actually active at any given time. They're all in a heartbeat, and so if the active one goes down, one of the other ones will grab the IP address and there'll be very little downtime. The attributes are really simple. What interface are we heartbeating on? Which nodes are in the heartbeat? That's a mapping of fully qualified domain name to, in this case, private IP address. Set a password for the heartbeat, because you have to. And then the resources, which are the actual IP addresses that are being shared in the cluster. This is a mapping from the fully qualified domain name of the machine that should own that IP address, if it's online, to the virtual IP address. This is simple.
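Going back to the HAProxy example for a moment, here is a rough sketch in plain Ruby of how a made-up `ssl: true` attribute can be expanded by a template into the two real settings mentioned above, `mode tcp` and `option ssl-hello-chk`. The shape of the `LISTENERS` hash, the listener names, the `10.0.0.x` addresses, and the `render_listeners` helper are assumptions for illustration only; the actual recipe and its attribute layout are not shown in the talk.

```ruby
require "erb"

# Hypothetical attributes: two listen directives. `ssl: true` is not
# HAProxy syntax; it is a flag that only the template understands.
LISTENERS = {
  "web"     => { port: 80,  servers: ["10.0.0.70:81", "10.0.0.71:81"] },
  "web_ssl" => { port: 443, servers: ["10.0.0.70:443", "10.0.0.71:443"], ssl: true }
}

LISTEN_TEMPLATE = <<~TEMPLATE
  <% listeners.each do |name, opts| -%>
  listen <%= name %>
    bind 0.0.0.0:<%= opts[:port] %>
  <% if opts[:ssl] -%>
    mode tcp
    option ssl-hello-chk
  <% else -%>
    mode http
  <% end -%>
  <% opts[:servers].each_with_index do |addr, i| -%>
    server <%= name %>_<%= i %> <%= addr %> check
  <% end -%>

  <% end -%>
TEMPLATE

# Render every listen directive; HTTP mode is the default, and the
# ssl flag switches that listener to TCP mode with an SSL hello check.
def render_listeners(listeners)
  ERB.new(LISTEN_TEMPLATE, trim_mode: "-").result_with_hash(listeners: listeners)
end

puts render_listeners(LISTENERS)
```

The design point is that the obscure knowledge lives in one `if` inside the template, so the person writing attributes only has to remember one flag.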
The heartbeat configuration is actually not that complicated, but it's got at least three configuration files, depending on how you set it up, and I can never remember which of the directives go in which of the configuration files. So I wrote the Chef recipe, and now it's just a matter of writing this really simple Ruby data structure and the heartbeat gets set up for me automatically. We use this all over our cluster, and it's super convenient. Anytime we want to set up a heartbeat, I just write out this little bit of Ruby and it's done. I don't have to think about any of the configuration crap.

So my last example is really simple, but I think it's almost the most powerful of them, and obviously it has to do with security. This recipe has no attributes. Just run the recipe and it will do its thing. And the recipe is almost equally simple. It basically just sets dependencies on two other recipes: the iptables recipe and the sshd recipe. The iptables recipe by default just locks the machine down. Of course you can override that with attributes, but by default the machine is locked down completely. All ports are closed. And all the sshd recipe does is turn off password authentication. This is really powerful, because I would call these two things the basics of reasonable security on a Linux machine, a Linux machine that has a public IP address anyway. And a lot of people forget to do this. I've forgotten to do it lots of times. I've been in lots of companies that had popular websites where this wasn't done. And I don't know if you realize this, but most operating systems don't ship with any method for detecting a brute-force attack on sshd. So if one of your users has a password set and sshd is allowing password logins, you're going to get brute-forced. I mean, it's just a matter of time, because you have no way of detecting it and you'll never know.
So we set this recipe to run by default on all new machines that we provision, and it's just set and forget. We never have to think about it. It's really simple, but it's just one of those things that's so easy to forget to do, and we don't have to remember anymore.

If you want to learn more about Chef, the wiki is pretty good documentation. I mean, it's Ruby, so you could... Okay, so where was I? Yeah, so the wiki is pretty good. It's not perfect, but it's definitely a pretty good source of documentation. If you already know Ruby, it's basically just reading API docs, so I'm sure everyone is pretty comfortable with that. The IRC channel is also super awesome. The Opscode guys who write and maintain Chef are amazing and really helpful, and they're really eager to help people get started, get through problems, file bug reports, and stuff like that. So definitely check out the IRC channel if you're checking out Chef.

Any questions?

Yeah, I mean, you can certainly write a recipe to build something from source. We do that. We just write a pretty simple recipe to compile it and install it somewhere. Either that, or we compile it in advance and then just untar it in /opt or something. That's typically the easiest way to do it, because then you're not dependent on being able to get the file from somewhere and waiting on it to compile when you bring a new machine online.

How many machines is vast? Thousands? Hundreds of thousands, okay. So we don't have hundreds of thousands of machines; we don't even have hundreds of machines. The biggest cluster we've had so far, when we were on virtualized hardware, was like 35 machines, I think.
The way that Chef Server and Chef Client work is that the recipes are distributed when they're run, and they're run on a cron job all at the same time, so I would suspect that if you had hundreds to thousands of machines that might be a problem, but I don't know. Oh yeah, okay. So yeah, it just works.

Any other questions?

Yeah, to be clear, I don't use Chef Server either. We actually just use Capistrano right now to distribute our recipes, which is a little ghetto, but I've been working on a system that uses some AMQP stuff, because I'm not a big fan of polling. But anyway, there are definitely lots of ways to distribute recipes to different machines, for sure.

Alright, thanks.