Welcome to my session. I'm going to talk about Chef and MCollective and how you can use those tools to build your own clouds and control a lot of infrastructure. First, a couple of words about myself. I work in Berlin. I do a lot of infrastructure work, a lot of work around deployment. That's how I got drawn into my project called Scalarium, which is a configuration management tool for controlling clouds on EC2. The talk will be partly lessons learned from building such a tool, and partly lessons we took from working on a big infrastructure project for a client with 1,000 servers. What all of those have in common is cloud computing, controlling clouds. So what is that? In my context it's basically infrastructure as a service, so mostly Amazon EC2, but for a lot of people it also means in-house capacity like VMware, Eucalyptus and things like that. So: servers via API. The problem is, if you use one of those, what you end up with is a wild mix of servers. You can boot a Linux server, you can boot an Ubuntu server, but it's basically just plain infrastructure. There is no coordination, there is no real system. And what you really want is some kind of system that I can deploy my app to. I want a structured, fully configured system where the individual components know where the other components are and what they're doing, and where you have a lot of automatic configuration and so on. So what you really need on top of those plain APIs is automation: automation for configuration, for scaling, for deployment. And this is the thing that I'm going to talk about. Of course there are commercial automation solutions, like RightScale, that do this mainly on Amazon EC2. But what I'm going to talk about is how you could build this yourself. Which I don't recommend unless you have a very good reason to do it yourself. But there can be reasons.
So for example, one of our clients had 1,000 machines in-house that they already used to run a very large real estate application with millions of users. And you cannot just say, oh, let's go completely to Amazon and write off those 1,000 machines that we have 10 full-time admins working on, where we have very clear security guidelines and so on. Another reason, as I said, is governance and security policies: if you're working for the government, if you have very sensitive data, it's very hard to give it away. And mostly flexibility. Either the commercial solutions don't exactly fit your needs, or, mostly, it's the other way around: your processes are so inflexible that no generic solution fits. In this case the enterprise company was convinced they had to build their own, because of course nobody covers their requirements. So what do we need if we want to build a system that automates infrastructure with thousands of machines? What are the basic ingredients? Those are the core components that I want to focus on. You have all your thousands of machines that you need to configure, in individual clusters, individual projects, and you want to somehow control them. Controlling means you want to deploy your software. You want to start a new service. You want to replace failing systems. And mainly there are three components that you need. The first is, of course, the command and control infrastructure that has all the logic of how the instances should be used, what kind of configuration you want to have, and that executes the commands — the place you go to in order to deploy something. Then you have the communication channel. And then you have, on the individual host, the actual configuration: how do I turn a freshly booted instance into my Rails application server, into my Postgres database? So those are the three components you have to implement.
As I said, the first one is probably the one that is most individual to your constraints; it's very dependent on your processes, on your protocols and so on. The main idea is that this component talks to the infrastructure APIs. So in the case of Amazon, it talks to EC2 to start servers, and it handles all the interaction with the user: either via an API, via a command line interface or via a web UI. And it stores the configuration — what do you want to do on the machines? I'm going to show you an implementation based on Fog, which we use to do all the interactions with Amazon, a simple Rails web app and a CouchDB repository. But more on that later. The communication channel is in theory very simple. It should just relay the messages, relay what you want to do, to the individual machines. But the problem is you want to build it in a scalable, redundant way. You have thousands of machines, so you need a scalable solution. Just logging in via SSH doesn't work, because it doesn't scale across multiple machines, and it doesn't scale across data centers. You need some kind of asynchronous, scalable solution. And what I'm going to talk about is how you could do this with MCollective, which is a tool that I think very few have heard about. Who has already heard about MCollective? Okay, okay, a couple. The host configuration is probably the most obvious part. How do you actually transform the machine, and how do you actually execute the local commands? Mostly this will be Chef. I guess pretty much everybody has already heard about Chef, right? Who has actual experience working with Chef? Okay, because I have a part where I do an introduction to Chef, but depending on the time, it could be that I have to make it really short. So those are the three components that you need, and let's start at the bottom. The host configuration: as I said, Chef.
Chef is a Ruby DSL to describe what you want your server to look like, and then Chef does the magic of turning it into the machine that you desire. And the great thing is that it's implemented in Ruby, and the configuration is also Ruby — it's a Ruby DSL. It's very easy to extend, very easy to look at, especially if you're a Ruby developer. But we've also seen Java, Perl and PHP guys using it to build their machines, because it's very, very high level and very easy to read. The basic idea is: I have a blank Linux machine, and then I have my Chef executable that takes cookbooks. Cookbooks are, in Chef, the general definition of how a server could look. How do you achieve installing Apache? What are the necessary steps — the recipes, in Chef terminology? What do we have to do in order to achieve this? So your cookbooks are your complete repository of possible configurations. And then you have the configuration file that just tells it: okay, for this specific machine, I pick this script and this script and this script, and I have those four configuration values. And then Chef goes and changes your machine to look exactly like you specified, in an operating-system-independent way, which is the great thing about it. Because you can have Linux machines, you can have OpenSolaris machines, you can have BSD machines — pretty much every flavor, and Chef will do the right thing. It will choose the correct package manager, it will know how to read and create files depending on the operating system. And at the end you hopefully have a machine that looks exactly like you specified: it has Rails installed with your specific version of Ruby, your gems and so on. If you look at Chef, there are two ways you can run Chef in a big scenario. The first one: if you open up the Chef documentation on the website and read about it, you read Chef Server everywhere. And Chef Server is probably the way Chef is intended to run.
So the idea is it's a client-server architecture. I have the Chef Server on the one hand, which stores all my cookbooks, all my global configuration, and I just have to upload things and change them there. And then I have maybe hundreds of nodes that periodically connect to the server, download the configuration and apply it if it changed. The good thing is you don't have to maintain the configuration on individual machines, but it has a couple of drawbacks that I will talk about in a minute. The other way you can run Chef focuses on the single-server view. It's called Chef Solo, which basically means you push the configuration down to the machine and you just keep the part that does the actual bootstrapping, the actual logic of transforming a machine into what you want it to look like. Those are the two possible ways you can run it, and I think you should focus on the latter if you're running an architecture like this, for a couple of reasons. For once, Chef Server is very complex compared to Chef Solo. Everybody who starts to look into Chef has to know about RabbitMQ, has to know about CouchDB; it used to be that you had to use STOMP, and so on. So it's very complex. At the beginning they even used OpenID for authentication. So it's a very complex piece of software, and what you want to achieve is pretty simple: just push configuration. The problem is that for all this complexity, you don't gain a lot. There is no concept of different environments or stages. The new version of Chef is at least going to get environments, so you can say in production I choose this configuration value and in staging another one. But it's very hard to scale this approach if you have 1,000 machines, 10 projects, and each project has staging, production, test and QA. Because you have only one global repository of cookbooks, everybody is going to step on each other's toes.
The other problem is that there are no life cycle events. There's only one global Chef run, and you cannot really force it; you can only wait until the client connects and applies the configuration. And it doesn't work really nicely with deployment, for example. If I want to deploy a new version, I want to do it now. I don't want to wait half an hour until every client connects and applies the configuration, and, while doing so, also checks if Apache is installed — I just wanted to deploy the new version of my code. So Chef, or Chef Server at least, is more aimed at pushing out configuration changes. But if you want to handle the complete life cycle of a server, including deployment, it's probably not the right choice, at least for now. So what we ended up choosing twice — first for Scalarium and then for the big client project — is an approach where we use Chef Solo and push the configuration via, well, either Nanite or, in this case, MCollective, which I'm going to talk about shortly. The idea is that I use my communication channel to push the correct configuration and cookbooks down to the machines and just let Chef execute them and do what it does best: apply transformations in order to make my machine look like it should. Because of the time I'm not going to talk too much about Chef; I'm just going to give you a brief intro for those who don't know roughly how Chef works. The main idea is that you have cookbooks for the different things you want to achieve. In this case, I have a memcached cookbook that I use to control memcached: to install it, start it, stop it and whatever. And the general workflow is: I have attribute files, a folder where I put attribute files — roughly, variables that I can extract, or sane defaults. Then I have the actual recipes.
The recipes are the part that actually has the logic of how to install memcached. And I have templates, which I use to create files on the system. All Chef cookbooks have this basic structure. There could be a couple of other things, like files and metadata and so on, but this is the core of Chef: a cookbook consists of attributes, recipes and templates. An attribute file just declares defaults. In this case, we declare some defaults like what the user for memcached should be, the port where I want it to listen, and how much memory I want it to have. This way you extract variables so you don't have to use the same value everywhere in your recipes. And by doing so, you can also nicely override them. Attributes have a way of being overridden, which I'm not going to talk about too much, but you can override them, which is great for having defaults and then letting the user decide on different values. If we look at a recipe — this is the actual gist of Chef. In this case, we're defining a service. We're telling Chef: there is a service on your system, or maybe there should be, an entity called memcached that I can control, and I define how to start it, how to stop it, and what we want to do with it. In this case, I don't want to do anything with it yet; I'm just defining it so I can reuse it later. So this is the service recipe. I have multiple recipes: this is the service one, and then we have the install one, which includes the service one, so I don't have to declare the service everywhere. And then I'm actually installing memcached. I'm calling out to the package resource in Chef — a Ruby method that installs a package for me. So I'm installing two packages, and then I'm creating a template: I create a file on the file system dynamically with the input that I have.
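Put together, the cookbook just described might look roughly like this. This is a sketch, not the actual slide code; the cookbook name, package names and paths are assumptions, and the syntax follows current Chef conventions:

```ruby
# attributes/default.rb -- sane defaults, overridable from the node's JSON
default[:memcached][:user]   = "nobody"
default[:memcached][:port]   = 11211
default[:memcached][:memory] = 64        # MB

# recipes/service.rb -- define the service entity, but don't act on it yet
service "memcached" do
  supports :restart => true, :status => true
  action :nothing
end

# recipes/install.rb
include_recipe "memcached::service"      # reuse the service definition

package "memcached"                      # Chef picks the right package manager
package "libmemcached-dev"

template "/etc/memcached.conf" do
  source "memcached.conf.erb"
  owner  node[:memcached][:user]
  mode   "0644"
  notifies :restart, "service[memcached]"   # restart only if the file changed
end
```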
And the template is just an ERB template, which probably most of you are familiar with. The nice addition is the notifies at the bottom, which tells Chef: if this template changed when you rendered it — that is, if the version you created is different from the one on the file system — fire this event. So in this case, we're restarting memcached if the configuration file changes. If I install the system once, and a month later I change, for example, the memory value and rerun the Chef recipe, it will restart memcached just because the config file changed. It's a very nice twist that lets you declare dependencies without restarting services all the time when nothing changed. And the last thing that is missing is the template. Pretty simple: a very simple ERB template where I'm accessing the default values — in this case the memory, the port and the user definition. Very simple. So how do I run it once I have those lying on my file system? I first have to have a configuration file. This is just a plain JSON file; the most important part is the run list, which tells Chef what I want to execute. In this case, I want to execute the memcached install recipe and another recipe that we haven't seen. But the nice thing is I can override the memory value, as you see. So it's very easy to have defaults and then override them in your JSON file. And once I have this JSON file, I can just call out to Chef Solo, which is the part that takes the config file, takes the cookbooks, and changes your system. And this is how we call it via MCollective: we just have Chef Solo installed on the machines, and we push new configuration values down and call Chef in order to execute them. You can do a lot more with Chef than just installing packages and starting services. You can create users and groups, you can create files and directories, delete them, execute arbitrary commands.
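The flow just described — JSON attributes in, rendered config file out — can be sketched in plain Ruby with the standard library's ERB and JSON. This is a simulation of what Chef Solo does internally, not real Chef code, and all names and values are made up:

```ruby
require "erb"
require "json"

# A node.json like the one you would pass to chef-solo,
# overriding the cookbook's memory default
node_json = <<~JSON
  {
    "memcached": { "user": "nobody", "port": 11211, "memory": 512 },
    "run_list": ["recipe[memcached::install]"]
  }
JSON
node = JSON.parse(node_json, symbolize_names: true)

# memcached.conf.erb as it might sit in the cookbook's templates folder
template = <<~ERB
  -u <%= node[:memcached][:user] %>
  -p <%= node[:memcached][:port] %>
  -m <%= node[:memcached][:memory] %>
ERB

# Chef's template resource does essentially this, then compares the result
# against the file on disk to decide whether to fire the notifies event
conf = ERB.new(template).result(binding)
puts conf
```

The rendered output picks up the overridden memory value (512) rather than the cookbook default, which is exactly the defaults-plus-override behavior described above.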
So Chef has a lot to offer, but I hope you got roughly how Chef works. Probably the important question is: how do we do deployment with Chef? There are two ways. One is the Chef deploy resource, where you can specify: please check out the code from here, put it there, and then maybe fire callbacks. That works nicely for source-based deployments, like a typical Rails process where you check out code, touch tmp/restart.txt for Passenger and restart it. And it's compatible with the Capistrano way of deploying — it has the same file system structure, so you can still deploy via Capistrano once you've set it up with it. The other way is to have just an arbitrary script, because everything that you can script, you can do with Chef; Chef just has nicer capabilities to express yourself. You can download a WAR file from somewhere, or do a code checkout and then a configure, make, make install. The only thing is you have to write those steps yourself. But you can do pretty much everything with Chef, and it's nicely wrapped in error handling and notifications and so on. This is an example of how a deploy resource looks; the details are not too important. So this is the first component — how do we actually change the local system? We use Chef for it. The next question is: how do I propagate the messages to the instances? How does my Rails application server know that it has to deploy now? We do this via MCollective. MCollective, if you look at its website, is a framework to build server orchestration or parallel job execution systems. It's designed to propagate messages reliably across potentially thousands of machines and then lets you implement the actual logic of the orchestration. So it's just a means of communication. It works basically by using ActiveMQ as a message broker.
And the idea is that you have agents running on the systems that you want to control. And then you can talk to those either via the Ruby API — there's a Ruby gem to do it — or via a small command line interface where you can do pretty much the same thing. The idea is that you just call agents on remote systems, and those agents have to be implemented by you and can do whatever you want. So how does it work? First you have your client, and you ask the broker: I want to call all agents that match a certain condition — I want to call Chef on all of those. What MCollective does is it first discovers who is online, who is listening to those messages. Because it could be that a couple of machines just shut down or crashed, or you've brought up a couple of new ones. So first there is a small discovery phase where it finds agents that respond to your message. Then you can call those actions, and MCollective will return the individual results to you, so you can inspect them and see: oh, the call failed on this instance, but it succeeded on those 10. And by using ActiveMQ, it's very scalable and fault-tolerant, and asynchronous. It scales nicely over thousands of machines, because you're not waiting for the individual responses; you're just publishing messages, and it's very easy to scale this. How does an agent look if I have to implement one? This is about the simplest agent, one that just echoes the data back to you. You validate the input — in this case we have the Hello World agent with an action echo, and I validate that I get a message that is a string, and then I just reply with it. Very simple. And if I want to call it, there are two ways. The first one is via the command line, via MCollective RPC: I'm calling the Hello World agent with the action echo.
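Before looking at the output, here is what the agent side might look like — a sketch following MCollective's SimpleRPC conventions; the file name, layout and validation are assumptions:

```ruby
# hello_world.rb, dropped into MCollective's agent directory on each node
module MCollective
  module Agent
    class Hello_world < RPC::Agent
      action "echo" do
        validate :msg, String        # reject anything that isn't a string
        reply[:msg] = request[:msg]  # echo the input straight back
      end
    end
  end
end
```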
I give it a couple of arguments, and you can see at the top that it first determines how many agents are listening, how many are reachable; it has a timeout of two seconds by default. In this case it found one agent listening, then it gathered the responses, and at the end it printed out the actual output. The other way to call it is from Ruby. This is a very simple script showing how you could call it via Ruby: you just require mcollective, say that you want the RPC client for Hello World, and call the echo action on it. Very simple, nothing too fancy yet. What is great about MCollective is that, because it's using ActiveMQ, you can use broker capabilities like fan-out and topic broadcast to find out who is listening to your message and who should respond. In MCollective, those are facts and filters. The idea is that not all my agents are equal. Some of them are application servers, a couple of others are database servers, and then I have load balancers, caching servers and what have you. So I can have user-defined values — those are usually classes — which the agent loads on boot. You could have just a simple configuration file on every machine that tells it: you are a web server, you are a load balancer, you belong to project A, you belong to project B. And then you can filter by those. The other means are facts. Facts are things that the agent can introspect about the local system, like what packages are installed, what services are running and so on. And the great thing is that you can extend them via Ohai, Chef's system profiler, so you can get information that Chef knows — like what packages are installed — and use it as fact filters in MCollective. So how would you do this?
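From the Ruby side, such filtering might look like this — a sketch assuming the mcollective gem and a reachable broker, reusing the hypothetical hello_world agent from before; the class and fact names are illustrative:

```ruby
require "mcollective"
include MCollective::RPC

mc = rpcclient("hello_world")
mc.class_filter "dev_server"      # only hosts that loaded this class
mc.fact_filter "country", "uk"    # and whose country fact is uk

# Discovery now only finds matching hosts; everyone else ignores the call
mc.echo(:msg => "hello dev uk").each do |response|
  puts "#{response[:sender]}: #{response[:data][:msg]}"
end
mc.disconnect
```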
It's very simple, again via the command line. You just say: I want to find all hosts that have a certain fact. In this case I'm not calling an agent, I'm just finding what hosts are available, and I only want those that have the fact country = uk — just an arbitrary fact. If no agent has it, nobody will respond to me. And the other example is testing for a class. I can do the same in code. This is how I would do it in Ruby, how I would filter what kind of agents should respond to me: in this case I'm only interested in dev servers, and only in servers that have the country fact set to uk. This way it's very easy to leverage the capabilities of the broker to find specific hosts: I want to deploy to all my production machines that belong to project A. Those are just filters, and MCollective makes sure that it only calls the correct hosts. In our case, this is an example of the logic in the command application that called Chef on a couple of machines. We just have a client that we instantiate, then we call the actual RPC method on the Chef agent, gather the responses and return them. And of course somebody has to inspect those and see whether everything succeeded or not. The other part is the agent running Chef, or rather calling out to Chef; the most important part there is just the call to Chef Solo. The rest is extracting the JSON configuration file that you're giving it and making sure that you can log it and so on. But the basic idea is: I use MCollective to call out to Chef, and I always give it a JSON config file. And this way I can scale it across hundreds of nodes. So how do I ensure security in such a setup? If you have a lot of machines, if you're that big, security is of course a big concern. How can I make sure that nobody can hack into my box and then control 1,000 machines? There are a couple of different means to do this in MCollective.
The first is at the client level. The client has to connect to ActiveMQ, so you can have an ActiveMQ username and password. Probably more secure is using the AES and RSA plugins, where you can encrypt and sign every message, and then the actual agents can verify those and say: I have the certificate of this user and I can verify that this message is authentic. A simpler version would be just using SSL, which gives you the signing part but not the encryption part. And of course you can use TLS for all the communication with the broker. So that's the client side. On the broker side, you can have ActiveMQ permissions to define what clients can talk to what topics and what exchanges on the broker. So you can have different versions of your client, and one can only talk to production machines while the other one can only talk to staging machines. And at the agent level you have everything again, because the agent is basically just another client. So you can have authorization ACLs, and you can have auditing. You can have a very small plugin that logs every command and what certificate — so basically what user called what command — and logs it somewhere. Those plugins are included by default, so it's very easy to have an auditable system where you can make sure only certain users can call certain commands. How do we scale such a setup? How do we make sure ActiveMQ and MCollective work on thousands of machines? As I said, it's not too difficult to do, because your only problem is scaling ActiveMQ, which is a solved problem — not always very nicely solved, but there are options. You can have a network of brokers, for example one per data center, that talk to each other and relay traffic. You can have clusters, so that the client always tries to talk to multiple brokers, and if one fails, it tries another one. And you can have master-slave setups. So it's not too difficult to have a very scalable and reliable infrastructure with ActiveMQ.
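The network-of-brokers idea might be configured roughly like this in an ActiveMQ broker definition — a sketch only; the broker names, hostnames and ports are made up, and a real setup needs matching configuration on the other brokers:

```xml
<!-- activemq.xml fragment on the data-center broker (hypothetical names) -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="dc1-broker">
  <networkConnectors>
    <!-- relay traffic to the central aggregation broker, in both directions -->
    <networkConnector uri="static:(tcp://aggregation-broker.example.com:61616)"
                      duplex="true"/>
  </networkConnectors>
  <transportConnectors>
    <!-- local MCollective daemons and clients connect here -->
    <transportConnector name="openwire" uri="tcp://0.0.0.0:61616"/>
  </transportConnectors>
</broker>
```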
Of course in practice it's always more complex than that, but at least there is a way to do it. The nice thing is that ActiveMQ is very, very good at handling thousands of messages. So even if you have hundreds of machines, you don't need 20 brokers or anything like that. What we did in this case was have one broker per data center, and then one aggregation broker from which you can fan out to the individual data centers. And then you have to handle the command and control part. This is probably the most individual part, so I don't have too much to show — more a couple of general concepts. Its responsibility: this is the place where I store the configuration, where I store what my clusters should look like, what instances are part of what setup, and where I interact with the actual infrastructure provider. It starts and stops instances, it replaces failed ones, it allocates IP addresses and so on — which is probably the easiest and most boring part, because it's just a couple of very simple API calls. More problematic is handling recovery and presence, which you cannot do so nicely in this setup. MCollective discovers with every call which servers are responding, so it doesn't notice if an instance is failing. If you configured that you want 100 instances in a cluster, and you say "please deploy" and only 99 respond, MCollective doesn't care; because it always does an on-demand discovery, it will not notice an event like an agent disconnecting or something like that. It is tricky to get this right. But the main responsibility, in the end, is to generate those configuration files for Chef, depending on the role, depending on the configuration.
This part is probably similar in all the cloud management software out there, and it's the part where most projects agree. The custom part is at the other end: user management, authorization, the business processes — how you should restart things. This is where it gets complicated in big companies, and this is the main reason why they usually want their own custom solution. They say: before we deploy an application, we always have to get an acknowledgement from somewhere, and so on. It's usually very complicated and very custom. Not very efficient, but that's enterprises. So this is the part where you have this custom logic: who should be allowed to do what, what kind of functionality do we want to offer to which users. And the deployment recipes, of course, are usually very, very custom. If you look at one example configuration — this is a very simple JSON representation of a role, which is the core concept of how you split your clusters. In this case we have a Rails application server role where we define multiple Chef recipes that we want to have. As I said, one drawback of Chef Server is that it doesn't know events; you only have one global Chef run. But in this case we have a setup event, a configure event, a deploy event, an undeploy and a shutdown event, where we can very granularly define what we want to do on certain actions. Those are the things that you configure for your customers, and of course you can add what kind of instances should be running in this configuration. So in our case the architecture looked like this: we had a very simple Rails app that is the endpoint for the user, either via the API or via the web, and that just manipulates the storage I showed you — this model of roles and clusters.
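A role like the one described might be represented roughly like this. This is a hypothetical reconstruction, not the actual slide; all field names, recipe names and values are made up:

```json
{
  "name": "rails_app_server",
  "instance_type": "m1.large",
  "instance_count": 4,
  "events": {
    "setup":     ["recipe[ruby]", "recipe[apache2]", "recipe[passenger]"],
    "configure": ["recipe[myapp::configure]"],
    "deploy":    ["recipe[myapp::deploy]"],
    "undeploy":  ["recipe[myapp::undeploy]"],
    "shutdown":  ["recipe[myapp::shutdown]"]
  }
}
```

The point is that each life cycle event maps to its own run list, instead of one global Chef run.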
In our case we stored it in a database, but it's basically just a means to manipulate the model that you have of your system. And then, again on the backend side, you have a lot of internal agents that go out and call EC2, call the actual agents on the systems, push out the changes, respond to the changes and update the model that you have in the database — and then the user sees: oh, deployment failed, or whatever. In our case all those internal agents are either also MCollective agents, or they could be just simple Resque workers or something like that. But the general idea is that you use the message bus again in order to publish changes. So, from our experience, what works well and what doesn't work well in such a system? We made the experience that Chef is probably a great tool for doing configuration, but it's also very easy to write spaghetti recipes. It has a low learning curve, but it's very easy to end up with very, very unreadable recipes. And especially in bigger enterprises, the idea of having to execute the configuration again and again is not very appreciated. So what we started to do is move more and more into a packaging step: to use Chef only to drive package managers and do configuration, for example, but not the whole bootstrapping. Another thing is that Chef has some annoyances, and if we have a couple of minutes I can talk about those two: one is that Chef is not really idempotent, and the other is the two phases. And what I mean by "write once and test everywhere" is: you write one recipe, and then Chef executes it on thousands of machines, but the Chef run is dependent on the local machine. For example, we had a case where one out of 10 servers would not compile PHP correctly.
Nobody knows why; it's just some weird case. The better solution is of course to have a packaged version of PHP — but then where do you get this package from? So you have a Chef script that creates a package, and so on. It would be nicer to be able to get a binary output out of a Chef run that you then just propagate to machines, which would also make it a lot easier to verify whether a machine has all the correct configuration or is missing something. From our perspective, MCollective worked out great as a tool. The only problem is, as I said, the missing presence and event notification, so you have to have some kind of internal agents that do keep-alive checks and heartbeats and so on. Nanite, which is a different agent framework, has those, but it has a couple of other problems that we can probably talk about afterwards. And a very nice change around MCollective is that the guy who wrote it now works at Puppet Labs, so there is probably going to be a much tighter integration between MCollective and Puppet in the future. That means MCollective will probably get built-in agents for doing the same things that I just showed you with Chef, except you don't have to reimplement them — you can just call out to Puppet. So, we have a couple of minutes: are you interested in why Chef is not idempotent and why the two phases of Chef suck? Yeah?
Okay, so the first annoyance is that Chef looks idempotent, but in reality it isn't, or at least you could again say it's a bug or a feature. In this case we're looking at a directory resource that should create a directory for me, say /data/logs; it should have mode 0644, it should belong to mike, and it should be of the group users. If the directory doesn't exist on the file system, Chef will create it for you exactly as you specified it. What happens if the directory already exists is that Chef says "the directory is there, I'll just skip this step." What I would at least expect is that Chef would then check whether the user, the mode and so on are correct, but it doesn't. So if you have one server where the directory was already created and another where it doesn't exist, and at the end you expect the permissions to be correct on both machines, you're wrong. You can argue whether that's a bug or a feature, and there are a couple of other places that behave like this, but especially if your machines start in an undefined state it can be tricky, because you then have to ensure yourself that the permissions are correct. This one would be easy to fix: just make sure that resources always enforce all their sub-attributes as well.

The other annoyance has more to do with how Chef itself works. Say you have a more complex Chef recipe like this one: I want to touch some file on Red Hat, and then, if this file exists, I want to restart Apache. Does anybody who has maybe done something with Chef see a problem? The second part will never be executed, even if we are on the Red Hat machine. The reason is simple: Chef has two phases, and the first one is the compile phase. This can be very annoying when you are playing with Chef and don't understand why it's not working, because this is
totally valid Ruby, right? The problem is that it doesn't do exactly what you expect. First Chef loads all your recipes and compiles them into its in-memory representation of the changes you want to make to your system, and then it goes on and executes them. So the result of the execute method is not that the file is actually touched; it's that Chef's in-memory representation now knows that you want to touch some file on the file system. When the if statement is evaluated, the file hasn't been created yet, because first everything is loaded and only then is it all executed, in the same order and so on. Because of those two phases you have to use a couple of other helpers. In this case the correct solution would be to use only_if: there are only_if and not_if helpers that do the check during the run time and not during the compile time. The other alternative is to execute the resource right away, during the compile phase. This is very important to know if you're working with Chef; nearly everybody runs into those two bugs, or features, at some point, and it's very annoying. So I'm closing with a couple of bad remarks on Chef. Any questions?
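[Editor's note: the two annoyances above, sketched as a recipe fragment. The paths, the "mike"/"users" owner and mode are from the spoken example; /tmp/some_file and the apache2 service name are placeholders, and this is a Chef DSL fragment, not a standalone script.]

```ruby
# Annoyance 1: if /data/logs already exists, Chef (at the time of this
# talk) skips the resource entirely instead of re-checking that owner,
# group and mode actually match what is declared here.
directory "/data/logs" do
  owner "mike"
  group "users"
  mode  "0644"
end

# Annoyance 2: the two phases. This if is evaluated during the compile
# phase, before the execute resource has actually run, so on a fresh
# machine the restart below is never registered -- even on Red Hat.
execute "touch /tmp/some_file" do
  only_if { platform?("redhat") }
end

if File.exist?("/tmp/some_file")      # compile-time check: always false
  service "apache2" do
    action :restart
  end
end

# Fixed: an only_if guard defers the check to the converge (run) phase.
service "apache2" do
  action  :restart
  only_if { File.exist?("/tmp/some_file") }
end
```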
Yes, please? Yeah, so we do this by calling out from the recipes to different web services to see whether those are already up. A typical case is that you only want to restart your application servers if the database servers are accessible and running. We do this by also including information about the cluster state in the configuration file that we push down, so an instance always knows what the state of the other instances is. Then inside the recipes you have a check: for example, the restart recipe of the web server would check whether any database servers are available, and if there aren't, it would fail and you would get the exception back upstream, where you can introspect it, or of course handle it automatically and retry for a couple of minutes or something like that.

Yeah, so the question is how we bootstrap Chef onto the machines. On EC2 we use the user-data script for it. On Amazon EC2 you can give machines arbitrary user data, metadata if you want, and in there we put a simple shell script that installs the agent and Chef. Most Linux images on EC2, when their init scripts detect that they are on EC2, will check whether this metadata is available for the machine, download it, and, if it starts with a shebang like #!/bin/sh, execute it. This way we bootstrap the machines on EC2. On the in-house systems with VMware we had the agent on the images already, but on EC2 it is very simple using the user-data script.

So, Capistrano is definitely a much simpler tool. If you have only a couple of servers that you don't bootstrap on a daily basis, Capistrano is a lot simpler, a lot easier, and I would recommend using it. If you have a lot of machines, it is not very easy to do this with Capistrano, because Capistrano uses synchronous SSH connections, which means you cannot open a lot of them; usually firewalls or network congestion will prevent you. So you cannot do
1,000 SSH connections, push a lot of data over them, not very fast and not simultaneously, when thousands of things should happen on your servers. So what we ended up doing is porting everything we did with Capistrano to ChefDeploy. The nice thing is that ChefDeploy also supports the Capistrano callbacks, so everything you can hook in before update_code, ChefDeploy supports the same callbacks. And because the file-system layout is compatible, for one client they wrote a simple gem that takes those callbacks and exports your Capfile, so you could still use cap deploy from the command line, and I think they were also thinking about writing a Capistrano plugin that simply relays all your commands to MCollective.

Yes? Chef Server has data bags, which are basically arbitrary data that you can put in and then use and access in your recipes or cookbooks. We do pretty much the same: in the JSON representation where I store information for a role, I can put arbitrary JSON in too. So if you look at the architecture, everything is basically stored in CouchDB; the agents look at the database, generate a JSON configuration on the fly and push it down, and it's very easy and very fast. In the database we store what systems you have, what configuration they should have, what data bags you have, and then we generate the JSON and put it down onto the instances.

Yes? So, because you now have a central place where you can query all instances and store monitoring data, for example: we store a lot of monitoring and metering information in Redis and then have an agent that every couple of minutes looks at this data and decides whether it should start or stop more instances. It's very easy, because you can query the instances, and then you can just start new ones and they will pop up, automatically bootstrap via Chef, and be available. So, thank you very much.