I'd like to welcome Steve Baker, talking about deploying to the cloud.

Hi, I'm Steve Baker. I'm an engineer at Red Hat who works on the OpenStack project; specifically, I work on a project called Heat. I'm going to talk today about using golden images to deploy Docker containers in an OpenStack environment.

So, I'm all about orchestration, and one thing I want to explain first is the difference between declarative and procedural orchestration. Procedural, or imperative, orchestration is a list of instructions: do this, do this, do this. It looks like code — you've got loops and conditionals and whatnot. Declarative orchestration, on the other hand, lets you describe the state of the world you want to exist. You give that description to something else, and it's that something else's job to figure out what operations need to happen for that state to actually come into being. Both kinds of orchestration have their place; there are different situations where you'd want each, and you can use a mixture of both — they work well together.

So, what is Heat? It's a REST service for the declarative orchestration of multi-tenant OpenStack cloud services. OK, that's kind of complicated, so let's break it down. Let's start with OpenStack. What is OpenStack? Fundamentally, it's API-driven infrastructure: a collection of REST APIs, where each API creates a thing for you. In the cloud, those things give you some form of compute, some arrangement of networking, or various kinds of storage. Multi-tenant — what does that mean? It means that in a given cloud you'll have multiple users using the same infrastructure, sufficiently separated that they won't interfere with each other. Declarative orchestration we've hopefully covered already. And Heat is a REST service, just like all the other OpenStack REST services — it's just another service you can consume. In effect, it consumes all of the other OpenStack APIs in a slightly different way: you define a text-file template that describes what you want to create, instead of calling those APIs directly yourself or with some other tool.

Hey, Kubernetes! That wasn't in the title of the talk. So what is Kubernetes? It's a REST service for the declarative orchestration of containers. Awesome. Containers are cool; Kubernetes is cool. It's quite a young project, but it's evolving very rapidly. I want to orchestrate containers in the cloud. But there's no container service in OpenStack — there's no OpenStack container API. That makes me sad. Kubernetes and Docker are not multi-tenant APIs. That makes me even sadder. So what I'm going to demonstrate here is my particular solution to this problem.

As I said before, Heat interacts directly with multi-tenant APIs. If something is not multi-tenant, Heat really has no business interacting with it directly. It's a convenience/security trade-off that makes sense for us. One of the most common APIs you'll be interacting with is compute: you bring up a VM and you want to do something with it — run some software on it, your workload. And in a Heat template there's always been a way of doing that. Traditionally, when the server boots you give it a packet of boot-time user data, and your VM will generally be running something like cloud-init, so you'll be defining metadata, cloud-config descriptions, or shell scripts. In Heat you can do that.
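For example, the boot-time user data could be a cloud-config document along these lines — a minimal sketch, and the package name assumes a Fedora-era image:

    #cloud-config
    # install and start docker on first boot via cloud-init
    packages:
      - docker-io
    runcmd:
      - [systemctl, enable, docker]
      - [systemctl, start, docker]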
Now there are extra resource types that let you define any number of scripts and cloud-config documents, plus a multi-part MIME resource that can wrap them all together; you just point that at the server and it boots with that config. However, any change to that config will result in your server being rebuilt during a stack update, which isn't always what you want. So now we have some more resource types: config resources and deployment resources. A config resource is really just a container for configuration of some kind. It could be anything — a shell script, or metadata for whatever tool you've got on the node: Puppet, Ansible, or Salt configuration. It lets you store that configuration in your Heat template, or at least associated with it, which keeps everything in one place and can help with the maintainability of your cloud application.

When thinking about how to orchestrate containers, I looked at the architecture of Kubernetes, which has a number of components. One of them is called kubelet, the service which does the actual bringing-up-containers bit. My idea is to slightly abuse the architecture of Kubernetes and use kubelet by itself. Kubelet normally listens to etcd to be told what to provision, which comes from the Kubernetes manager APIs. But it also has a mode where you can configure it to just watch a directory, and it will bring up whatever containers are described in the files in that directory.

Kubernetes has an orchestration unit called a pod. A pod is a collection of containers. If you're embracing the one-process-per-container paradigm — which is actually quite nice — what you're left with is still a fairly fine-grained thing to manage. Having a pod allows you to group together containers that logically belong together. It also has some other nice features, like a shared volume definition within the pod, so you can have shared state between containers. There's some networking niceness too: any port that one of the containers exposes shows up as a localhost port on the others, so you can hard-code more of your service config. Generally speaking, I'd like to advocate that pods are a really good unit of orchestration, whatever the container orchestration tool — and hopefully more things than just Kubernetes will adopt them.

What you don't get with kubelet on its own: no service load balancing. It'd be nice to be able to bring up a bunch of identical pods and have some kind of load balancing on top. You can do that in the pure OpenStack world, and you can do it in the pure Kubernetes world; with a bit of work it might be possible to put something together here, but you get to a point where you should just use full Kubernetes. Another thing you don't get is scheduling: you have to choose manually where you're putting your pods on your VMs, and in this case you're choosing by explicitly saying in your Heat template that this deployment goes onto this server. Actually, Nova has a pretty good scheduler, and there is work to break it out into its own standalone service. So one day it might be possible to write a Heat template which uses that scheduler service to schedule arbitrary things onto an arbitrary pool of stuff — but until then, it's either manual or some other scheduler.

So this is the simplest Heat template I could fit onto a single slide. It's got three sections: a definition of parameters, which are the things you pass into Heat when you launch a stack; a list of resources, where each resource roughly maps to an API call that creates a thing; and a collection of outputs, where you can surface data that lets the user do things with the stack after it's been launched. Here we've got just a simple server running with a key that allows me to SSH in — something like the sketch below.
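A reconstruction of that template, rather than the original slide — the image and flavor names are placeholders:

    heat_template_version: 2013-05-23

    parameters:
      key_name:
        type: string
        description: name of an existing Nova keypair

    resources:
      server:
        type: OS::Nova::Server
        properties:
          image: fedora-20.x86_64   # placeholder image name
          flavor: m1.small          # placeholder flavor
          key_name: {get_param: key_name}

    outputs:
      server_ip:
        description: address to SSH to
        value: {get_attr: [server, first_address]}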
I wanted to demonstrate something actually happening, so I was trying to figure out what to actually create and provision in containers. And I decided to do something that I would actually find useful: a standalone Heat appliance. Heat is normally a multi-tenant tool running in the cloud, part of the service catalog configured in Keystone. But it can — and increasingly will be able to — run in a standalone mode, where it's not as integrated with the OpenStack it's running within. So I wanted to build something which runs the latest Heat, whatever is checked out on my laptop, against a cloud which either doesn't have Heat or has an old Heat, just so I can play with it and try things out. That's what I'm going to demonstrate. And because of that, lucky you, you get to learn about the Heat architecture.

At the top we've got heat-api, an HTTP service which just listens for requests and passes them on to a message queue. heat-engine is attached to that queue; it takes requests, does the work, stores state in a relational database, and returns the result. This scales out in a fairly typical way: the API is stateless, so we can put a load balancer on top, and any heat-engine attached to the queue gets its share of the work — essentially on a round-robin basis, I think. You'll see that the heat-engine relationship goes both ways: increasingly, heat-engine calls itself, or calls another heat-engine, to do bits of work. That will get more and more fine-grained in the future, so there'll be less of the global stack lock, where some work remains on a single engine for the entire duration of a particular stack operation like a create.

But I want something much simpler: my own personal appliance that I can bring up somewhere and do stuff with. So in this case I'm going to do it all in a single pod. In a more elaborate setup, the relational database might not be in containers at all, or it would be its own pod — I'd quite like to use database-as-a-service and take it out of my hands entirely. RabbitMQ itself could go into its own pod, or if there's already some shared infrastructure there, use that. heat-engine and heat-api, while they don't have to, can exist in the same pod, because they don't talk to each other directly; and if we're talking about a scaling unit, we may as well scale them up in API/engine pairs. That's a perfectly reasonable way of doing it.

So here we have four Docker images we need. Actually, there are only two to create: for RabbitMQ and MySQL, there are some perfectly serviceable images out there already. There's a project called Kolla, which is used to bring up a full OpenStack in a container environment, so I'm just going to reuse the images from that for those infrastructure pieces. I can't use the Heat images from Kolla, because I want to run from source rather than from packages, so I'm rolling my own for those.
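So the pod I'm aiming for looks roughly like this — a hand-written sketch in the manifest format kubelet reads from its watched directory. The image names are placeholders, and the schema is the v1beta1-era one, from memory:

    version: v1beta1
    id: heat-standalone            # hypothetical pod id
    containers:
      - name: mariadb
        image: kolla/mariadb       # placeholder image names throughout
      - name: rabbitmq
        image: kolla/rabbitmq
      - name: heat-engine
        image: local/heat-engine
      - name: heat-api
        image: local/heat-api
        ports:
          # containers in a pod share a network namespace, so heat-api
          # reaches rabbitmq and the database on localhost
          - containerPort: 8004
            hostPort: 8004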
And here are the Dockerfiles. There's a base image that the other Heat images inherit from. It's relatively boring, because all the fun stuff happens in the install-heat shell script. By having a separate script, I don't get as many intermediate layers: it gives me a chance to install the build chain, do the install of the dependencies, and then at least partially clean up the tool chain. So I end up with a slightly smaller image. They're still quite big images, because I didn't see the make-your-images-small talk on Monday.

So that's our Docker images sorted. But we're advocating image-based deployments throughout, and we've also got a VM that we're launching, so we really should embrace image-based deployment for that as well. There's a tool from the TripleO project called diskimage-builder, and I've used it to build a custom VM image which has the few dependencies we need: Kubernetes and Docker. diskimage-builder has a concept of elements, where each element does one piece of image-building work. So there's a heat-config-kubelet element, which configures the kubelet service to run, sets up the bridge network that Docker is started with, and so on. Currently it's Fedora-only, because I use lots of systemd. I actually found systemd really handy — it solved lots of problems; it was quite nice.

The other thing I'm doing is putting the Docker container images into a single tarball and including that in my VM image. I was just playing around with things; it's also perfectly valid to run your own private registry, or get the images in some other way, such as a push. But what I was really trying to avoid was relying on the Docker Hub. I know it's been said a few times at this conference, but there have been some issues with vulnerabilities around Docker images. And it's more than just "don't run random code off the internet in containers because it might break out" — the recent issues have been about the mere act of calling docker pull. By the time the image is on your machine and unpacked, it's already too late; you're hosed. It hasn't even been run yet. More than that, what image signing there is doesn't give many assurances that the image actually matches the signature. So a good way of mitigating this is to disable the Docker Hub for pulls: either don't use a registry at all, or use your own private registry. Make sure you trust your images.

Anyway, back to Heat. I've got some more template snippets — the text is a bit too big for the slide, but these are the config resource and the deployment resource which bring up the appliance. Here is the structured config resource, "structured" meaning the config format is just more YAML. And inside it we have a bog-standard pod definition file of the kind you could feed to Kubernetes. The one thing that's different about it is that it has get_input placeholders throughout, for specific values: there are placeholders for the RabbitMQ, MariaDB, heat-engine, and heat-api configuration. On its own, that's just a blob of configuration which doesn't do anything yet. Further down we've got the deployment resource, where we associate the config resource with a particular server, and here we're replacing those get_input placeholders with actual values. Some of them are things we pass in as parameters to the template; some come from things like the secret resources; and here we've also got the IP address of the server, so we know what to connect to when we've actually got something to do.
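In outline, the pair looks something like this — a minimal sketch rather than the full slide, with the resource names and the set of inputs trimmed down, and the hook's group name assumed:

    resources:
      db_secret:
        type: OS::Heat::RandomString

      pod_config:
        type: OS::Heat::StructuredConfig
        properties:
          group: kubelet              # group name handled by the kubelet hook (assumed)
          config:
            version: v1beta1
            id: heat-standalone
            containers:
              - name: heat-api
                image: local/heat-api # placeholder
                env:
                  # these placeholders are substituted at deployment time
                  - name: DB_PASSWORD
                    value: {get_input: db_password}
                  - name: AUTH_URL
                    value: {get_input: os_auth_url}

      pod_deploy:
        type: OS::Heat::StructuredDeployment
        properties:
          config: {get_resource: pod_config}
          server: {get_resource: kubelet_server}  # the custom VM, defined elsewhere
          input_values:
            db_password: {get_attr: [db_secret, value]}
            os_auth_url: {get_param: os_auth_url}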
So when we launch the stack, what's going to happen? Heat launches a VM with my custom kubelet-enabled image, and it builds a package of data describing all the deployment resources that that server needs. There are a few ways of getting that data to the server; generally it's poll-based. An agent on my custom VM fetches the data and writes out the pod template files to where kubelet expects to see them, and kubelet does its magic. So now you've got containers running with the exact config that you need. Currently the VM agent monitors Docker directly for container creation, and when it sees what it's expecting, it signals Heat with the results. Now, there's not a lot of documented API on kubelet itself for finding out why a container might have failed, so if it doesn't work, you're a bit hosed. That is one area where further work will be required, but I'll talk about that later.

So I'll just play this pseudo-live demo — I'm not crazy enough to do a live one. We're launching the stack here, against a fresh Heat. The only parameter I need to override is the OS auth URL, because it includes a tenant ID that tends to change whenever I relaunch my OpenStack, so I pass it in every time. We've created a stack called heat-standalone, and we'll just watch how the resources are progressing in creation. OK, the server is still in progress; that's going to take a while, so I've skipped this bit, because there's a lot of sitting around while image data gets copied, which isn't particularly exciting. If the image were smaller, it would be really quick — but that's an optimization exercise. So let's see where we are now. OK, all the resources are up.

Let's pop into Horizon and see how these resources relate to each other. It's a bit of a quick jump, but see if you can follow my mouse: there's the deployment resource; the resources that hang off it, like the secret resources and the config; the server; the security group for SSH; and the floating IP address resources. So we have a complete stack. I've defined an output on the template which spits out a blob of shell that lets me switch my client configuration to point at the standalone Heat instead of the multi-tenant Heat. Once I've switched, any heat commands now point at my standalone Heat, and we can see that stack-list is empty, because there's nothing there — I've only just created it.

So let's prove that this new Heat actually does a thing: we're just going to launch a simple server. In standalone mode at the moment, there are some things that can't work, because Heat requires a certain level of integration with Keystone authentication — special users, special domains. That's not going to be the case forever; hopefully by Kilo it'll be possible to run standalone and get all the important functionality. So there we go: we do a nova list, and we can see we've got one VM running the standalone Heat and one VM running the thing that the standalone Heat provisioned.

So now that my stack's up and I've got a Heat, what's going to happen throughout the lifecycle of the stack? I'm going to be building new Docker images, and I'll be wanting to deploy them. I can do that with a simple heat stack-update: I can just pass in the different image version, and that'll come up — something like the sketch below.
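For instance — the parameter and resource names here are hypothetical, but the pattern is the one shown earlier, with the image name flowing from a template parameter through the deployment's input values into the pod definition:

    parameters:
      heat_engine_image:
        type: string
        default: local/heat-engine:v1     # placeholder image tag

    resources:
      pod_deploy:
        type: OS::Heat::StructuredDeployment
        properties:
          config: {get_resource: pod_config}
          server: {get_resource: kubelet_server}
          input_values:
            # the pod definition picks this up via its get_input placeholder
            engine_image: {get_param: heat_engine_image}

Then a heat stack-update with -P heat_engine_image=local/heat-engine:v2 should re-render the pod file on the node, and kubelet replaces the running container.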
I might also want to make more radical changes to the architecture over its lifecycle: adding new resources, adding or removing containers, shuffling things around. Many of those I can do with a heat stack-update as well — just supply the new template, and Heat will compare old and new and do the correct operations. There may well be some lifecycle tasks which can't be represented as a stack update, because they're order-sensitive in ways that can't be inferred from the difference between the old and the new template. In that case there's going to have to be some kind of procedural workflow, and you could do that in any number of ways — but chances are even that workflow is going to be a series of stack updates, doing a little bit at a time for the order-sensitive things.

So here's that slide again: the evolution of Heat software configuration. We've done boot-time config; we've done config and deployment resources that call a configuration tool; and the next step — what we've got now — is deployment resources driving some service that's running on the node, on the VM. In this case the service is kubelet; there's another hook that interacts with Docker as well. So what's the next step after this? This is configuring a service on a single node; the first step beyond it is to do a better job of interacting with full clusters that Heat has brought up. Obviously full Kubernetes would be nice, but also any other cluster-based app that runs on etcd, or things like Mesos. So what are we trying to achieve here? I think it's to have the Heat template be the place where all the stuff lives: the cloud resources definition, plus the data that gets passed to the cluster for the cluster to do its work. Also, clusters need to change size — to scale up or down based on the current load — so there needs to be enough information coming out of the cluster for Heat to know that it needs to scale for a particular workload. That's all more or less in place already, but it needs to be a lot easier, and we need to interface with specific cluster technologies. So that will be the focus in the future, I think.

Next steps for this particular software-config hook: it would be nice to get some kind of stats out of my node, and there's a way of doing that — a deployment resource can also have outputs, so I can float those up to outputs in my stack and get some idea of what's going on (there's a sketch of that below). I think what I'll do next is try to bring up a full Kubernetes cluster with Heat: use the same config and deployment resource pattern that I've got, but instead of passing the pod to a single node, pass it to the master in the cluster and let Kubernetes decide where to schedule it. Also, it'd be nice if I didn't have to build a custom VM image, so I'm going to give Atomic a try. That's a stripped-down OS that's dedicated to being a Docker host, and it already has Kubernetes installed. In my custom image I've got a fairly fat agent which does the polling; on a stock image that could be installed at boot time, but that would be kind of tedious, so it'd be nice if I could write the most bare-minimum agent that can be injected at boot time and act as the interface between Heat and the Kubernetes API.
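That stats idea, sketched — a minimal example under a couple of assumptions: it uses the generic script hook rather than the kubelet one, and it relies on that hook's convention of writing each declared output to ${heat_outputs_path}.<name>:

    resources:
      stats_config:
        type: OS::Heat::SoftwareConfig
        properties:
          group: script
          outputs:
            - name: container_count
          config: |
            #!/bin/sh
            # report how many containers are running back to heat
            docker ps -q | wc -l > ${heat_outputs_path}.container_count

      stats_deploy:
        type: OS::Heat::SoftwareDeployment
        properties:
          config: {get_resource: stats_config}
          server: {get_resource: kubelet_server}   # the VM from earlier

    outputs:
      container_count:
        value: {get_attr: [stats_deploy, container_count]}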
Another avenue is to encourage Kubernetes to declare stable interfaces for its internal components. They've given a vague indication that they'd be happy doing that, but I wouldn't want to push it: no doubt these internal interfaces are still not very stable, so we'll just let that evolve a bit. It would be nice if these services, which are useful on their own, were actually encouraged to be used on their own.

There's a whole lot more Docker and container work going on within the OpenStack project beyond what I've just shown you, so I'll go through it now and give you an overview. This is the nova-docker driver: a Nova plugin that lets you bring up containers instead of full VMs. It's out of the Nova tree now, and they seem to be quite happy with that situation. In many cases it might suit people's needs, but the conceptual mapping between Docker containers specifically and VMs is not perfect, so there are some Docker features that you just can't use. I don't know — maybe it's improved a little bit. It's not the ideal situation, but it's certainly there as an option — unfortunately, an option that would require your OpenStack operator to install it before you can actually use it.

Now, remember I said that if something's not a multi-tenant API, then a multi-tenant Heat has no business interacting with it directly. However, if it's not a multi-tenant Heat — if it's your own personal appliance — then you can do whatever you like. There is actually a plugin resource in our contrib tree which lets you define a resource that interacts directly with a Docker API — a Docker API on a VM that you've brought up. So that's another option that's perfectly viable.

There is a project called Magnum, and they're writing a multi-tenant container API, which is exactly what I wanted. It's still fairly early days. Probably their first step will be a straight-up Docker API that's multi-tenant, and I suspect they'll also have in scope a multi-tenant Kubernetes API. They do have a model where tenants are isolated by VM, so you don't have to worry about being scheduled next to a hostile container. So we'll see how that goes — but even when something's ready to use, you're still going to have to rely on your cloud to install the service, so you may still need to look at other options.

This one's a bit different: it's just a collection of Heat templates which brings up an Atomic-based Kubernetes cluster. It's written by Lars Kellogg-Stedman, another Red Hat engineer. It's kind of cool — I think they're using it for various Magnum and Kolla things. This may well be my starting point for the next step: defining the pods in the template and then passing them off to the cluster it brings up.

So, get the code. If you want to build your own container-based Heat appliance, that's the first link there. The diskimage-builder element for building the software-config hook that handles kubelet is at the second link. And that's it. Do you have any questions?

Q: With Heat being a sort of CloudFormation for OpenStack, does it have a mode that CloudFormation is missing, where I can modify my template and run something like a simulate mode to see what's going to change? So, like a preview command?

A: Okay, thanks. Yeah, so Heat actually started out as essentially a clone of CloudFormation that runs within OpenStack. It's certainly evolved beyond that at this point; the compatible CloudFormation resources are still there, but the focus now is on native OpenStack resources. Now, there is a change set in progress which lets you run a stack update with a preview option, which will give you some indication of what would change if the update were actually run.
I think it's still in code review; it would hopefully land in Kilo — I don't know exactly where it's at at the moment. Does that answer your question?

Q: With CloudFormation it is a problem, because you change one thing, other things get restarted by dependency, and you can kind of mess up your system if you didn't apply a policy first.

A: Right — when you change something in a Heat template and do a stack update, there's a dependency hierarchy, and it depends on the resource and on which properties have changed whether that resource gets replaced or updated in place. So this preview command would give you some indication that what you're about to do is going to replace your thing, and you might want to reconsider what you're going to get.

Q: Since you're using Heat to launch Docker containers here, is it possible to use other components of OpenStack alongside this, like Ceilometer and others?

A: Along with Ceilometer — so, Heat has some alarm and scaling-policy resources, and some Ceilometer resources, and when you wire it all up together, actions can be triggered on Heat when Ceilometer alarms fire. Ceilometer will poke a webhook on Heat and say this alarm has happened, and then Heat will do what it needs to, which will generally be to scale things up or down. So yes, there is Ceilometer integration. But it should also be possible to trigger these things without Ceilometer, in other ways — your VMs could be sending metric data directly to Heat as well.

Q: That's quite interesting, thank you. I didn't know you were going to try to use kubelet in that way — that was quite cool. I have previously experimented with running Kubernetes on Heat, using Heat to set up a Kubernetes cluster. One of the biggest challenges I ran into was that etcd is quite precious about how it gets updated. You can't, for example, just take down all the etcd machines and then try to bring them all back up again, because they can't come to agreement again — it destroys the consensus algorithm. Do you have any thoughts on how you might deal with that under Heat, which has limited ways to describe the rate of change?

A: I missed a bit of that scenario — you've got a Kubernetes cluster?

Q: If you're installing your etcd cluster in particular using Heat, then what you can't do is take down all the etcd machines at once and bring them all back up again, because they can no longer make majority consensus decisions if they're all down. This is for an update scenario, where something changes and Heat thinks it wants to replace all the servers.

A: It should be possible to do what you want without replacing all the servers. If the cluster grows or shrinks, you should be able to get the cluster-membership information onto the remaining servers without replacing them. It should be possible to build a template that does that.

Q: I didn't find a way to do an image update, where you're rolling out a new version of etcd.

A: You might have been using the resource group resource. There's also an auto-scaling group resource, which has a lot more features — rolling updates, which is probably what you need — so if it really does have to replace the servers, it'll do it a bit at a time. The resource group resource is useful if you just want a bunch of things: it'll give you a bunch of things. But a real scaling group needs a lot more features, so definitely look at that other auto-scaling resource, even if you're not auto-scaling. It's kind of misnamed — it's really the scaling resource, and the "auto" comes from other pieces. Something like the sketch below.
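For reference, a rough sketch of that auto-scaling group with a rolling-update policy — the property names are as I remember them from the Juno/Kilo-era resource, and the image, flavor, and sizes are placeholders:

    heat_template_version: 2013-05-23

    resources:
      etcd_group:
        type: OS::Heat::AutoScalingGroup
        update_policy:
          rolling_updates:
            min_in_service: 2     # keep quorum while members are replaced
            max_batch_size: 1     # replace one server at a time
            pause_time: 30
        properties:
          min_size: 3
          max_size: 5
          desired_capacity: 3
          resource:
            type: OS::Nova::Server
            properties:
              image: fedora-atomic    # placeholder image
              flavor: m1.small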
Any more questions? Give it up for Steve Baker. Thanks so much.