So, first of all, my name is David Nalley, and my contact information's up on the screen. Feel free to send me an email, berate me on Twitter if you vehemently disagree with me. So I'll kind of set the stage. What I hope to do today is a mix of both theory and practice on setting up a test/dev cloud, and doing that with Apache CloudStack. Some things are going to stray pretty far from technical details, and then we'll dive back into details at other points. But I've got a couple of questions for you, and this particular tutorial is best done interactively. If you want to see something, jump up and say it. If you don't like some content, feel free to disagree, feel free to ask questions and interject. No need to wait till the end. But I'm curious, why are you here? Anyone want to volunteer? Yes? Okay. Anyone interested in doing something that's not test/dev? With, let's call it, a private cloud. Okay. What are you interested in doing? Okay. I'm actually surprised how many people want to do that and want to use a cloud to do it. Is anyone here because someone at their company has decided that they will have a cloud strategy and you have two weeks to implement it, or something like that? I gave a similar course in Los Angeles a year or two ago, and a guy from a very, very large defense contractor runs up to me and says, hey, I'm here and I just want you to know that you're going to be setting the strategy that very large defense contractor is going to use for their cloud strategy. And I said, what? And he goes, well, my boss's boss's boss told me that we have to have a cloud strategy, we have to have it in two weeks, and I saw this particular training, so I'm here to take advantage of it and I will be defining our strategy based upon what you tell me. And I personally think that's doomed to failure. If I'm the only source of truth that you take home, I'm flattered, but you're woefully unprepared. Anyone using AWS today? Anyone using other public cloud providers? I'm actually surprised. So for those of you who say you're not using AWS, does that mean that no one in your company is using AWS? Wow. One of the things I found in the US is that a lot of companies say they're not using AWS or any other public cloud, but when people are doing the expense reports, there's a massive outlay to Amazon, and they can't imagine ordering that many books. As a matter of fact, in the US my experience has been that if they have any decent-size IT, or particularly if they're a tech-related company, they are using AWS whether they know it or not. Development managers are just expensing things. So anyone doing private cloud today? Have I asked that? I don't see any hands. All right, so let me tell you a little bit about myself. I am a member of the Apache Software Foundation, and thankful that they elected me to that. I am on the Project Management Committee for Apache CloudStack. I was originally at cloud.com before Citrix acquired cloud.com, and then CloudStack moved to the ASF shortly thereafter. Before that, I was an operations guy; I worked in operations as a sysadmin or team lead for about a decade, and during that time I've been contributing to a number of open source projects like Fedora, Zenoss, and Sahana. I currently work for Citrix in the open source business office. As a disclaimer, you will see plenty of things that are clearly not the Citrix corporate line. These are my own views and none of theirs.
So I want to explain, and perhaps this is well known, but I think from an ops perspective we often get a siloed view of the world. So from a developer's perspective, here's what happens when they get a new project they've got to start working on. They start the project and they go to your ticketing system, hopefully, hopefully you have a ticketing system, and they file a ticket to get some resources, and then they wait. If it's a really big project, they may be waiting a long time. Let's say that they actually need new physical machines, it's that size of a project: you've got to get the PO approved, and I don't know how long that's going to take in your particular organization. You've got to place the order, and the vendor's going to have to fulfill the order. Typically I see that ranging from four to six weeks, sometimes more. But then you've got the servers and they come in and everyone's happy, except then they sit in the hallway, or they sit in a server room, waiting to be racked and cabled. So they have to get all of those racked and cabled, and that's a different department, so another ticket. And then they all get cabled up and you need network access, which is IP addresses; perhaps, if it's its own network, you're going to need routing, and you'll need firewall rules written. That's a completely different department, maybe two or three different departments, depending upon your size. So more tickets, more waiting. And then finally, after that, you get to go do things. If they did everything correctly and you need no changes. I've talked to a publishing company. They already were using virtual machines pretty heavily. And their time to get a virtual machine deployed was three months. From the time a developer asked, to the time it made it through all of the steps, getting firewall rules done, getting IP addresses handed out, actually getting the virtual machine provisioned and then turned over to the end user: three months. I talked to a university who has literally a different department that handles each one of these steps. And they were at eight weeks to get a virtual machine deployed to someone. So you've essentially got this developer who's got potentially a great idea, and he can't get things done because he's waiting. To sum it up, and this is a generalization: what IT operations is providing is not what developers want. And by and large, what we see is that developers are realizing that. They can do some small-scale stuff on their laptop. They can use things like LXC. They can use some desktop virtualization tools and get some basic access on their own. But when they need larger scale than that, they simply have nowhere to go inside the organization, so they go outside. And it's really easy to use a micro instance that you can get for free on Amazon. It's really easy to plop down a credit card and then get that reimbursed, because most of the time their manager's not gonna say no to it. And I tend to see a lot of development managers actually running group AWS accounts and letting everyone in their team do that, because it is so much faster for them to get things done for the business. And they're able to meet their deadlines without interference from operations. But I really do think it's not just the time to get things done. It comes down to being active interference. I think that's partly an attitude problem that we in operations have adopted.
I think going back to the time when we were the priests that people had to make supplication to in order to run code on VAX machines, we've gotten the idea that people are coming to us asking permission to get things done, rather than us providing a service. And that we'll get to it when we get to it and they will have to live with it. I do not think that's the case, but that's a cultural discussion and we'll get away from that. So what we want to do is get rid of the waiting. And what we specifically want to do is automate all of the things that don't need to be dealt with by an individual. And there's a lot of that. We want to have automated processes that we can easily enforce, that we can build rules around so that we're still comfortable with things, but at the same time allow people to get things done to as great a degree as possible. So that brings me to what is CloudStack. From a project standpoint, the project is at the Apache Software Foundation; it's a top-level project there. It's an infrastructure-as-a-service platform. It started development in 2008, had some production deployments in 2009, was released under the GPLv3 in 2010, and was re-licensed in 2012 under the Apache License. CloudStack differs from a lot of the other infrastructure-as-a-service platforms in a couple of ways. First and foremost is focus. And the focus is different. If you look at some of the other platforms out there, they have a very different focus. And I will tell you what I think CloudStack's focus is, at least what it is today. That may change at some point because it is a living organization, a living project. But I think that the focus of CloudStack is to have an infrastructure-as-a-service platform that just works. That is really easy to deploy, because we want to focus on rapid time to value. That's one of the reasons you want a cloud platform in the first place. And we want to make sure that we're going to be able to allow you to do that. So let me talk a little about the architecture so you understand how we're going to go about deploying this. And rather than bore you with PowerPoint, let me. So I want to real briefly talk about how we group resources. We have this concept of a region, which you don't see up there. And a region is a collection of zones that are geographically close together. We're typically talking 10 milliseconds or less is what we would really expect as far as latency. So you're really talking about data centers in the same city. And you would assume that a zone itself is going to be a data center, or maybe even that you are partitioning your data center into one or more zones. But the regions, if you have multiple regions, and we're not advocating that you do this for a test/dev cloud, but just to set some background, use an async message bus to communicate between each other, so that you don't have to deal with latency being an issue. Latency is not that big a deal and you'll catch up when you catch up. And essentially these zones have some quasi-independent management from each other. But each region will have its own set of management servers. And they'll communicate with each other, but you don't necessarily have to worry about a management server falling off the face of the earth and going away. So the zone is where we're really going to focus most of our attention today. And it's what's most relevant, because you have to have at least one zone. And the zone is really where you're going to be making networking decisions.
So at the zone level, actually let me see if I can. At the zone level we are deciding what the underlying network model is going to be for everything underneath. And I don't want to jump right into networking, but essentially you're going to be deciding whether you're using SDN, whether you're going to be using VLANs, or whether you're going to do layer three isolation like Amazon security groups. So a zone is almost always within a single data center. It is a rather arbitrary distinction, but because you're setting the network model, it assumes that you're not going to span more than a single data center with a zone. The next level down are pods, and typically those are a rack or a row of racks. We're essentially assuming that there is a top-of-rack or end-of-row switch. And so the guest network within a pod is all going to be the same. Everything inside that pod is going to have access to the same guest networks. Below that are clusters. And this is the first time where we actually start enforcing some rules on things. Clusters are the lowest level that we really are making any real decisions at. So when we make deployment decisions, we make them at the cluster level. And so within a cluster, the hypervisors have to be the same. The hardware needs to be sort of the same, at least as far as the hypervisor is concerned. Because we may decide to move virtual machines around within a cluster to rebalance load on machines. We may decide that because of a failure, we're going to restart an instance on another machine in the cluster. And so they need to have access to the same networking. They probably need to have access to the same CPU, or at least close enough that the hypervisor doesn't care and will migrate it cleanly. That said, you can have multiple types of hypervisors in a pod. So you can have a cluster of KVM, a cluster of XenServer, and a cluster of VMware, all living in the same pod, and all running along, being managed by the same set of management servers. And of course we have the hosts that are in the clusters. A couple of other elements aren't necessarily related to the straight hierarchy. We have a concept called secondary storage. And secondary storage is effectively our object store that we use to store snapshots of running virtual machines. And we'll also use that to store the disk images for machines that we are going to deploy. So this is storage that tends to not change a lot and its contents tend to be immutable. And you can do this with, if you want the quick and dirty route, an NFS share, or you can use Swift or S3 or anything that adheres to either the Swift or S3 APIs. We also have primary storage, and there are really three different flavors of primary storage. Primary storage is where we run the actual virtual machines. And so there are three places, three types really. There's local storage. So you can have disks in the hosts themselves and provision virtual machines there. You need to understand that there are some implications, like you're not going to be able to live migrate a machine, which should be fine for a test/dev environment. You do tend to get far better performance there, depending upon the alternative I suppose. But it also is a bit limiting in that it makes deployment decisions a little more difficult, because it injects some additional complexity. Where we used to only look at clusters for deployment decisions.
Now we have to go try that deployment on an individual host and make sure it has enough storage. We also have shared storage that is shared at the cluster level. And this has been the default since probably CloudStack 1.0: we tie primary storage to the clusters, for a couple of reasons. First, the hypervisors tended to have a lot of hypervisor-specific desires. They wanted to have VMFS if you were running VMware. They wanted to do LVM on top of iSCSI if you were running KVM. Or they wanted to run raw block devices over iSCSI with XenServer. And that coupled with the fact that we wanted to limit the number of hosts that could actually hit a single storage resource, essentially trying to constrain the number of IOPS that would be demanded of any storage resource. So we had that shared primary storage, which also meant that you could only live migrate workloads within a cluster, which is a limiting factor. At the same time, storage is rapidly changing and we're seeing distributed storage really starting to pick up some traction. Tools like Ceph, particularly Ceph RBD. Gluster has some new libvirt integration where people are doing essentially block devices as a Gluster object. And tools like Sheepdog, which have actually been around a little longer, that are focused on providing the types of storage that hypervisors really want to consume and being able to do that at scale, which allows them to scale out their ability to deal with multiple hypervisors. So, the final type of primary storage is zone-wide. You can have storage that is accessible from anything within a zone, i.e. anything within a data center. This is also, I think, important because it's essentially the same scheme that Amazon uses for EBS. And if you'll notice, EBS can be accessed from anywhere within an availability zone within AWS. So if you need to mimic that underlying architecture, you can. So that got us through talking about kind of the structure of the physical resources, right? Let's talk about networking for a minute. See if I still am logged in. So this is a super simple network deployment; this is my personal CloudStack cloud that's running in a colo in San Jose, I think, in California. And essentially I'm using VLANs for isolation. So going back real quickly, we want to give people the ability to go get work done without shooting themselves, us, or anyone else in the foot in the process. And so that means that we need to provide isolation to keep people isolated from each other. So if you set up a MySQL test instance, it does not become the production MySQL instance. Other people can't connect to it. And bring it down. So we have to assume isolation by default. And the question is, how are you going to provide that isolation? And there are really a number of options. The traditional answer to that is you use VLANs. And people are really comfortable using VLANs for network isolation today. It's a proven and tested technology. It's got a couple of problems. It was really created in the day before anyone could envision anything like the scale, the economies of scale, that are being demanded of infrastructure today. Anyone know the maximum number of VLANs you can have? 4,000? Yeah. So you can have 4,096, theoretically. The real problem is actually much more sinister than that. The overhead from making routing decisions for 4,000 VLANs is incredibly expensive.
And so the network vendors say, no one in a general enterprise needs to route more than 1,000 VLANs. And so typically, for under six figures US, you cannot get a switch that will do more than 1,000 VLANs at a time. And you're talking hundreds of thousands or millions to get something that will handle 4,000 VLANs at line speed. Which creates interesting other problems, because that's also a very arbitrary number after you've invested that much money. So there are lots of people doing things like VXLAN, NVGRE and some of the other SDN technologies that people have been pushing, and paying a billion dollars for Nicira, which uses STT. We'll talk about those in a little bit. But essentially you should walk away with the understanding that VLANs are expensive for you to consume, and that if you're using VLANs for isolation, depending upon the size of your deployment, you will run out quickly. Because you're essentially going to need to provide a VLAN to every single, we'll call it an account. So CloudStack has a concept of users and accounts. Accounts are the lowest level of isolation and you could have multiple users within a single account. But with each account needing at least one VLAN, and if they need to build a multi-tier app and need to have multiple levels of networks, you've got them consuming three, four, five, ten, 15. Start multiplying that by the number of teams or accounts that you're going to have, and that balloons pretty quickly. So this is not a new problem. Amazon clearly had this problem and they developed something called security groups. This is LinuxCon, so how many folks are familiar with ebtables? Okay, so how many folks are familiar with iptables, which is about to be deprecated? So iptables has been around for a while, and ebtables is its bridged brother. So when hypervisors create a networking device, generally speaking, they're doing that as a bridge. They've got the physical connector that you plug a cable into, and that gets the physical network into the box, right? And then they create a bridge device and link the virtual interfaces that they're creating for the virtual machines into that bridge. And so in effect, that bridge device becomes a natural choke point, a point to make routing and firewall decisions, even though it's making layer three decisions rather than the layer two you would typically get with VLANs. That also gives you some scalability, though, because now instead of a single router or a layer three switch sitting at the top of a rack making all of those decisions, each physical machine or hypervisor that's sitting in that rack has a bridge device that can be making decisions. So effectively you've federated out your routing decisions even though you've moved them up a layer. We've seen that work at massive scale, 50,000 of those bridge devices in a single deployment, all being orchestrated by CloudStack and essentially providing that network isolation. And essentially what CloudStack is doing in that is ensuring that the state that is declared by CloudStack is actually enforced by the machines. It's going and saying, hey, this is the current state of what you should have as far as filtering; is that what you have, and if not, update it. Think of it as config management on the network plane.
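To make that bridge-as-choke-point idea a little more concrete, here is a minimal sketch of the kind of per-NIC filtering an agent on a hypervisor could apply. This is not CloudStack's actual security-group rule set; the interface name, IP address, and ports are made up, and a real implementation also deals with ARP filtering, connection tracking, and cleaning rules up as instances move or die.

```python
#!/usr/bin/env python3
"""Toy illustration of layer-3 isolation at the hypervisor bridge.

Guest NICs (vnetX) hang off a Linux bridge; with
net.bridge.bridge-nf-call-iptables=1 the host can filter that bridged
traffic using iptables' physdev match. This script only prints the commands.
"""

def rules_for_vm(vif, guest_ip, ingress_tcp_ports):
    """Build iptables commands for one guest NIC: drop spoofed source
    addresses, allow the permitted ingress ports, deny everything else."""
    rules = [
        # Anti-spoofing: traffic leaving this VIF must use the VM's own IP.
        ["iptables", "-A", "FORWARD", "-m", "physdev",
         "--physdev-in", vif, "!", "-s", guest_ip, "-j", "DROP"],
    ]
    for port in ingress_tcp_ports:
        # Ingress ports the "security group" explicitly allows.
        rules.append(["iptables", "-A", "FORWARD", "-m", "physdev",
                      "--physdev-out", vif, "-p", "tcp",
                      "--dport", str(port), "-j", "ACCEPT"])
    # Default-deny any other new connections headed toward the VM.
    rules.append(["iptables", "-A", "FORWARD", "-m", "physdev",
                  "--physdev-out", vif, "-m", "state", "--state", "NEW",
                  "-j", "DROP"])
    return rules

if __name__ == "__main__":
    for cmd in rules_for_vm("vnet5", "10.1.1.17", [22, 80]):
        print(" ".join(cmd))
```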
You also have SDN options, and I'm not going to delve into those greatly. I don't think they have a ton of applicability to test/dev environments. They certainly are out there. Today, CloudStack supports Nicira NVP, or NSX if you're using the latest release. It supports just using GRE tunnels, if you want to just use Open vSwitch and GRE tunnels. It supports Stratosphere. It supports Midokura's MidoNet. There's just been VXLAN support added that is not in a release yet. And Contrail support is being very actively worked on; there's code in the repo, but it's not been merged into a release branch yet. So there are plenty of SDN options if you're going to need that for wider use. So let me figure out where my mouse is. So, a couple of other things. Really, what CloudStack does is make orchestration and allocation decisions, right? So when I go to deploy an instance, there are a number of things that we ask of the user to help us decide. And I apologize for this; it's painful. Real people who do work don't use this, but it's helpful for illustrating things. So we tell the end user the places they are allowed to deploy a virtual machine, right? You can do this in any of the zones that you have access to. Now, we may have zones that you don't have access to, but you get to choose among the ones you do. That's one of the things we'll allow you to choose. Then we have templates that you can choose from, right? So you can choose what your disk image is going to be. Now, you'll notice that some of these have a hypervisor listed. In reality, the end user doesn't get to choose the hypervisor. The disk image they choose may choose that for them, but they don't inherently get to make that choice. They get to decide what they want. They shouldn't care or worry about what the hypervisor is. Those are merely text fields that have been added. So the end user doesn't know if they're running on bare metal, if they're running on a hypervisor, or if they're running in a container like LXC. Then they get to decide what kind of resources they get. And again, this is something that the operations folks get to present to them; they can define a number of offerings for those instances, and again, these are all text fields. If we had any creativity, we would put something like how much CPU and how much RAM instead of medium instance and large instance. We also allow them to say if they want additional storage in addition to the root disk image. Then they get to choose a number of things about networks. And so you'll notice that for the account that I'm on now, there's only a single network. I could come here, add a new network, tell it the type, et cetera. Essentially, if I'm allowed to create additional networks or still have networks left in my quota, I could certainly do that, but I may be presented with a number of networks. All of those factors go into deciding where you're going to actually allocate the machine, right? Because if you choose a network that's only available in a specific pod, that's going to have to be deployed there. There's also an entire science behind where you want to allocate things. And so CloudStack ships with a number of allocation algorithms. The default is first fit. So we see your request, we'll start looking for resources, and the first resource we find that meets all of the conditions you've put on it, we will deploy it right there. Now what that means is that every time it starts searching the list of hosts, it tends to get those in the same order.
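As a rough sketch of what that first-fit behavior looks like, assuming a much-simplified capacity model (the Host fields and the numbers here are invented, and the real planners also weigh networks, host tags, storage, and cluster membership):

```python
"""Much-simplified sketch of a first-fit allocator.

Candidate hosts are assumed to already be filtered down by zone, pod,
cluster, hypervisor type, and network reachability; first fit just takes
the first one with enough spare capacity.
"""
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    cpu_mhz_free: int
    ram_mb_free: int

def first_fit(hosts, cpu_mhz, ram_mb):
    """Return the first host that can take the requested service offering."""
    for host in hosts:  # the list comes back in the same order every time...
        if host.cpu_mhz_free >= cpu_mhz and host.ram_mb_free >= ram_mb:
            return host  # ...so the same hosts tend to fill up first
    return None          # no capacity left: the deployment fails

if __name__ == "__main__":
    cluster = [Host("kvm-01", 2000, 4096), Host("kvm-02", 16000, 65536)]
    chosen = first_fit(cluster, cpu_mhz=4000, ram_mb=8192)
    print("deploying on", chosen.name if chosen else "nowhere")
```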
So it will tend to fill up the same hosts first, generally. But there are also a number of other options. You can distribute that out. There's a disperse-per-account option: I'm worried about a machine dying and I don't want all of your machines living on a single physical host, so I will spread machines out within an account as widely as possible. You also have equalizing, which will essentially spread the load out equally, so that hosts are consumed equally. And there are a number of others. You can also write your own if you need to apply rules of affinity or anti-affinity; maybe you want to have two nodes that are processing the same data close by so that they can pass that data back and forth easily. Or a number of other rules. Maybe you have licensing for Microsoft or Oracle products that says you can run as many instances as you want on n physical nodes, and so you want to keep all of those on the same physical nodes. So really, when it comes down to it, CloudStack is about creating a set of rules around allocation. And it's got some defaults so that you don't have to do this first time out, but it allows you to set parameters whereby people can go and automatically deploy things and do things in the same manner. And so we're going kind of backwards here. We've talked about the architecture and we've talked about what CloudStack does at a really high level. Let's talk about the layout of how CloudStack does that. So we obviously have hypervisors, and we group those in clusters and pods and zones and then regions. But the rest of that story is that we have stateless management servers, and what you're seeing here is one of those management servers. It'll run on just about anything because it's all Java, but we generally assume Linux is what you're running it on. They're stateless. They will auto-balance the work between themselves if you've got multiple of them. They will essentially pick up work if one of them dies. So you can have four or five of these stood up and it'll automatically load balance all of the work between them. By the way, four or five is overkill. The largest deployment that I know of is almost at 50,000 physical hosts under a single plane of management, and they have a grand total of four management servers in place. So I keep saying that these are stateless, or at least quasi-stateless, management servers. State is stored in a MySQL database. So you've got the data store on the back end that you're communicating with. That sounds pretty monolithic, and it is from that perspective. You've got essentially a single orchestration engine that is making allocation decisions, et cetera. Even if you've got multiple of them, they're effectively all doing the work. We have a couple of other things that provide some of the other services within a cloud. So by default, anyway, we ship a virtual router. And the router offers DHCP, DNS, routing, firewall, NAT, load balancing, and a couple of other network services that you can choose to turn on or off as a service offering to folks. And depending upon your network model, you'll have one per pod or one per account. So if you're using VLANs, each customer ends up getting their own virtual router. They don't get to access it. They can interact with CloudStack to make changes to the configuration, but they don't actually get to deal with the router itself; it is dedicated to them, though. And it will automatically spin up a new router when a new network is created, or when it is appropriate.
So you may have multiple networks that are backed by a single virtual router. We also have a couple of other pieces of functionality. You hate to have to do this, just like you would hate to have to sign into the DRAC on a Dell machine, or iLO on HP, or the LOM on old Sun hardware. But, oh, and it brought it up over here, that's why I'm not seeing it. We essentially have a remote console that will allow you, if your networking is hosed and you can't get in any other way, to connect to the console of the virtual machine. And because we don't want to allow you direct access to the hypervisor, we have essentially an AJAX proxy to VNC, and we just recently got a code drop to add RDP as well. So you can connect to the console of the machine from the hypervisor, and it's awful, just like DRAC or a LOM is. You can see how slow the response is and how you would hate to use this every day. But worst-case scenario, this will work. This also, especially if you're using something more than a console, gets a little expensive in terms of proxying. So we have this console proxy VM, and it will add additional nodes on demand, so that if there are a ton of people accessing the consoles, it will spin up more, and when that load dies down, it will drop back to just a single console proxy VM. We do something similar with the secondary storage VM, which deals with aging out your snapshots, grabbing snapshots from the running hypervisors, as well as deploying templates and allowing you to download snapshots or templates from it. I don't know that I intended to go as deeply into what CloudStack is as I have done. Before I depart from that, though, do you have any questions about what CloudStack is or how it does it, before we go into actually using it to build a test/dev cloud? So, CloudStack interacts with a couple of different things. We will interact with vCenter, and that assumes that you'll have VMware nodes behind it. We will interact with XenServer's API, which assumes that you're going to have XenServer probably running on bare metal. We'll interact with libvirt for KVM and LXC, which probably assumes that you're running bare metal for the KVM nodes, and probably running on bare metal for the LXC nodes too in the real world. But we don't really care, because we're largely trying to do everything we can to interact with APIs rather than with the command line. So that means that we don't really care what the hosts are as long as we can talk to them in the same manner. We do do some version checking, so if you're trying to run a really old version of vCenter or a really old version of XenServer or libvirt, it will fail; it won't even connect and add the host. But really, we don't care what the hosts are. We do do some checks to ensure that you can do hardware virtualization on the host, especially for KVM and XenServer. Aside from that, we don't really care. Anything else? All right, let me see if I, yes. So, guest networks are zone-wide. Actually, that's not true. Guest networks are pod-wide. Networking in general is zone-wide. So we would assume that your pod may have access only to specific guest networks. Those could be larger, but the pod is going to assume that everything in the pod has access to the same guest network. It may be that everything in the zone does, but that's not a guarantee. And certainly, the portable IP stuff notwithstanding, we'd assume that networks are not crossing a zone boundary.
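As an aside on the point about talking to APIs instead of shelling out: CloudStack's KVM support goes through an agent that drives libvirt, and the sketch below just illustrates what driving libvirt remotely looks like from Python. The host URI is a made-up assumption, and you would need the libvirt daemon reachable over SSH (or TCP) for it to connect.

```python
#!/usr/bin/env python3
"""Illustration of inspecting a KVM host through the libvirt API."""
import libvirt

URI = "qemu+ssh://root@kvm-host-01/system"   # hypothetical host

conn = libvirt.openReadOnly(URI)
try:
    # Basic host facts -- roughly the sort of thing a management layer
    # checks before agreeing to manage a host at all.
    model, mem_mb, cpus = conn.getInfo()[:3]
    print(f"{conn.getHostname()}: {cpus} CPUs, {mem_mb} MB RAM ({model})")
    print("libvirt version:", conn.getLibVersion())

    # Enumerate guests and their state without touching any of them.
    for dom in conn.listAllDomains():
        state = "running" if dom.isActive() else "stopped"
        print(f"  {dom.name():20s} {state}")
finally:
    conn.close()
```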
So let me see if I can jump back into this slideshow real quickly. All right, so let's talk about what a self-service dev/test cloud looks like. I think self-service is mandatory. It's certainly in most of the definitions around cloud. And I don't think you can get away from allowing people to provision their own things. If you're just looking for a shiny new toy, there are much better projects to tackle than an infrastructure-as-a-service deployment. We'll talk a little bit about setting boundaries in a bit, because I know that allowing people to self-service is also scary. And as an ops guy, for a number of years, including up to about 12 months before I started working on a cloud project, I was vehemently against cloud computing, because developers don't understand and don't care about the nuances of running an operations environment. So we'll talk a little bit about how we deal with self-service. I think you have to have usage measurement, because when people think that there are unlimited resources, they will consume them as if it really were unlimited and didn't cost anything. We talked about isolation a little bit earlier. I think that your dev/test environment must isolate. CloudStack will allow you to share networks with everyone; I think it's generally speaking a mistake to do that. I also take the standpoint, especially with dev/test but with cloud in general, that it should be commodity. If you're having to pay inordinate amounts for specific technology, I think that you're probably doing it wrong. There would have to be a really compelling reason to pay more than the cheap price. The final thing is: dev/test clouds never stay dev/test. My favorite: there's a movie company in California who did a proof of concept with CloudStack. They set it up. They started using it. They liked it. They decided that they were going to deploy CloudStack in production. So they got ready to tear down the environment; they had learned a few things and wanted to improve some of their choices, and they started to tear it down. They sent a notice out and people said, wait, you can't do that. We have production workloads running there. Do you realize the homepage of our movie studio is running on your new cloud, along with all of our interactive properties? The cloud had been up for 90 days as a proof of concept. Be very cautious about the fact that we tend to treat test/dev like a redheaded stepchild: we don't monitor it, we don't do all of the things we would for a real production environment. Because when developers find out that it's easy, they'll bypass all of the other things and try and deploy to production, which is really an encouragement that you need to communicate frequently with the folks who are using your services. I showed you deploying a virtual machine, and I do not believe that provisioning manually, specifically provisioning by a sysadmin, adds value. It should be completely automated, and this should not be news to anyone in ops. We've had PXE installers with Kickstart or JumpStart or RIS for decades, and there's no reason for anyone to be carrying around CD-ROMs or USB drives to install things. When you're talking, though, about letting people who are traditionally not ops folks self-service, self-service takes on some new meaning and automation takes on new meaning. Do you mean letting them use the UI, like I stepped through the six steps to provisioning a VM in the CloudStack UI? Boy, I certainly hope not. That's painful if you have to do it more than once.
Who wants to click even if you're using all default answers? Who wants to click six times? That's awful. No one who does real work does it this way, so I wouldn't let your users either. Maybe they want to interact with an API. They've got some tools. CloudStack has a dedicated command line interface called CloudMonkey that's on PyPI, so you can install it from PyPI. That will give you essentially tab completion for deploying, for doing anything with CloudStack. CloudStack also has its own native API that you can interact with directly, or you can use abstraction libraries. CloudStack also maintains EC2 compatibility, so we have a separate API interface. If you want to use tools like Boto or the Euca tools, or even Amazon's EC2 tools, although that violates the license around the EC2 tools themselves, you can interact with the CloudStack EC2 endpoint and use the same tools for doing that. I think if you have really sophisticated developers, any of those options are probably realistic. I don't know that it's for everyone. I think there's also config management deployment. Anyone in here not using config management? Awesome. Anyone just afraid to raise their hand? Config management is well entrenched today. I'll show you my favorite tool, and I apologize for this being so small; I just didn't know how to get everything in there easily otherwise. I really should have done some highlighting here. Up here we have the name. This is a Hadoop cluster definition. This is for a tool called knife-cloudstack. How many folks are familiar with Chef and Knife? For those of you who aren't, Chef is a config management tool from Opscode, and Knife is essentially their provisioning tool, or at least that's what we're using it for here. Knife-cloudstack has a unique ability to define an entire application stack and then call that. This is just setting the name, a description, a version number so you can keep it versioned and updated. You can set the environment, but the real meat of this is when you get down to the servers. We have ZooKeeper nodes here, and we've defined that there will be three of them in this application stack: A, B, and C. We've defined what the base disk image is going to be, which is a RHEL 5.6 base, and you'll notice that we use that throughout in all three of these definitions. The service offering, which again is the CPU and RAM that it's going to have. Port rules are the firewall rules that we're asking it to open, and then we're essentially defining the roles that it's going to have. This is going to be a member of cluster A and it's going to be a ZooKeeper server within that cluster. That's just defined three nodes for us. Then we're going to have a Hadoop master. You'll notice the service offering is different. You'll also notice that this has a networks field; this one gets app and storage networks, and so it's adding both of those. You'll see we're also opening up three different sets of ports, and then finally we're coming down to the worker nodes, which again are using RHEL 5.6, the same service offering, port rules. You'll notice we didn't define any networks, which means it's going to consume the default network. This is something that operations defined: this is what a Hadoop cluster looks like in our environment. A developer sits there, runs a single knife cs stack create command against that Hadoop cluster definition, and it deploys all of these machines with the firewall rules applied, access to the proper networks, and then links them all together as that cluster.
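Going back to the native API mentioned a moment ago: here's a minimal sketch of calling it directly, assuming a placeholder endpoint and keys. listVirtualMachines is a real API command; the signing scheme shown (sort the parameters, lowercase the query string, HMAC-SHA1 it with your secret key, base64 the digest) is the documented one, though production code should also make sure spaces are encoded as %20 before signing. CloudMonkey and the various abstraction libraries are doing essentially this signing dance on your behalf.

```python
#!/usr/bin/env python3
"""Minimal sketch of calling the native CloudStack API directly."""
import base64
import hashlib
import hmac
import urllib.parse
import urllib.request

ENDPOINT = "http://cloudstack.example.com:8080/client/api"  # placeholder
API_KEY = "your-api-key"                                    # placeholder
SECRET = "your-secret-key"                                  # placeholder

def signed_url(params):
    """Build a signed request URL for one API command."""
    params = dict(params, apikey=API_KEY, response="json")
    query = urllib.parse.urlencode(sorted(params.items()))
    digest = hmac.new(SECRET.encode(), query.lower().encode(),
                      hashlib.sha1).digest()
    signature = urllib.parse.quote_plus(base64.b64encode(digest).decode())
    return f"{ENDPOINT}?{query}&signature={signature}"

# For example, list the calling account's virtual machines.
url = signed_url({"command": "listVirtualMachines"})
print(urllib.request.urlopen(url).read().decode())
```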
Rather than provisioning raw virtual machines and then having to install Hadoop and then having to configure network access, the ops folks said, hey, this is how we define a Hadoop cluster in our environment, go use this, and it's a single command line for the developer to deploy an entire cluster completely configured and ready to go. The folks at Edmunds.com wrote that tool, and I've been very pleased with it. You can do similar things with SaltStack. There are folks who are working on it for Ansible, and there's similar stuff for Puppet; there are native types and providers for CloudStack resources with Puppet as well. What I tend to see, and this is again perfectly viable and decent, what I tend to see most often is people have a tool, and this ranges from a button on a web page to something like this. This is Cloudcat, written by the folks at Cloudera, and this is essentially how their developers are deploying CloudStack instances in their environment. They get to set a few things, there are defaults for virtually everything, and they can say, spin me up 15 nodes and I'll deal with them. The only other thing that I tend to see is a lot of people doing reservation engines. A developer, particularly in a dev/test cloud, should not be running something forever. It should have a limited lifespan, and so essentially creating a record of how long you promise to use something and then be done with it, and then harvesting based upon those records, is something I see pretty commonly as well. So how many folks are familiar with Jevons paradox? All right, so before you go to Wikipedia, Jevons paradox says essentially that as we become more efficient, the demand for what we just made efficient goes up. Because we can produce electricity cheaper, we will consume more of it. When gas, when petrol, is less expensive, people tend to drive farther. And so when it is much easier to consume computing resources, people will, and on top of that, because they see no direct cost, people are often leaving AWS instances running ad infinitum. And I say that as someone who's had $2,000-a-month AWS bills on my personal AWS account because other people in my team spun up instances and left them running. I was getting no direct benefit and just being charged for it. So there's plenty of waste typically around, and you need to have either a chargeback or a showback mechanism. CloudStack will actively calculate your usage, and it will do that not just for allocated resources; it will actually look at what's consumed. So you may have a five-terabyte disk allocated but only 50 gigabytes of that actually consumed, and it will track both of those things. Same thing with CPU: you may have four vCPUs allocated and only be consuming one of them at 100%, and so you can actually look at the things that are going unused that are still consuming your allocation. And you would be shocked at how effectively that cordons off a lot of waste, because an engineering manager sees that they're responsible for 25% of the compute bill and things start to be analyzed. So I think you also need to deal with monitoring. I do think, and I think Tom Limoncelli is the guy who first said this, that you cannot offer a service if you do not monitor it. It simply is not a service if you don't care enough to monitor it. At the same time, I think from an operations standpoint: why monitor it? It's dev/test. We've already established that we don't care enough, and it may go away.
So, that said, even though we are the high and mighty operations folks, developers are still important and you should care a little bit about their infrastructure. So I think at a minimum, monitoring CloudStack, because I think it will become an important tool, or monitoring whatever your cloud platform is, is a reasonable start. I don't think I would necessarily monitor instances, but the more important question is: how are you going to deal with ephemeral instances? So think about what we typically do. How many folks are using Nagios for monitoring? So typically what we do is we define what we're going to monitor on a node, and that means that when we spin it up we will say, all right, httpd is running here, we will monitor that; MySQL is running on this other box, we have a different set of checks that we will run for that. What about if you don't know when a host comes up or goes down? What happens when it goes up, or a developer kills it? Does someone get woken up in the middle of the night? That's a question you have to answer: is that worth alerting on? I would argue that it's not, but you also have to be able to monitor those resources to some degree. So I think you need to be choosing commodity storage. If I were designing a test/dev environment today and I wasn't already paying NetApp or EMC or some other storage vendor large sums of money, I would certainly be doing local storage. Yes? So CloudStack will send out SNMP traps. You can also do JMX for monitoring CloudStack itself; there's a JMX port that allows you to do a ton of things. There are monitoring packages or monitoring plugins for Nagios, Zenoss, Zabbix, and Nimsoft that actually are making API calls into CloudStack for monitoring things like host availability and the amount of storage capacity left, et cetera, for your entire environment. So that's what's being done today. Much to my chagrin, CloudStack doesn't have an SNMP interface you can query. It is trap-only right now. So it will send out alerts, but it won't do anything past that. And we've really struggled to try and keep CloudStack from becoming a monitoring environment itself. It does some very limited monitoring of the storage resources you consume, the networking resources you consume, but really it's checking to see if it has space to continue allocating on those resources. We'll also check hosts, if you've marked instances as highly available; we'll check those hosts to see if they are still up, and if they're not, we'll restart the instances on a different host. But it's really a poor man's HA. I mean, it's not HA in the sense of, you know, corosync or Linux-HA by any stretch of the imagination. I tried to pitch this long before, back when we were at cloud.com, calling it really fast mean time to recovery rather than HA, but they apparently didn't like that and went with the marketing slick. So from a monitoring perspective, you can grab the traps, you can do JMX monitoring; if you're a big Java shop, you're familiar with that already. And there are a number of plugins, and you can emulate them with essentially any package, because it's just making API calls to grab and parse the information. So if I'm assembling a dev/test cloud, I personally think that the best value is local storage, because I'm essentially saying that I don't need something that has the attributes of shared storage. I need it to be relatively fast, and local storage tends to be generally the fastest. It's not resilient; I know that if it dies, it goes away.
And I have hopefully set expectations that all of these instances should be treated as ephemeral, even if they're not. So you don't get failover; if the host dies, all that storage goes with it. But it's the best mix of cheap and performant, and able to get you started right away. So I talked a lot about the networking strategy. If I were defining a network for a dev/test cloud, I would be choosing layer three isolation, or what Amazon calls security groups. I think that you can do VLANs on a really small scale, but I don't honestly think it's worth it. I don't think, particularly in a dev/test cloud, that there's anything you miss out on by using layer three isolation. And you can use cheap and dirty switches. I mean switches that aren't even capable of VLANs, if you're using security groups. CloudStack, and I didn't really talk about this, CloudStack will manage physical hardware, or even virtual representations of physical hardware. So if you've got an F5 load balancer or a Juniper SRX, or Cisco has a number of virtual network appliances that you can use with VMware, you can have CloudStack interact with all of those. I think it's a waste of your money for the dev/test scope, and for a lot of other purposes too; I think you really have to get into a very niche deployment to justify those. So the virtual routers will do DHCP, DNS, load balancing, port forwarding, NAT, and a number of services you probably will never need. Let them do the work. A virtual machine is as commodity as you can get. And this also, depending upon your network deployment, allows you to scale it pretty easily. Adding another virtual router is as easy as deploying another virtual machine. So I would use cheap networking gear and I would use layer three isolation as my network model when setting this up. So then, when going for a hypervisor, I pick a choice here. That is my personal choice. You should use whatever you're comfortable with, because you'll already have the expertise. Use what everyone knows. If you're a VMware shop, you should certainly use VMware. You're already paying the VMware tax. There's zero benefit to switching. And from a dev/test perspective, there's zero benefit to using anything else, at least at a small scale. Use what you know. From a dev/test perspective, they are effectively equal. You need the ability to turn a virtual machine on and off, ensure it gets connected to a network, and ensure it has access to storage. And every hypervisor out there does a great job of that. If you are incredibly resource constrained, or you're doing this at a really large scale, there are two different answers. I think KVM, if you come in knowing nothing, you're going to be running Linux, I presume, anyway. So I think KVM is the easiest to pick up if you don't know hypervisors at all. And it's the easiest to get going and running really quickly. If you need to squeeze the most out of things, I think XenServer probably gets you closer to that, and I think that's why you see Google and Amazon, both of their clouds, running Xen-based hypervisors. I think they allow you to tweak things; I think the overhead for doing so is a lot higher. But if you're doing this at massive scale, or you've got to squeeze every bit of performance out of it, or you need to pass through virtual GPUs for actually using the GPUs for calculations, yeah, there are probably some advantages in Xen. But for general dev/test purposes, use what you know. If you're really resource constrained, LXC is far more efficient than any of the real hypervisors.
And LXC is just a container. And I think LXC is compelling for other reasons in a dev/test environment. The entire container aspect of being able to define the environment, I think, makes it easier on folks. But if you're not already using LXC or Docker or something similar, it may not be worth the overhead of learning. Anyone find my choice of KVM offensive? Okay, I have no fellow Citrix employees in the audience. Citrix does a lot of work on Xen, and I catch flak for this occasionally. So from an ops standpoint, we like the idea of not having to do menial tasks and being able to automate them. We dislike the idea of people having essentially the authority to go and do things that we consider dangerous for them to do. So we have a couple of constructs, and every cloud management platform you consider should have similar things. You can limit the amount of resources that people can deploy. You can limit where they can deploy resources. And you can create a number of rules around that. Anyone know what the default limit is for the number of instances on AWS, when you sign up for a new account? You can spin up 20 VMs. The last estimates I heard were that Amazon had well into six figures of physical hosts, which I would be willing to bet is larger than any of the cloud deployments that any of us will be interacting with. And I think Amazon does it for a couple of reasons. I don't think that they necessarily have the scale problem, but I think they are concerned about it somewhat. They're worried about fraud, and they're also worried about inadvertent escalation of bills that people will end up not being able to pay. So essentially, when you're talking about deploying all of these resources, and maybe you set something that auto-scales based on load: spinning up 20, well, that's heavy use; spinning up 200 or 2,000 because it ran away from you without you knowing, that's a lot more serious. So when even the largest public cloud in the world has some default limits, I think it's not unreasonable for you to have the same. So set some defaults. There are ways of getting around them. If you've got a big project and need to have multiple people sharing resources outside of the account system, you can set up projects and allocate resources directly to those to get yourself around quotas. And the quotas will ensure that 90 percent of the time, maybe 95 percent of the time, you stay out of people's way and allow them to work within sane restrictions. And those people that are going to need to consume lots and lots need to come talk to you. So it's a nice safety valve. And that allows you to deal with settings; you may be saying in your environment that people get no public IP addresses, because you don't want them exposing things, and this still allows you to make exceptions and allow different groups to do other things. So I've told you my considerations. I want to talk about how you actually go about deploying CloudStack. What I won't talk about is the boring stuff. CloudStack has yum and apt repos, so if you're using Ubuntu, Debian or CentOS, you can deploy easily. It's two lines to build your own packages if you want to run it on something other than CentOS or Ubuntu or Ubuntu-like distributions. So that really is the easy part. So how do you actually go about deploying CloudStack in an environment? That's actually a couple of things. You're obviously going to install the packages; that's, again, the boring part we don't want to talk about. Then you have the things that you also have to take care of.
You've got to decide on a network model. And invariably, you will get it wrong the first time you try and deploy it. And I don't have any advice other than to try different things and figure out what you actually need out of a network model, whether that's, you know, you're going to use VLANs for whatever reason, you're going to use security groups, or you're going to try some cool new SDN stuff. And even then, once you've made the network model choice, figuring out how to actually meld that with the physical infrastructure can be challenging. So let's talk about that; we focused a lot of time on network choices for the guests. We've talked about VLANs, security groups and SDN. There's also a management network. And real quickly, I don't know if it's going to let me stay logged in or not. Well, maybe. So we also have a couple of other networks that you have to worry about in the process. You probably have a public network. If you're going to allow people to expose their work to the internet, that's probably going to be a different layer two network, even if you're using security groups. And the bulk of what we've talked about thus far is the guest network. So while you can probably consolidate, you've also got to worry about a management network. And that management network is going to carry traffic between the hypervisors and the management server, and potentially the management server and secondary storage, as well as the hypervisors and secondary storage. You can eliminate a little bit of that and have a separate storage network. And I hate that term; what we call a storage network there is really a secondary storage network. Essentially it's taking the disk templates and snapshots and giving their storage and migration a dedicated network, so you're not clogging your guest networks with that traffic. Storage and management will all be on the same network if you don't put a dedicated storage network in place. And then, completely separate to this, you can have a dedicated storage network for primary storage. That really is outside the scope of CloudStack, and it should be set up so that the routing table sees it as the local route to storage, and it will consume that particular network. But that's really outside the scope of CloudStack itself and is more host configuration. And so you've got potentially all of these networks, and you've got to really figure out: are these real physical interfaces? Is it worth having a dedicated primary storage network interface? Should that be a VIF? Et cetera. Network setup is the bane of every cloud deployment, regardless of platform. It is the most complicated piece of the puzzle, I think. But I'm going to walk you through the choices that you have to make when you're setting up, and really you're talking about setting up your zone, right? So you install the management server, you tell it what database server to point to, it will create the database, and then you tell it to start the management server. And then we're really coming down to setting up a zone, right? Everything after that is zone setup. So let's walk through what zone setup looks like. And so we have this network choice. Basic is layer three isolation. Advanced is everything else. So if you want to do VLANs or SDN or security groups with VLANs, it's going to be an advanced network. And basic is either layer three isolation or no isolation at all. So we'll walk through it. We're essentially setting up, defining a lot of text fields.
Name, external DNS and internal DNS. So internal DNS is the DNS that will point to local resources, things that you're not publishing public DNS for. I typically, because of the way I work, do not use any internal DNS names at all, so I list the internal DNS as the public DNS and go from there. You'll see it asks what hypervisor you want. And I told you earlier that hypervisors are cluster specific. So why is it asking you for a hypervisor? Essentially, it's going to start creating some jobs as soon as we create the zone, and it's going to need to know what hypervisor we're going to have first so it can deploy some of those system VMs, so the virtual routers and other things. So let's see. We will set up a zone here. We will be grateful and use Google's DNS service. I'm going to choose KVM. Now, you'll notice that LXC wasn't in the list. You cannot deploy LXC as the primary hypervisor in a zone, because it does not have a system VM. So if you're using LXC, you have to run it in addition to something else. We have something called QuickCloud, which runs the system VM services on the management server if you're doing really small-scale stuff or just prototyping. But generally speaking, you need a KVM, XenServer or VMware cluster running. And that's true for both bare metal and LXC; you'll need at least a helper cluster of something else to run those system VMs for you. Then we get to the network offering. And so this is the option under basic; there are some others under advanced. The default shared network offering with security group service is essentially layer 3 isolation. You can say default shared network offering without security groups, and that will give you a flat layer 2 network that everyone shares and everyone can see the traffic on. I think that's generally a bad idea, but you can do it. And you can also use NetScaler hardware as well. So we're going to use security groups, and we'll set our default network domain. We can also say whether this is going to be a public zone or not. So you can create dedicated zones for specific purposes or specific users. And we see a lot of government users who create dedicated zones for different security levels. And you can also define whether you're going to have local storage in the zone or not. So we'll say this is public. All right. So this is where you're essentially setting up your networking. And you can add that secondary storage network. And you can edit the details of the guest network by essentially telling it what the interface name is in KVM, or the network label that you would have in XenServer, or the vSwitch ID in VMware. So we're essentially done with zone setup at this point. Now we will create our first pod, which again is typically a rack or a row of racks. And we will set the reserved system gateway. So this is essentially the management network that's going to be used in that pod; I'm sorry, the guest network in that pod. This is the gateway that's going to exist; we are assuming that the underlying guest network has a gateway out. I really dislike the fact that they call it reserved for that particular field, and this one, the netmask, we'll say is a /23. And then you need to have a number of IPs internally that are going to be reserved and dedicated to CloudStack's internal use. Think about the system VMs: they're going to have an address on this guest network, and CloudStack wants you to reserve a section for them.
Coming back to the wizard: that also means that, since I told it that netmask, CloudStack is going to assume the entire /23 is its own. You can do some partitioning of that via the API, but in this setup dialog, if you hand it the network, it assumes it has rights to everything except that gateway, and it will also hold the system IP addresses separately. The install guide says that you should reserve 10. I don't know where they came up with that number; 10 is a stupid number for some environments. If you're using VLANs, it should be 10 plus the number of VLANs that you plan on consuming. If you're using layer three isolation, 10 is probably overkill and you can get by with five. If you're using SDN, it depends on the SDN technology you're using; some of those will consume more IPs for system resources than others. But we're going to reserve 10, and you'll notice this is still within my netmask: I've said 10.1.1.2 through 10.1.1.12 is my system IP range. These are the addresses the system VMs will get.

Now I've got to define the guest range. In this particular case, because I'm doing basic networking, I will be using the exact same network definition, and I will not be able to use .2 through .12 as part of my guest network. Guest and system networks in basic networking are exactly the same. And I'm advocating that you use this because it's simple, or simpler anyway. If you're using VLANs for isolation, you can define a completely different network, from an IP perspective, to be used for each one of those VLANs, so you could have something completely different here. But we're doing basic, so it's all going to be the same. It's already selected that I'm going to be choosing the KVM hypervisor, so we're done with pod setup at this point.

We're working on clusters now. The cluster name: KVM Forum, once I can actually spell forum. So we'll name the cluster KVM Forum, and now it's going to ask me to provide a host, which I don't have. We'll go ahead and say the hostname is foo, the username is root, and the password is password. Host tags are something that allow you to set individual hosts apart, so if you need to make additional allocation decisions you can say these are hosts with SSDs, these are hosts with really slow IDE drives. You could also tag them and say these are hosts that I want to allow personal information about individuals to reside on, as opposed to the rest of the workloads. Generally in a dev/test environment you don't need to worry about it, and you certainly don't have to fill it out by default.

So it's going to go and try, and it's going to end up failing. Trying to think of what that is. It's looking for a secondary storage resource, and the default is NFS; you can also use object storage. Now it's going to try to launch the zone. That is essentially all there is to setting up CloudStack. Obviously you should have also set up your NFS server or object store to deal with secondary storage, and you should actually have had real resources, a KVM node, here in Edinburgh. But it will try, it will fail, and it will go through and create the networks and everything else, which are purely software constructs at this stage in the game. It's also checking through each of these steps to make sure that you've got something that will actually work when it comes up. So it's going to go through a number of things, check all of the configuration values, and then if you've made any errors you can go back and fix them. It thinks that I have a...
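And here, sketched the same way, is roughly what the rest of the wizard just did: pod, guest IP range, cluster, host, and secondary storage. The IDs would normally be parsed out of the JSON responses, the IP addresses mirror the demo values, the NFS path is invented, and the command names (createPod, createVlanIpRange, addCluster, addHost, and addSecondaryStorage on releases of roughly this vintage; newer ones use addImageStore) should be checked against your release's API reference.

```python
# Placeholders: in a real script these come from parsing the JSON responses above.
zone_id = "..."     # from the createZone response
pod_id = "..."      # from the createPod response
cluster_id = "..."  # from the addCluster response

# Pod with the reserved system IP range for system VMs.
cs_call(
    "createPod",
    zoneid=zone_id,
    name="pod1",
    gateway="10.1.1.1",
    netmask="255.255.254.0",   # the /23 from the example
    startip="10.1.1.2",        # reserved system range discussed above
    endip="10.1.1.12",
)

# In a basic zone the guest range sits on the same network; this hands the
# remaining addresses to CloudStack for guest instances.
cs_call(
    "createVlanIpRange",
    zoneid=zone_id,
    podid=pod_id,
    forvirtualnetwork="false",
    gateway="10.1.1.1",
    netmask="255.255.254.0",
    startip="10.1.1.13",
    endip="10.1.2.254",
)

# Cluster and the (currently nonexistent) demo host.
cs_call(
    "addCluster",
    zoneid=zone_id,
    podid=pod_id,
    clustername="KVM Forum",
    clustertype="CloudManaged",
    hypervisor="KVM",
)
cs_call(
    "addHost",
    zoneid=zone_id,
    podid=pod_id,
    clusterid=cluster_id,
    hypervisor="KVM",
    url="http://foo",          # the host that does not actually exist here
    username="root",
    password="password",
)

# Secondary storage: NFS by default, object storage is also possible.
cs_call("addSecondaryStorage", zoneid=zone_id, url="nfs://nfs-server/export/secondary")
```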
We'll make it... But that means if you do make a config error, it's really easy to go fix it, and you'll see that I fixed the error. Something went wrong adding the host, because that host doesn't actually exist here, which we expected. And that's about it. At the same time, I don't want to trivialize setting up a cloud platform, because it can be quite painful, especially as you try to do more complex things. You'll notice I did not change the defaults for the network configuration; you can do many more complex things, and you may need to in order to deal with compliance issues in your environment. But to get something up that works for dev/test, it is just about that easy. I would expect that you would spend about the two hours that we've almost spent here doing it. Assuming you can provision your Linux machines and hypervisors relatively quickly, it should be about that easy.

So, to see if I can jump back to... I'll leave you with this, which is where you can go to get help when you start on your cloud project. The CloudStack docs are at cloudstack.apache.org/docs. We have an install guide, an admin guide, release notes with every release, and we also have some networking-specific material. On IRC, there are a ton of people who are running CloudStack in production and have stubbed their toes on the same problems that you have, as well as a ton of developers; they hang out in #cloudstack or #cloudstack-dev. If I were having problems, that would be the first place I would go. And obviously the website. There's a book published by Packt about CloudStack; it is largely a rehash of the content from the install guide, so I don't know that there's a huge amount of additional value there. If you want to talk about CloudStack today, there's a CloudStack booth, and some of the folks from ShapeBlue, who are a CloudStack consultancy, are there. There are some folks from Schuberg Philis, who run CloudStack in production, as well as a number of folks who actively work on the project.

What questions have I not answered for you? Let me turn that around: if I handed you a USB stick with CloudStack software on it and a bunch of machines, do you think you could go and get it installed? Actually, installation is the easy part, right? yum install cloudstack would take care of that. But do you think there's enough context here to actually go and get things done? And if not, what's missing? Silence. So I either did a wonderful job, or everyone's asleep, or you just want to go to lunch.

What hypervisor were you using? So, if you actually read the documentation, it talks about defining physical interfaces and setting them up and going and editing a ton of files. Not necessarily. Maybe. This is actually one of my frustrations with most of this, and that is configuring the network bridges. It talks about, oh, you're going to have multiple VLANs on the same interface, so create all of these bridge interfaces, and then it tells you to go do all of this stuff for each bridge, and then you have to go into the management server and do that too. That is a configuration possibility. I don't think it serves you a ton of good, particularly, again, coming back to the dev/test case. There's a guide, the quick install guide, which I have not yet published for 4.1. I think when printed out, including front matter and index, it's 20 pages. So here is configuring libvirt and configuring QEMU, and that's it.
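Since the host-side work, the bridges plus the libvirt and QEMU configuration, is where people most often stumble, here is a tiny sanity check you could run on a KVM host after following the quick install guide. It assumes the guide's defaults, a guest bridge named cloudbr0 and libvirtd listening on its standard TCP port 16509; if your bridge names or ports differ, adjust accordingly.

```python
# Quick host-side sanity check for a CloudStack KVM node (assumed defaults:
# a bridge named cloudbr0 and libvirtd listening on TCP port 16509).
import os
import socket

def bridge_exists(name="cloudbr0"):
    # Linux exposes bridge state under /sys/class/net/<name>/bridge
    return os.path.isdir("/sys/class/net/%s/bridge" % name)

def libvirt_tcp_listening(host="127.0.0.1", port=16509):
    try:
        with socket.create_connection((host, port), timeout=2):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("cloudbr0 present:", bridge_exists())
    print("libvirtd on tcp/16509:", libvirt_tcp_listening())
```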
So the install docs do talk about the things that you can do. They are, in my opinion, overly complex, because some people actually do want to span multiple VIFs over a single interface, or they want to have multiple interfaces, sometimes bonded together. The install documentation goes to great lengths to allow you to do anything you possibly could do, which I think makes it very hard for people to actually get up and going up front. I would try to use the quick install guide. The only material difference with the one that's published for 4.1 is that we've changed package names, so it's cloudstack in front of everything now rather than cloud. But otherwise, you should be able to sit down with this guide and get a running CloudStack install in an hour, maybe, assuming you're doing a single-node install. You're not installing 500 physical nodes, but to get a single management server and a hypervisor running on the same machine and working, you should be able to do that in an hour with the quick install guide, and largely copy and paste everything from it.

Yes? Yeah, so as a matter of fact, I'll show you; maybe, drop this down to... So there are native types and providers for Puppet, and you can define an instance like this that has a name. You're essentially declaring that it's going to be present or absent; if you need to destroy it, you change it to absent. You can tell it what zone to use, flavor is the service offering, and image is the name of the template that you're deploying. I'm using group, but you can also use some other settings, and data that gets communicated to the instance as well, to set facts and to use CloudStack as an ENC. If you look on my SlideShare account, which is slideshare.net/ke4qqq, there's a presentation about using Puppet to provision CloudStack instances. So here's one. Here is a stack, similar to what we saw with Chef, where we're defining two different types of machines. I did that very simply; you can also define firewall rules and a number of other things. Sebastien, who's a CloudStack contributor here in Europe (he lives in Geneva), wrote similar stuff for Salt, and I heard one of the ShapeBlue guys say they were working on Ansible, so most of the config management bases are covered or will be very shortly.

Anything else I can answer for you? Feel free to come talk to me or visit the CloudStack booth; there are tons of knowledgeable folks around. Find us on IRC, you can find me on Twitter, and feel free to email me as well. I appreciate your attention. We're at five minutes till, so I will cut us loose, and that way we can get in line for lunch first. So thanks very much. I appreciate you coming.