Hi folks, my apologies. Chris Winston really is an awesome speaker and his experience at WebMD is very impressive, so I'm sorry that he's not here. Real briefly about me: I'm a recovering sysadmin, so I still know "down, not across." I spent about a decade in operations, and then someone offered me a chance to come work on CloudStack, so I started working on that about two and a half years ago. CloudStack is now at the Apache Software Foundation, and as such I'm a committer and PMC member on the project. I'm also an ASF member, so I'm a member of the actual foundation. These slides are very ugly; I apologize to you for that. I've also contributed to a number of other open source projects: I wrote a number of monitoring plugins for Zenoss, I used to be a packager and a documentation writer and do a number of other things within Fedora, and for a while I was on Sahana's security team. Feel free to reach out, or to berate me, via email or Twitter.

So real quickly, why would you want to build an infrastructure as a service cloud? I think there are really a couple of reasons, and the first one is the most important to me. In some ways I think that infrastructure as a service does away with a lot of IT jobs. I tend to think those are crappy IT jobs, but we have seen the extinction of a number of classes of jobs over the years. We used to have guys who rolled around big reels of tape and mounted them on huge spindles; they were the backup tape changers. We've done away with that now. We have cartridges, we have automated libraries that swap them in and out, and we really don't have a whole lot of people who are, day in and day out, doing nothing but changing backup tapes. I think we're reaching that point with infrastructure and operations as well.

I think there's also an element of serving your users better. There is a dangerous habit in operations, and in IT in general, of people coming to us to ask permission rather than us enabling our users, and from a business perspective that's very dangerous. One of the inflection points I tend to see is when people are effectively outsourcing IT and it's not IT that's doing it. When the marketing department or the development teams are outsourcing core infrastructure to a place like Amazon, they have effectively determined that you are so much of a pain to deal with that they would rather plop a credit card down and worry about getting reimbursed for the expense than file a ticket and deal with operations.

And then finally, people hate doing repetitive things, especially sysadmins. We are notoriously lazy, and no one wants to do the same thing more than they have to.

So just for purposes of definition, and I know this is probably old hat: with infrastructure as a service, we are providing compute resources with broad network access, so people are able to provision capabilities within predefined limits that you've set. We also have this concept of self-service: if your idea of doing cloud computing is that operations will be making all the decisions and people will essentially ask you for things, I'd argue you're doing it wrong.
We also need to have some control, because as much as we like the idea of self-service and not having to actually interface with users, it doesn't need to be the Wild West; you need some form of governance over it. And finally, you need to be able to measure what people are actually consuming. So I'd argue that before you start on an infrastructure as a service project, you really need to figure out configuration management, monitoring, and how you're going to deal with chargeback.

Config management is a real pain point. There are a lot of people who want to do cloud computing and haven't laid down this framework for actually getting work accomplished first. Just because you can spin up 500 machines in 10 minutes doesn't mean that you can make them do something that's valuable to the business. You need a way to automate their configuration and actually put them to work.

From a monitoring perspective, this changes the game. Typically, as we put something into production from an operations standpoint, we add it to our monitoring list, and that becomes a kind of checklist as we go along. When people can provision resources automatically, it happens much faster, and it's no longer operations doing the work; it's some guy at three o'clock in the morning standing up a web server, and you need to figure out how you're going to monitor that. More importantly, if you're doing things like auto scaling, where resources are added as demand increases, you need to figure out how those machines get added to monitoring as well. You also need to handle the case where your scaling mechanism decides there isn't enough load to justify all of those resources and begins shutting things down. If it shuts a machine down because there's no demand on it, that's a non-event, and you should not be waking people up in the middle of the night for it.

And finally, when you turn on self-service, one of the problems you have is that people think things are truly unlimited. For better or worse, Amazon has given them that perception; in fact, one of the core tenets of cloud computing is that resources appear elastic and effectively unlimited to the end user. So the end user will spin up machines and consume storage and bandwidth as if there is no tomorrow, and you need a way of providing feedback on what they're actually consuming. Typically, that feedback is economic, so you need to figure out how you're going to handle showback and chargeback.
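As a sketch of what the monitoring piece can look like in practice, here's a boot-time hook a freshly provisioned instance might run to add itself to monitoring, so nobody has to do it by hand at three in the morning. Everything here is hypothetical: the endpoint, token, and payload shape are invented for illustration, and you'd substitute whatever your monitoring system actually expects.

```python
# Hypothetical boot-time hook: a freshly provisioned instance registers
# itself with a monitoring service.  The endpoint URL, token, and payload
# shape are all assumptions for illustration, not a real monitoring API.
import json
import socket
import urllib.request

MONITORING_URL = "https://monitoring.example.com/api/hosts"  # hypothetical
TOKEN = "replace-with-a-real-token"                          # hypothetical

payload = {
    "hostname": socket.gethostname(),
    "ip": socket.gethostbyname(socket.gethostname()),
    "checks": ["ping", "http"],    # whatever this instance's role needs
    "auto_deprovision_ok": True,   # a scale-down is a non-event, not a page
}

req = urllib.request.Request(
    MONITORING_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": f"Bearer {TOKEN}"},
)
urllib.request.urlopen(req)
```

The important design point is the last payload field: the monitoring side needs to know that machines removed by the scaler are expected departures, not outages.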
But this is really a CloudStack talk, right? So: CloudStack is now a top-level project at the Apache Software Foundation. It came out of a startup originally called VMOps, back in 2008. When they came out of stealth mode, they open sourced their product, CloudStack, in 2010 under GPLv3. There were actually some production deployments of CloudStack in 2009 at a couple of service providers, but it didn't hit mainstream visibility until May of 2010. In 2012 it was relicensed under the Apache Software License, with one little blip in between: like many startups, Cloud.com had 98% of the code open source, and they retained VMware support and usage metering as the proprietary pieces. In late 2011, after they were acquired by Citrix, Citrix open sourced all of it, and the entire code base became open source.

I mentioned that it's been deployed in production since 2009; it's been doing really large-scale operations for two-plus years. The largest single-cloud deployment that I know of is 50,000 physical nodes under management, with virtual machines running atop each of those.

CloudStack, as far as infrastructure as a service software goes, is really compute-centric. We consume networking and we consume storage, but we're focused on the compute side of the house, so you won't see projects from us that do much beyond that. It's all about the compute process, and we'll consume everything else.

So what does it actually do? It has a gorgeous user interface, and I'll show you that in a bit. If you have end users who are deploying one or two machines at a time, they will be thrilled with it. The reality, though, is that if you're doing real work, you're not going to be using the UI. No one wants to click six times to get a virtual machine deployed; they want to interact with the API, or with tools that interact with the API. CloudStack has its own native API, but we also think there's a lot to be gained by co-opting the EC2 and AWS ecosystem. Some folks in Iceland wrote an EC2 compatibility layer back in 2010, which expanded to cover EC2 and S3; it used to be a separate plug-in and has since been merged into core CloudStack, so you get an AWS-style API interface as well. Again, we're focused on letting people consume compute resources while enforcing isolation and control.

CloudStack has a REST-like API, and by default we provide three levels of access to it: a root admin API, a domain admin API, and a user API. Here is a deployVirtualMachine call. It has three required arguments: serviceofferingid, templateid, and zoneid. The zone you're deploying into is analogous to an availability zone; the template is the disk image; and the service offering defines how much CPU and RAM, and what network access, you're providing. That will spin up a virtual machine, and there are tons of other optional arguments you can pass.
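Just to make that concrete, here's a minimal sketch of driving that REST-like API from Python. The deployVirtualMachine command and its three required arguments are exactly what we just covered; the endpoint URL, keys, and UUIDs are placeholders you'd fill in, and the signing helper follows CloudStack's documented sort, lowercase, HMAC-SHA1 scheme. This isn't the official client, just an illustration.

```python
# Minimal sketch of a signed CloudStack API call (not the official client).
# Endpoint, keys, and the three UUIDs below are placeholders.
import base64
import hashlib
import hmac
import urllib.parse
import urllib.request

ENDPOINT = "https://cloud.example.com/client/api"  # placeholder endpoint
API_KEY = "your-api-key"                           # placeholder
SECRET_KEY = "your-secret-key"                     # placeholder

def sign(params: dict) -> str:
    """Build the signed query string CloudStack expects: sort the
    parameters, lowercase the whole string, HMAC-SHA1 it with the
    secret key, then base64- and URL-encode the digest."""
    query = "&".join(
        f"{k}={urllib.parse.quote(str(v), safe='')}"
        for k, v in sorted(params.items())
    )
    digest = hmac.new(SECRET_KEY.encode(), query.lower().encode(),
                      hashlib.sha1).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode(), safe="")
    return f"{query}&signature={signature}"

# The three required arguments: which zone, which template (disk image),
# and which service offering (CPU / RAM / network access).
params = {
    "command": "deployVirtualMachine",
    "response": "json",
    "apikey": API_KEY,
    "zoneid": "REPLACE-WITH-ZONE-UUID",
    "templateid": "REPLACE-WITH-TEMPLATE-UUID",
    "serviceofferingid": "REPLACE-WITH-OFFERING-UUID",
}

with urllib.request.urlopen(f"{ENDPOINT}?{sign(params)}") as resp:
    print(resp.read().decode())
```

One note: deployVirtualMachine is asynchronous, so the response hands back a job ID that you would poll with queryAsyncJobResult rather than a finished VM.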
So I'll show you real quickly why I have that browser up. This is a login with a regular user account, so this is what the end user sees. You can see the number of public IP addresses I've allocated and the number of networks allocated only to me, and here are my three VMs: a Traffic Server VM, a public VM that I use to build some documents, and one that I'm doing some object store work with. The end user can come in and see a number of things on their own, like the disk image and OS type, and they can edit a number of these as well, such as the compute offering. They can see whether a VM is enabled for high availability, what zone it's running in, et cetera.

They can also look at the statistics and see how much CPU and how much network bandwidth they're consuming. You can look at the network interface cards and see the IP addresses of the individual machines. And then, of course, you have a host of options available: you can destroy instances, change passwords, et cetera, as you would expect.

From the networking perspective, here's my default network. We have a dedicated public IP address assigned to me, in this case 72.52.126.71, and there is a firewall configured for it. By default it denies everything, including egress, so out of the box you would not be able to communicate with the internet at all. Once I start punching holes in the firewall, I can either set up a load balancer pointing to multiple machines in my isolated network, or I can forward ports on to individual machines. I can also do static NAT, though this particular IP address isn't configured for it. So that's what an end user sees.

From a domain perspective, we essentially group accounts. Accounts are the lowest level of accountability we have: we charge things to accounts, even though an account may have multiple users. Typically the smallest unit would be something like a development group having its own account so you can allocate cost to it. Then you build domains, and you can nest them: say we have five development groups, each with their own account, all inside the development domain of ACME. This was really designed at service provider scale, and service providers often resell their services. A good example is Alcatel-Lucent, because Alcatel-Lucent actually resells CloudStack services to other telecoms. So you might have an Alcatel-provided CloudStack instance, but the domain is going to be Kazakhstan Telecom or something, and they may not use the default reference UI; they may build something that automatically assumes the domain part. This is the reference UI, so by default it exposes a lot of that. Does that answer your question? It's really more analogous to OUs in Active Directory, though there's not a one-to-one mapping, and we do have LDAP support.

This is what an admin sees. He sees the last failures, so you can see where someone was trying to exceed their limits: they kept trying to deploy virtual machines with a new network, and it kept failing to grant them an additional network. You can also see the capacity I'm running at for a number of resources.
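Going back to that default-deny firewall for a second, here's a hedged sketch of punching those holes through the API instead of the UI. createFirewallRule and createPortForwardingRule are real API commands; the endpoint, keys, and UUIDs are placeholders, and the signing is the same scheme as the deploy sketch above.

```python
# Hedged sketch: open HTTP on a public IP, then forward it to a VM on the
# isolated network.  The commands are real CloudStack API commands; the
# endpoint, keys, and UUIDs are placeholders.
import base64, hashlib, hmac, urllib.parse, urllib.request

ENDPOINT = "https://cloud.example.com/client/api"     # placeholder
API_KEY, SECRET_KEY = "your-api-key", "your-secret-key"  # placeholders

def call(command: str, **params) -> bytes:
    # Same signing scheme as before: sort, lowercase, HMAC-SHA1,
    # base64, URL-encode.
    params.update(command=command, response="json", apikey=API_KEY)
    query = "&".join(f"{k}={urllib.parse.quote(str(v), safe='')}"
                     for k, v in sorted(params.items()))
    digest = hmac.new(SECRET_KEY.encode(), query.lower().encode(),
                      hashlib.sha1).digest()
    sig = urllib.parse.quote(base64.b64encode(digest).decode(), safe="")
    with urllib.request.urlopen(f"{ENDPOINT}?{query}&signature={sig}") as resp:
        return resp.read()

# Default-deny means nothing gets through until you open it: allow HTTP
# from anywhere to the acquired public IP...
call("createFirewallRule", ipaddressid="PUBLIC-IP-UUID",
     protocol="tcp", cidrlist="0.0.0.0/0", startport=80, endport=80)

# ...then forward port 80 on that public IP to a VM behind it.
call("createPortForwardingRule", ipaddressid="PUBLIC-IP-UUID",
     protocol="tcp", publicport=80, privateport=80,
     virtualmachineid="VM-UUID")
```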
So we'll talk architecture, and I'll diverge from the slides here, because I think it's a little more instructive than beautiful images. From an architectural standpoint, this is showing what a zone looks like. We would typically think of a zone as being a single data center. Above zones we have regions: a region groups data centers in the same geographic area, with sub-20-millisecond latency between them and fast interconnects. Zones are typically their own data center, although all of these divisions are somewhat arbitrary.

At the zone level, you're starting to make networking decisions: whether you're going to use VLANs, some set of software-defined networking protocols, or something more analogous to security groups in AWS. Pods are typically a rack or a row of racks, so you tend to have a top-of-rack or end-of-row switch connecting them, and they share guest networks within the pod. And then you have clusters, which is where we really need to start enforcing some uniformity: we require that a cluster have the same hypervisor, the same CPU, and the same networking access across all of its hosts. That means you can't mix hypervisors within a cluster, but you can have a KVM cluster, a VMware cluster, a XenServer cluster, and a bare metal cluster, four different clusters, all in a single pod. If you want to mix and match hypervisors, you can certainly do that; they just have to be in their own independent clusters. (Actually, I'm going to plug in; my battery is starting to get low.)

Clusters are also where we have historically shared storage. That's no longer a requirement, but for most storage types, when you're using NAS or a SAN, it makes sense, because you want to limit the number of consumers of a single storage resource. Typically a cluster is going to be 8 to 16 hosts, each running somewhere between 20 and 50 virtual machines. So start thinking about the demand on a single storage resource; that can get expensive really quickly if you need to scale it, and I'll put some rough numbers on that in a second. You can have multiple storage resources, but they're per cluster. You can do zone-wide storage, but then you either need to be doing distributed storage like Ceph, or some really high-end storage, maybe an Isilon or better. Yes, secondary storage has always been zone-wide, and we have historically used NFS for it. As of 4.0 you could use NFS plus object storage, and in 4.2 you'll be able to use only object storage if you wish.

So real quickly, from a networking perspective, we typically think in terms of four or five different networks. We have a public network; a guest network that all the VMs run on; possibly a storage network, which is specific to our secondary storage, where we store images and templates (primary storage could also have a dedicated network, but that would be dedicated to the hypervisors themselves); and the management network.

CloudStack out of the box comes with an appliance VM that provides routing services, called the virtual router. We actually have a couple of variants: the virtual router, the VPC virtual router, and an IPv6-enabled virtual router. We can also interact with external devices: F5, SRX, Nicira, and NetScaler, and coming in 4.2 we also add Midokura's MidoNet and another SDN provider. You can also do Open vSwitch with GRE tunnels. So you can really do a couple of different things from a networking perspective.
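To put rough numbers on the cluster sizing from a minute ago, here's the back-of-envelope math. The host and VM counts are the ranges I just gave; the per-VM IOPS figure is purely an assumption for illustration.

```python
# Back-of-envelope math for shared cluster storage.  Host and VM counts
# come from the typical ranges above; IOPS per VM is a made-up average.
hosts_per_cluster = 16      # upper end of the typical 8-16 range
vms_per_host = 50           # upper end of the typical 20-50 range
assumed_iops_per_vm = 30    # hypothetical average workload

vms = hosts_per_cluster * vms_per_host
print(f"{vms} VMs sharing one storage resource")        # 800 VMs
print(f"~{vms * assumed_iops_per_vm} sustained IOPS")   # ~24,000 IOPS
```

Eight hundred guests hammering one NAS or SAN is exactly why storage is scoped per cluster, and why going zone-wide pushes you toward distributed or very high-end storage.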
How many folks are familiar with ebtables? How many folks are familiar with iptables? A little more. (And how much time do I have? OK, great.)

So historically, most sysadmins have been very comfortable with VLANs, essentially using Layer 2 isolation at the switch. But we've got some problems with VLANs. While people are comfortable with them and essentially trust them, there are scalability issues. The standards around 802.1Q and VLANs were written at a time when 4,096 VLANs seemed incredible. As a matter of fact, most of the networking gear you can buy today won't handle 4,096 VLANs at line rate; until you hit six figures per router, most gear won't support more than about 1,000 VLANs, and a lot of even enterprise equipment won't touch 1,000. So it becomes very expensive to route that many VLANs, and nobody wants to hit that limit, especially when each one of those VLANs may be a customer. But if you're willing to deal with the VLAN limit, CloudStack will happily let you assign a block of VLANs to it, and we will consume them and assign them to individual accounts. Each account gets at least one VLAN, and along with it a dedicated appliance VM that handles routing for them: routing, firewall, port forwarding, load balancing, provided you're allowing them access to those services. So we're happy to do that.

A lot of people have followed Amazon's lead with security groups. As you can imagine, Amazon had a lot of scaling issues, because they had a lot more than 4,000 customers, and they also needed to keep them very isolated. (Wow, that's a really poorly worded sentence on the slide: pushing isolation and networking decisions out.) Essentially, instead of having a centralized router VM, or a router itself, making routing and firewall and ACL decisions, you push that out to the hypervisor. We assume a semi-trusted Layer 2 network that only the hypervisors are plugged into, and we set up a network bridge; that's exactly what it is, a Linux network bridge. All of the traffic that passes from a VM on a host to another VM on that host has to go through the bridge, as does traffic leaving the host and traffic coming into the host destined for a VM. So the bridge becomes a natural choke point, and we can make filtering decisions there. The problem then becomes one of orchestrating what is essentially a firewall or router on that bridge device. Linux has had ebtables since at least kernel 2.4, and that allows us to do filtering at the bridge; we essentially think of what we do there as Layer 3 filtering. That gets us a lot more scalability, because instead of a single router or firewall sitting at the ingress point of your network, all of that decision making is pushed down to the individual hypervisors, and they make the filtering decisions as traffic hits them rather than becoming a constraint. This is far more scalable. As a matter of fact, the 50,000-physical-node deployment I mentioned earlier is using security groups to achieve that kind of scalability.
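And the security group model is driven the same way through the API. authorizeSecurityGroupIngress is a real API command; as before, the endpoint, keys, and the group name in this sketch are placeholders.

```python
# Hedged sketch: allow SSH into a security group via the CloudStack API.
# authorizeSecurityGroupIngress is a real command; endpoint, keys, and
# the group name are placeholders.
import base64, hashlib, hmac, urllib.parse, urllib.request

ENDPOINT = "https://cloud.example.com/client/api"     # placeholder
API_KEY, SECRET_KEY = "your-api-key", "your-secret-key"  # placeholders

def call(command: str, **params) -> bytes:
    # Same signing scheme as the earlier sketches: sort, lowercase,
    # HMAC-SHA1, base64, URL-encode.
    params.update(command=command, response="json", apikey=API_KEY)
    query = "&".join(f"{k}={urllib.parse.quote(str(v), safe='')}"
                     for k, v in sorted(params.items()))
    digest = hmac.new(SECRET_KEY.encode(), query.lower().encode(),
                      hashlib.sha1).digest()
    sig = urllib.parse.quote(base64.b64encode(digest).decode(), safe="")
    with urllib.request.urlopen(f"{ENDPOINT}?{query}&signature={sig}") as resp:
        return resp.read()

# CloudStack renders a rule like this as bridge-level filtering on every
# host, so the enforcement happens at the hypervisor, not at one central
# firewall.
call("authorizeSecurityGroupIngress",
     securitygroupname="web-tier",   # hypothetical group name
     protocol="tcp", startport=22, endport=22,
     cidrlist="10.0.0.0/8")
```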
So, SDN. Certainly a hot topic, and there's no shortage of SDN vendors here this week. SDN is coming on hot and a lot of people are very excited about it; there are very few people deploying it. We were really very happy to have one of the early adopters of Nicira NVP, Schuberg Philis, decide that they really liked NVP, which is now NSX, and that they wanted that support in CloudStack, because they also like CloudStack. They went out and built the Nicira support into CloudStack themselves. This is really about separating the control plane from the actual forwarding function: CloudStack talks to the controller, which then interfaces with the network. And if you go back and look at security groups, where we push those decisions down to the individual host, that's really a very poor man's SDN.

CloudStack today has support for, I think, four different SDN strategies. Nicira, and Open vSwitch with GRE tunnels, were the first, and have been available in some form since 4.0. We just added VXLAN support on KVM; it'll be in the 4.2 release that's coming out probably this week. We also have Midokura's MidoNet, and there's a fourth that I can't think of at the moment. Really, any of the major SDN players either have or are working on CloudStack integration, and I think there's tremendous potential. I also think a lot of people are going to have problems and push back pretty heavily on deploying SDN in their networks. The real problem is that sysadmins have gotten progressively more automated; we're using configuration management like Puppet, Chef, Ansible, and Salt. Network admins, by and large, are still shelling into a console and doing things by hand. That's starting to change, but two years ago I remember there was really only a single major network vendor that even had an accessible API for any of their devices. Networking is the bane of everyone's existence in setting up a cloud. It is the single worst thing that you'll go through, and figuring out the right model for you is the difficult part.

We can also manage some real hardware, such as SRX and F5. I don't think that's very cloudy, but a lot of people have a ton of money invested in SRX or F5. We'll also support the Cisco Nexus 1000V and Cisco ASA 1000V devices within CloudStack.

So, storage: essentially anything the hypervisor can mount is fine. Distributed file systems are getting a lot of attention; a lot of folks are very interested in the latest Gluster integration with KVM, and Ceph has been integrated with CloudStack since 4.0, where RBD works well.

From a hypervisor standpoint, we naturally support KVM, plus XCP and XenServer, although XCP is now deprecated and there's really only one XenServer now that XenServer has gone open source. We support VMware if you have vCenter. We support LXC if you want to do containers instead of a real VM. We can also do bare metal, provided your hardware has IPMI support. Hyper-V did not make 4.2. OVM was deprecated, and you'd have to go back to 3.0 to get OVM support; I don't know if that's coming back any time soon. And of course, you can mix and match all of these within CloudStack and manage multiple hypervisors.

And I'm essentially out of time; he's giving me dirty looks back there. So if you have questions, don't hesitate to ask, but I also want to get out of the way for the next speaker. I'll be around all week if you want to chat, and I'm happy to converse about CloudStack or anything else. Thanks very much, I appreciate it. And sorry I'm such a poor substitute for Chris.