All right, morning everyone. Well, this is how DreamHost builds a public cloud with OpenStack. I'm Carl, I'm the cloud architect at DreamHost. To give you a little bit of background, this is mostly going to be a technical talk, but we've been in business since 1997. We're gonna be old enough to drive next year, so that's kinda cool. We spun off Inktank as a support company for Ceph; if you're not familiar with Ceph, it's an exabyte-scale storage platform, entirely open source. We're using it for both our DreamObjects and DreamCompute products; DreamObjects launched earlier this year, and DreamCompute we launched on Monday. We really believe in the whole open cloud philosophy, so a lot of this is going to be shedding light on what happens inside the black boxes that get built for public clouds. A lot of the other cloud providers don't tell you what hardware they're running on. I'm gonna show you pictures of it. So, why are we doing this? Well, Simon Anderson, our CEO, likes to say it's to empower entrepreneurs and developers, and that's the primary motivation for a lot of the products we build at DreamHost. In order to do this, we had to come up with some design tenets for how we wanted to build it and how we wanted to make it go. Among those: make sure it's reliable. Maintenance is the norm, not the exception, so we didn't want to sit there and say, oh, it's a high-availability platform, we can push a couple of buttons, the upgrade will just happen, and it'll never go down. We wanted to throw that thinking out the window, because we had tried it in the past with a lot of our traditional shared hosting technologies and it hasn't quite worked out. So the mantra that marketing wants me to say is "built for resilience," whereas the DevOps term is "designed for failure," but whatever. We also want to isolate tenants from each other by default. On a relatively large cloud provider named after a big river in Brazil, the tenants are separated, but we kind of kid that it's like Swiss cheesecloth in between you and the person next to you: you know they're there, and if you do the right things, you can see them. We didn't want that at all, and that drove a lot of the decisions we made in terms of networking and hardware selection. We also want to make it easy to expand, easy to upgrade, and we wanted to automate everything. Those are the big concerns that went into trying to make all of this work. Once we had figured out what we wanted to accomplish, we had to figure out the problems we were going to have to solve, and those of course presented obstacles we needed to get around. The first one was storage. We're doing shared storage on everything. All of your boot volumes sit on a shared storage backend. This is actually kind of awesome for us because it means we can perform maintenance on the cluster without having to turn off your VMs. Said large provider had a couple of incidents last year where they had to reboot all of the VMs in a couple of data centers; we didn't want to fall into that trap. Obviously, if you're doing something like that, it's gotta be massively scalable. It has to run on commodity hardware. We want a single solution for everything, and again, it has to be fully automated. For networking, it has to support IPv6 (I'll get to that in a second), and tenants have to be isolated from each other, and actually isolated from each other.
Part of the reason for this is because we're a shared hosting company. We've been giving people logins to shared machines for a long time, and terrible things happen. We didn't want that to happen to our customers in this product. What Bob does and what Alice does aren't going to directly impact each other unless both of them agree that they want to share information with each other through routers. We're using lots of 10 gigabit Ethernet and faster, and we are specifically designing everything with no single point of failure. We're using a standard spine-and-leaf architecture rather than a big-core design in all of our cloud products, specifically so we can do maintenance without having to worry about one big core in the middle of it failing and taking everything offline. The hypervisor was another obstacle we came across. There are many choices in the world of hypervisors, and most of them are supported by OpenStack now. These are some of the big things that we wanted. The third option is interesting because we're interested in trying stuff that's not x86 down the road, so we'll see what that does. We also didn't want to do anything that was going to require modifying guest operating systems, which some of the larger hypervisors do. Eventually we might need to install an agent in order to do some of the advanced networking stuff that we're deploying, but for right now we wanted to keep it as simple as possible so that you can run as many VMs as you want without having to make any major changes. We're also giving you the flexibility to run pretty much whatever operating system you want, within reason. Like, I'll draw the line at MS-DOS 3. IPv6 is really big for us. We plan on keeping this product running for more than six months, and that means we would have run out of IP addresses if we didn't implement IPv6. It's kind of entertaining. It's also a great way to piss off your network vendors. The first thing I do when I'm talking to them is ask, do you support IPv6? And everyone goes, yes, of course we support IPv6. And my next question is, do you route it? And that's when everybody goes, in October, maybe? So it's been interesting for us, but somebody needs to drive the boat on that. We need to get to the point where IPv6 is widely deployed, which gets us back to the internet we were all promised in the late 90s where everyone is directly addressable. So we are kind of doing that, which meant we needed to go make some decisions, and making decisions is hard, but that's okay. For hypervisors, we chose KVM. It's built into the kernel, it has lots of eyes on it, and it addresses all of the various issues we wanted to address. It doesn't cost us a dime, which is cool; a lot of the other hypervisors don't cost us anything either, but it's nice. It also supports multiple platforms, so we can do stuff that's not x86. It gives us a lot of flexibility in drivers, and there are a lot of eyeballs looking at it, which is really important to us for trying to keep things up. For storage, obviously, we're backed by Ceph. With DreamObjects, we run the largest known commercial deployment of Ceph in production, and this will be the new largest deployment of Ceph in production. Our initial cluster is going to be 4.5 petabytes of raw storage. We're giving everybody a fixed-size default boot volume, and then you can add however many other volumes you want. They're all backed by RBD, and the cool thing is that because of Ceph there are some rather impressive security advantages, unlike with iSCSI or other technologies like that. If you allocate a new disk and you try to read data from it, you're gonna get back zeros until you write something to it, which is really cool. You don't have to worry about somebody else getting at your data once you've pulled down your instance because you uploaded a file you shouldn't have.
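To give a rough idea of what that looks like at the Ceph level, here is a minimal sketch using the Ceph Python bindings (rados and rbd); the pool and volume names are just placeholders, not what we actually run with:

```python
import rados
import rbd

# Connect to the cluster and open the pool that holds the volumes
# ('volumes' is a placeholder pool name).
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('volumes')

# Allocate a brand-new 10 GiB RBD volume. It's thin-provisioned, so no
# blocks exist yet and nothing from a previous tenant can leak through.
rbd.RBD().create(ioctx, 'vol-demo', 10 * 1024 ** 3)

# Reading extents that have never been written just returns zeros.
image = rbd.Image(ioctx, 'vol-demo')
assert image.read(0, 4096) == b'\x00' * 4096

image.close()
ioctx.close()
cluster.shutdown()
```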
Physical networking hardware: here we thought a little bit outside the box. We're buying our networking hardware directly from the Taiwanese ODMs; in this case, we're getting all of our boxes from Delta Networks. These are traditional Broadcom Trident-based platforms: 48 10-gig ports and four 40-gig ports in a 1U pizza box. We put one of those at the top of every rack. We use 10 gig to connect to the hosts and 40 gig to connect to our spine switches. The hardware is fairly fantastic; the software it ships with, not so great. So we partnered with Cumulus Networks. These guys are in stealth mode, but they let me say their name, which is very cool. They build a Linux operating system for commodity switches. Our switches run Linux, with the exact same command-line tools as everything else, and it's absolutely awesome. It makes automating them a breeze, because now all of our switches just look like servers. For logical networking, we partnered with Nicira. We're doing software-defined networking on top of all of it, and yes, we are exposing it to our customers. If you're curious whether we're using the bridged model or the overlay model, we are using the overlay model. This gives everyone their own network space. You can spin up as many switches as you want, and every time you spin up a switch, we assign an internet-routable IPv6 /64 to it for free. So, oops, you can't see one of the lines on this slide, but that's all right. The question is, when you're doing a network infrastructure like this, who needs spanning tree? You can't really see it, but the bottom two stacks of equipment are each essentially a rack, and that's our layer two domain. Layer two ends at our leaf switches. We do layer three OSPF ECMP connections to our spine switches, and they connect to other pods. All of the links coming out of the racks are 40 gig, and there are two of them. This gives us the ability to do maintenance: if we need to do software upgrades on our spine switches, we just do one, nothing goes down, bandwidth goes down a little bit, so there might be some performance impact, but we're not actually going to take anything offline. Everything has redundant links. Everything talks to everyone, even in a failure situation. Each of our racks is a failure domain. If we lose power in a rack, that's fine; we just go spin the VMs up someplace else, because they're on a shared storage platform that spans multiple racks and multiple rows inside our data center. We use Chef for all of our automation, and yes, everything we're doing for deploying our public cloud is going to be open sourced and upstreamed to the OpenStack Chef cookbooks. We're doing that now. You can actually go to my GitHub site and see all of the cookbooks that we're working on; we're in the Folsom branch. Check it out, and we want people to check it out, because we're targeting developers and entrepreneurs: if you've got four machines sitting in your basement and you want to spin up an OpenStack cluster, go for it.
Develop your application on that cluster, and then when you're ready to move to something that needs more CPU horsepower, more bandwidth, more uptime, you're running the exact same stuff we are. Just pick up your VMs and move them over. We also use Arista switches for our internet and SAN access. These are kind of cool, but the one thing that you may notice is, did I just say SAN? And indeed I did. When we were designing our infrastructure, we had kind of the same feeling that Troy Toman described this morning. We wanted all of our OpenStack API endpoints to be inside of VMs, and then if only there were some open-source cloud infrastructure orchestration platform to manage and maintain all of those VMs. We were basically doing this for two reasons: one, we wanted to scale, and two, we wanted high availability. To start with, we didn't really have the time to implement everything the same way Troy did, and running OpenStack to run OpenStack is a little inception-y. It's not necessarily bad, it's just a lot of work. So we went the enterprise route, bought some VMware licenses, and are running it on an HP LeftHand storage platform. It's not cheap, but it works, and it gives us the ability to focus more on getting the product out the door; then we can go back, clean it up, build it all on top of open-source architecture, use OpenStack to drive OpenStack, and go forward from there. So if you're a CTO, avert your eyes, I'm about to show you a picture of a data center without making you sign an NDA. This is the hardware that we run on. Those boxes at the top are 2U systems made by Dell, from their cloud line, the C6145 series. They have two motherboards in them, and each of those motherboards has 64 AMD CPU cores and 192 gig of RAM. All the boxes you see underneath those are storage. Each of those drive bays, and there are 12 of them on each node, holds a three-terabyte hard drive, and we just build these out as we need them. It's a tremendous amount of capacity. All of those machines have two 10 gigabit links going into one of the Delta switches, and those Delta switches have 40 gig links talking to each pod, as we call them. This is our beta deployment, so it only has three pods and they're relatively small, but there's a lot of bandwidth here for us to move stuff around. This is what customers see. It's Horizon; we did a little bit of theming on it, so it looks kind of cool. But if you're a power user, this is what you see: we're running the OpenStack APIs. We're not doing anything fancy, we're not putting in any different authentication systems. It's all Keystone, and it all follows the same APIs. If you're running OpenStack someplace else, you can easily move your stuff to us. We use Glance for our image storage. The only thing we don't do is use Swift; we use Ceph for our block storage, and we use Ceph for our object storage as well.
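Since it's stock Keystone and the standard APIs, pointing existing tooling at us works the same as with any other OpenStack cloud. Here is a minimal sketch of what that looks like over raw HTTP; the endpoint URL and credentials are made up, and normally you'd just use the standard client libraries:

```python
import requests

# Hypothetical endpoint and credentials; substitute your own.
KEYSTONE = 'https://keystone.example.com:5000/v2.0'

# Authenticate against Keystone (v2.0 token API) and grab a token.
resp = requests.post(KEYSTONE + '/tokens', json={
    'auth': {
        'tenantName': 'my-project',
        'passwordCredentials': {'username': 'alice', 'password': 's3cret'},
    },
})
resp.raise_for_status()
access = resp.json()['access']
token = access['token']['id']

# Find the compute endpoint in the service catalog and list servers.
nova_url = next(ep['publicURL']
                for svc in access['serviceCatalog'] if svc['type'] == 'compute'
                for ep in svc['endpoints'])
servers = requests.get(nova_url + '/servers',
                       headers={'X-Auth-Token': token})
for server in servers.json()['servers']:
    print(server['id'], server['name'])
```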
And with that, I'll take questions, because I'm sure some people have some. Yes, hop up to the microphone if you're gonna ask a question, otherwise I'm gonna have to repeat it. Hi, are you using Ceph block storage and the object storage in the same cluster? No, we have a separate cluster for object storage. The DreamObjects cluster is a separate cluster in a separate data center, and the block storage that we're using for OpenStack is all a separate cluster sitting underneath the machines with the hypervisors on them. You're not gonna join them together later on, or? We may. Currently, Ceph doesn't support multiple availability zones, but once that happens, we might do that so that we can do replication between them, and at that point we'll probably spin up hypervisor nodes in the data center that DreamObjects is currently in. We're hoping to. Our goal is to shoot for SATA disk performance on all of our block storage nodes, which is a little bit ambitious, but we'll see what it looks like. We're hoping to be able to do better as time goes on, but that's what we're starting with right now. And I'm sorry, I forgot to repeat the question. Anyone else? How are you isolating your customers? Are you creating a VLAN and a firewall? It's not a VLAN, it's entirely software-defined networking. Literally every customer gets their own virtual switch. If they do a ping scan of the network that we give them, all they see are their instances. We spin them up a switch, we spin them up a router instance; we have an open-source project that we're working on called Akanda, which is basically just a service VM. We spin up one of those for each tenant; it plugs into the internet, and it plugs into the customer's switch. It is actually a router. It's running routing protocols like OSPF, and your v6 traffic is routed straight through, so the address space on your switch is internet accessible. By default, we firewall everything except SSH and ping, but you can go in and put holes in for whatever you want. We turn on router advertisements so all of your VMs can get v6 addresses. We also give you private RFC 1918 IPv4 space on that switch, and then it acts much like Amazon: you get a private net block, all of your VMs can get IP addresses over DHCP, and then you do NAT inside of your Akanda router instance. It's all API controlled (there's a rough sketch of what those API calls look like a little further down), and we're actually trying to push all of that upstream. It's going to be an extension for right now, but we're working on getting this integrated into Quantum using the advanced networking services stuff that we're talking about this week. Every one of those Akanda instances gets one public IP address for free, we'll sell you more if you need them, and then you just use NAT rules to connect everything back through. So essentially everybody gets a fixed IP that you can't SSH to directly, but you can SSH to your v6 address directly, and everyone gets one floating IP that you can map to whatever you want. Load balancing we are looking into; we're not doing anything at the moment, it's something we're looking at for a later release of the product. We can run things like HAProxy on the Akanda instances, but we haven't gotten to the point where we're going to build VMs running proprietary load balancing solutions. It is something we wanna do, we're just not there yet.
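For reference, here is roughly what creating one of those tenant networks looks like through the Quantum v2 REST API. This is a sketch with a hypothetical endpoint and token; in practice you'd use the quantum command-line client or a library rather than raw HTTP:

```python
import requests

# Hypothetical Quantum endpoint plus a token obtained from Keystone.
QUANTUM = 'https://quantum.example.com:9696/v2.0'
HEADERS = {'X-Auth-Token': 'TOKEN_FROM_KEYSTONE'}

# Create a tenant network -- the customer's own virtual switch.
net = requests.post(QUANTUM + '/networks', headers=HEADERS,
                    json={'network': {'name': 'frontend'}}).json()['network']

# Attach a private IPv4 subnet; VMs pick up addresses from it over DHCP.
subnet = requests.post(QUANTUM + '/subnets', headers=HEADERS, json={
    'subnet': {
        'network_id': net['id'],
        'ip_version': 4,
        'cidr': '10.10.0.0/24',
    },
}).json()['subnet']

# Create a port on the network; booting a VM against it plugs the VM's
# virtual NIC into that port.
port = requests.post(QUANTUM + '/ports', headers=HEADERS,
                     json={'port': {'network_id': net['id']}}).json()['port']

print(net['id'], subnet['cidr'], port['id'])
```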
You'll have to talk to them about getting the software; their software is not open source. Obviously it's built on Linux, and all of that part is, but there are components of it that are not; they're proprietary to the switch vendors. The switches themselves are running Linux. We manage them with Chef. Everything shows up as interfaces. We use Quagga to do all of the OSPF stuff, so we have Chef write all of those configuration files and do all of that configuration management, and it just gets pushed out to all the switches, which is really kind of cool. It's not necessarily that it wasn't good enough; there were a couple of limitations that the Delta switches have with the Cumulus software that the Arista switches didn't. But for the most part, it's switch diversity. Instead of being locked into one switch vendor, which has burned us in the past, with this platform we can run multiple different switch vendors, and so the idea was, let's do that. If something goes wrong with one switch, we can rip those out and replace them with different switches still running the same software, which is kind of cool. Obviously the Arista switches don't run the Cumulus software; the Deltas do, but you can get switches from Accton and from four or five other providers that Cumulus will install the software on for you, which is kind of nice. That way, if we have a hardware issue, or a supply issue, or a vendor is two months late shipping us hardware, we can buy from somebody else and work through it that way. So it's more about diversity than any particular feature reason. The Cumulus stuff currently doesn't support a couple of things that we needed that the Aristas did, like VLANs, so that's why we went with those for the SAN and for the internet access stuff. But there's no reason why we couldn't; we just chose not to in the interest of time. Yes? Sure, why did we want to use Ceph instead of Swift? We built it, so we kind of have an impetus to use it. That's one of the big ones. The other is that Ceph gives you file system storage, block storage, and object storage all on the same backend. We're a relatively lean team, so instead of spending a lot of time learning how to stand up a Ceph cluster and how to stand up a Swift cluster, the idea was, let's just do it all on Ceph and use that for everything, instead of having three or four different storage technologies. Plus, being the people who started the development of Ceph, we have an impetus to show the world that it can be used in a production environment, so that is one of the reasons. Also, for the block storage stuff, Ceph is really compelling because we can just keep adding block devices and keep adding commodity hardware nodes to expand it, unlike some of the bigger SAN solutions where it's really expensive to get to that point. Other questions? Yes. So for the volume service stuff, we are using nova-volume, and we're going to switch to Cinder here shortly. But yeah, every volume is managed through nova-volume, including your boot volumes. So when you pick an image in Glance, with the new stuff that's in nova-volume thanks to the work that was done in Cinder, you can say what image you want, it'll do a copy-on-write inside of Ceph (because we store all of our images in the same format inside of Ceph) to create your boot volume, and then your VM boots off of that; there's a rough sketch of that clone step a little further down. Plans for solid-state storage? That is an excellent question. There is a lot of desire for solid-state storage, but the problem we have is that the way Ceph works, it's a lot like Swift in that there's a lot of data moving around. We're worried that with the current state of MLC SSD drives, we'd probably burn through all of our SSDs in six to nine months just from the number of writes we're doing on them. So for right now, no; we might do that later. The other thing is that Ceph currently doesn't support storage tiers, so it would be difficult to have one small section of SSD-backed instances and be able to utilize them effectively. We could create a Ceph pool that is just SSD-backed, but then you've got to worry about where those VMs are going to go, and because we're just running stock OpenStack, that's not very smart at the moment. It's something we may investigate down the line if there's customer demand for it, but for right now we're gonna stick with spinning rust just because it's effective.
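To go back to those copy-on-write boot volumes for a second: at the Ceph level that's an RBD snapshot plus a clone. Here is a minimal sketch with the Ceph Python bindings; the pool and image names are placeholders, and this glosses over everything nova-volume/Cinder does around it:

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
images = cluster.open_ioctx('images')    # placeholder pool holding Glance images
volumes = cluster.open_ioctx('volumes')  # placeholder pool holding boot volumes

# The base image has to be a format-2 RBD image with layering enabled for
# cloning to work; snapshot it and protect the snapshot.
base = rbd.Image(images, 'ubuntu-12.04')
base.create_snap('base')
base.protect_snap('base')

# The boot volume is a copy-on-write clone of that snapshot: it shares the
# parent's data and only stores the blocks the VM actually writes.
rbd.RBD().clone(images, 'ubuntu-12.04', 'base',
                volumes, 'boot-vol-demo',
                features=rbd.RBD_FEATURE_LAYERING)

base.close()
volumes.close()
images.close()
cluster.shutdown()
```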
Yes. An excellent question: our friends at Piston do that. The reason we decided not to is that adding a bunch of storage to a machine that's going to run a lot of VMs makes a lot of sense in a small deployment, but in a large deployment like this, putting 24 drives into a 4U box with 64 cores and 192 gig of RAM becomes really expensive. Whereas if we get a box that is dedicated to being a hypervisor and a box that is dedicated to being storage, it's much more cost effective, and that's really what it boils down to. Core count: we're not oversubscribing on memory at all, and we have a desire to keep our CPU oversubscription under two to one. Intel's current generation virtualization technology is finally at the point where AMD was about five years ago, but their core count is through the floor. With the current Sandy Bridge stuff you can get eight cores; I haven't seen anything more than that right now, and the eight-core chips are unbelievably expensive, whereas we can get 64 AMD cores for the price of four quad-core current-generation Xeon systems. So that's really what it boils down to. There are some cool features in the Intel stuff, and I'm hoping AMD is gonna catch up on that. Some of the security features and some of the crypto features would be cool to have, especially in a public cloud environment, if we can figure out key escrow, but we're gonna have to work through that to get there first. Right now we wanted to be able to support a large number of VMs efficiently and keep our oversubscription ratio down. Other questions? We are currently not using the Ceph file system in production. We may start doing that for log storage, but for right now we're just using the object storage portion of Ceph and the block storage portion of Ceph. We are using Quantum for networking, and we've been heavily involved with Quantum networking development for a while now. We're using the Nicira plugin for Quantum, so all of the API calls that you do to create networks, spin up switches, and plug your VMs' virtual NICs into ports are all done through the Quantum API. Anybody else? If anybody has any other questions, I'll be at the booth when I'm not in sessions. Feel free to come by, or you can just drop off a business card with your question on it and I will shoot you back an email; I promise not to spam you. Otherwise, I think that's it. Thanks everybody for coming. Thank you.