Thanks. You can read the Twitter thread that kind of started this off. Kobe thought that Dr. Nick was going to propose a talk about OpenStack stressing him out, and Dr. Nick said, no, that's just a good idea. So I decided to take up that mantle. Kobe's talked about me: I'm a pointy-haired manager, so take everything I say with a grain of salt. Everybody done reading the joke? It's good.

All right, OpenStack stresses me out to no end. There's a great quote, probably from Jamie Zawinski: some people, when confronted with a problem, think, "I know, I'll use regular expressions." Now they have two problems. Well, substitute OpenStack for regular expressions and 99 for two, at least.

OpenStack is turtles all the way down. There's a term in the web development community: the full-stack developer. And I really respect full-stack developers. They do CSS and JavaScript and HTML, some middleware, databases, and maybe a little bit of operating system stuff. But that's not the full stack. The full stack keeps going down and down. There's hardware, there's networks, there's security. You get into data centers and power and cooling and geo-replication, all kinds of stuff you don't always think about. There's a lot of complexity that we've built over the last few decades, and unfortunately, OpenStack touches pretty much all of it.

So the first area I want to talk about is hardware. Obviously, if you're deploying a cloud, as my team does, you want hardware. And you think, well, that's great. I've got hardware, I'm used to it, it's stable, it's useful. Except I'm buying a crap ton of it. And you think, that's what my data center looks like: beautiful, and everything looks the same. I went to my vendor and specced out a SKU and said, hey, this is what I want to buy. And they said, great.
We'll give you 10,000 of that server. Awesome! I can't wait to have 10,000 of my server in my data center, ready to go. Except it looks like that on the inside, right? Somebody grabbed the wrong RAID controller and stuck it in. It's just a slightly different version, and it works just a little bit differently. The API is just a little bit different. Performance is just a little bit different. And you have to deal with all of that, and it can be a real pain in the butt.

Oh, wait, let's have more than one vendor! Okay, that's not helping, because now I've got multiple server models for compute and storage and all kinds of things, and again, you have to deal with that across many data centers. And then, of course, there's the movement of time. Versions change, things change, and you have to deal with many versions. And, oh, well, we don't sell this processor anymore, we sell this one, and the features are different. You have to deal with all of these things. It's exciting.

So one solution for this is hardware automation. My team works pretty hard on automating not just the configuration of OpenStack, but the configuration of hardware. When we see differences, when we see these idiosyncrasies, they go into our configuration management system just like how we set up NTP. That becomes really useful, because when a weird piece of hardware shows up, Chef says, oh, hey, I've got that, take it over. That's been really useful for us. Initially we actually used Cobbler for this, and it started getting unwieldy really, really quickly. Chef handles it a lot better: it detects the differences better, and you have a lot more capability for handling them. So hardware automation is great.

Next is good vendor relationships. Really work with your vendors. We tour their facilities. We talk to the actual people building the hardware.
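The detect-and-handle pattern described above can be sketched in a few lines. In our case it lives in Chef recipes keyed off detected node attributes; here is a hypothetical Python sketch, with invented controller names and settings, of the same idea: known idiosyncrasies map to configuration, and unknown hardware gets flagged before it goes live.

```python
# Hypothetical sketch of "detect the idiosyncrasy, then handle it".
# The controller models and settings below are made up for illustration;
# the real version lives in configuration management (Chef, in our case).

KNOWN_RAID_CONTROLLERS = {
    "MegaRAID-9261": {"tool": "megacli", "write_cache": "write-back"},
    "MegaRAID-9266": {"tool": "megacli", "write_cache": "write-through"},
    "SmartArray-P420": {"tool": "hpssacli", "write_cache": "write-back"},
}

def configure_raid(detected_model: str) -> dict:
    """Return the config to apply, or raise so the node is flagged for a human."""
    try:
        return KNOWN_RAID_CONTROLLERS[detected_model]
    except KeyError:
        raise ValueError(
            f"Unknown RAID controller {detected_model!r}: "
            "add it to configuration management before this node goes live"
        )

# The oddball controller somebody stuck in still gets the right settings:
print(configure_raid("MegaRAID-9266")["write_cache"])
```

The point is the fallback: a brand-new idiosyncrasy refuses to configure silently, which is exactly when you want a human in the loop.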
Work with them, determine your needs, that kind of thing. The really good hardware vendors out there will work with you to get a good SKU and the type of system you want. So that's also really helpful.

The next area is networking. Networking is not simple at all. There's a lot that can go wrong. Somebody plugged this thing into the wrong switch. You were supposed to plug this server into two switches, and it's only in one, so when you lose that one switch, there goes any redundancy you had in the system. Then you have the issues of scaling the network and dealing with tenant management. There are lots of problems around making sure your multi-tenant OpenStack deployment actually handles all your tenants and gives good quality of service across the board.

Things that help here: really good telemetry on your networks, so you know what's going on. Oh, hey, I've got a spike in utilization. Where is that coming from? Okay, now I know where it's coming from. What is it? It helps you detect things really quickly. Early on, we didn't have a lot of this, and when you get a compromised VM that goes crazy and starts packet storms, you're shooting in the dark. Having mechanisms to figure out what's actually going on is incredibly important. Also quality-of-service features: being able to rate-limit tenants and say, hey, slow down, you don't need to send packets out that fast. That's quite useful, because a single VM can go nuts and take out a whole data center if you're not careful.

Software-defined networking. You'd hope this would make things easier, but yeah. It does provide a lot of the things you really need in a multi-tenant environment: security, tenant isolation, quality of service, monitoring, metrics.
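The telemetry idea above, spotting a utilization spike quickly and knowing which tenant to throttle, reduces to comparing each sample against a per-tenant baseline. A hypothetical sketch, with invented thresholds and no particular monitoring stack assumed:

```python
# Hypothetical sketch of the telemetry-plus-QoS loop: watch per-tenant
# traffic counters, flag a spike against a moving baseline, and signal
# that the offender should be rate-limited. Thresholds are invented.

from collections import deque, defaultdict

BASELINE_WINDOW = 60   # samples of history kept per tenant
SPIKE_FACTOR = 10      # "10x your normal traffic" counts as a spike

history = defaultdict(lambda: deque(maxlen=BASELINE_WINDOW))

def check_sample(tenant: str, packets_per_sec: float) -> bool:
    """Record a sample; return True if this tenant should be rate-limited."""
    samples = history[tenant]
    baseline = sum(samples) / len(samples) if samples else None
    samples.append(packets_per_sec)
    return baseline is not None and packets_per_sec > baseline * SPIKE_FACTOR

# A tenant humming along at ~100 pps, then a packet storm:
for _ in range(30):
    check_sample("tenant-a", 100.0)
print(check_sample("tenant-a", 5000.0))  # True: time to throttle this VM
```

The actual enforcement (tc, switch QoS, SDN policy) varies by deployment; the detection side is what turns "shooting in the dark" into a five-minute diagnosis.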
You get a lot more control over the system, because it's just software and it's a lot easier to deal with. When you're configuring things, it works great. So OVS is awesome. It's got one kind of fatal flaw right now: it's single-threaded. Well, thanks, OVS. I bought this 32-core, 3.5-gigahertz machine. It's awesome. And you're using one core. Thanks, dude. So OVS will eat that one core instead of spreading the work across the rest of this awesomely powerful machine, and you start dropping packets. It's really exciting and makes for unhappy tenants. How do you solve that? More VMs! You can stick OVS in VMs running on the same machine, and then you can distribute it. That works in our testing; hopefully we actually figure out how to do it in production. But, of course, then you have more turtles. The other option is to wait for the patch, which has actually been submitted, and hopefully we'll see it in the Linux kernel at some point so we can solve these kinds of problems.

Deployment. Deployment is... deployment's exciting. My team actually deploys lots of data centers, and they're never the same. We move incredibly fast, and that's kind of problematic sometimes. That's kind of my own problem. For deployment, I highly recommend standardizing and not getting multiple versions of OpenStack out there, because that causes all kinds of untold issues.

Yeah, so we use Chef. My team actually started the Stackforge repositories, and we've got a lot of other great groups working with us on those. I think there are about 50 Chef cookbooks and lots of recipes in there; they're uncounted because I didn't count them. The problem when you have that many cookbooks is eventual consistency. And eventual consistency takes a long damn time if you're doing this over lots of servers. And Chef doesn't have a really great way of expressing dependencies between these cookbooks and telling you what's going on. So what do we do?
Well, we need to orchestrate Chef. There are lots of good tools out there. We're actually considering SaltStack to basically give us the ability to manage Chef: tell it, okay, now you can do this, now you can do that, now you can do the other, and really bring the deployment time down and organize the whole thing. Yeah, pretty important.

The other thing we have is RabbitMQ. Oh, my God, this stresses me out. I don't know that my experience has been as great as Sam's. We're pretty hardcore in our deployment methodology: after we deploy, we start turning stuff off and breaking things and making sure everything keeps working. And for the most part it works really well. I can shut off my L3 router and it keeps working. I can shut off parts of my Galera database cluster and it keeps working. I can turn off random servers and they keep working, because it's all HA. And freaking RabbitMQ? Not so much. The nodes don't want to rejoin. And they're cute. You think Rabbit's great. You're like, awesome, I can use this nice message bus, it's great architecturally. And then it falls over all the time when you least expect it, and nodes won't rejoin their cluster. And you're like, stop it, Rabbit! You're killing me.

So what's the solution to this? Well, there are other queuing systems out there. I know Red Hat uses Qpid. There's ZeroMQ. There's nanomsg. There are plenty of queuing systems out there, so how do you solve this whole queuing problem? And honestly, we're not going with any of those. Qpid I don't know a lot about. ZeroMQ, I started researching and found people complaining about the exact same issues we're having. And nanomsg is the creator of ZeroMQ saying, ZeroMQ sucks, I must now write nanomsg. So of course it's brand new, and he's got, like, a design, and that's about it. So what do we do?
The solution is to stick with the Vorpal Bunny you know, not the one you don't. For now we have Rabbit running fairly reliably, with an incredible amount of work. But it is a pain in the butt. We have lots and lots and lots of testing.

Next on the list is storage. Storage is actually a pretty good story, I think. It hasn't stressed me out too much, other than the fact that there are about a billion storage vendors, and I've talked to a lot of them. Coming out of that and figuring out what you want to use, when there are a lot of really good solutions that all have interesting stories, and evaluating them in some sane manner, can be pretty difficult. But I think one of the most important aspects of OpenStack, in my opinion, is storage: people being able to have reliable data, knowing that the data they store somewhere is still going to be there tomorrow. You can lose a lot of other things. I can lose a VM. I can lose the network. But if I lose your data, you're going to have problems. So you really want redundancy as part of your solution, and then you want more redundancy on top of that. There are lots of great solutions. GlusterFS and Ceph, and we had great talks on those. There are certainly hardware solutions you can go with as well, including some companies that are being really innovative, not just the big incumbents. So those are pretty important.

Evolution is a really big one, and one of the biggest things that stresses me out with OpenStack. It's the fact that this massive code base is changing incredibly quickly over time. Doing releases across six-month periods isn't necessarily the easiest thing in the world. You have database migrations and mass code changes and all that kind of thing, and it can become incredibly difficult to keep up with it.
And the further and further you get behind, the fewer patches get backported, and the less anybody even cares. And developers, I don't think, tend to test the very big migrations. You've got a lot of people testing the very small changes; not many people testing the big ones. So what we need, and Dr. Nick talked about this in his talk, are mechanisms to embrace change, and continuous deployment is the way to do that. I love the work that's going on with TripleO. Having the ability to continuously deploy is really important, because then you can keep up. It's not easy, but one of the best projects I worked on for this was buzz.com. We deployed every day. And that was really great, because there was always just a very small amount of change. And, bam, if something broke, you knew what it was, because it was, like, 20 lines of code. It lets you react a lot better, and you also get really, really good at deployment. I think that if deployment is hard, don't do it less, because that means you're going to get worse at it, not better. So deploy, deploy, deploy. And it's certainly an unsolved problem. I got that nugget of wisdom from Dr. Nick. It's a hard thing to do, but I think it's also an important thing to figure out in this OpenStack world.

And along with that is learning to migrate, and what I mean by migrating is migrating tenants. Because even with continuous deployment, you're going to run into situations where you're like, oh, shit, where do I go? What do I do? How do I go from, say, network multi-host to a full SDN solution? That's a very difficult task, and you may not be able to do an in-place upgrade.
So how do I take my tenants and move them over transparently and make everything happy? This is a problem I wish we had solved years ago in our deployments. I think it would have reduced a lot of the headaches we had. I don't know if there was a way to do continuous deployment in the Diablo era, but migration you certainly could have done.

Lastly, we have one of my favorite pain points, which is OSI layer eight. I don't know if you've all heard of this one. It's called tenants, or people. It's a pretty difficult layer to live with. I feel like my clouds would be a lot cleaner if I didn't have anybody on them. Right now they look like that. Tenants are fun to work with. I actually do, excuse me, a lot of tenant engagement, teaching people in my company about cloud: how things work, how you architect things, what to do. Some people really embrace the cloud model, and some people say, give me a terabyte of RAM on my VM, or other crazy stuff. Oh, great, thanks for the virtual servers. Can you put this hardware in your cage and plug it into my virtual server? No, thank you.

You also have issues around security. Security is a huge area with tenants. They don't know what they're doing, and it is very scary saying, hey, here's a VM and you're root, go do what you want to do. There's nothing to stop that person from going in and saying, this stupid SSH key stuff is dumb. I want to use a password. Passwords are hard to type, so my password is "password." Guess what? That doesn't work for security. You get hacked within minutes, and I've seen it happen. You have hacked VMs, and it really can make a mess of things. So it's pretty exciting. Or you install Tomcat with all the default settings and change nothing. phpMyAdmin, same thing. There are lots of tools out there that are great, but by default they're very, very insecure.
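One cheap defense against the insecure-defaults problem above is auditing what tenants actually expose. Here is a hypothetical sketch: the rule format and the risky-port list are invented for illustration, not any particular OpenStack API, but the idea is to sweep tenant security-group rules and flag anything sensitive that faces the whole internet.

```python
# Hypothetical sketch: audit tenant security-group rules for wide-open
# access to ports that should never face the internet. The rule format
# and the risky port list are invented for illustration.

RISKY_PORTS = {22: "ssh", 3306: "mysql", 5432: "postgres", 3389: "rdp"}

def audit_rules(rules):
    """Return warnings for rules exposing risky ports to 0.0.0.0/0."""
    warnings = []
    for rule in rules:
        if rule["cidr"] != "0.0.0.0/0":
            continue
        for port in range(rule["from_port"], rule["to_port"] + 1):
            if port in RISKY_PORTS:
                warnings.append(
                    f"{RISKY_PORTS[port]} (port {port}) open to the world "
                    f"in group {rule['group']}"
                )
    return warnings

tenant_rules = [
    {"group": "web", "from_port": 80, "to_port": 80, "cidr": "0.0.0.0/0"},
    {"group": "web", "from_port": 1, "to_port": 65535, "cidr": "0.0.0.0/0"},
]
for warning in audit_rules(tenant_rules):
    print(warning)
```

Port 80 open to the world is fine for a web group; the everything-open rule is the one this flags, which is exactly the "open up every port" habit you want to catch and turn into a teaching moment.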
When you allow tenants to go in and do whatever fun things they want, like opening up every port in a security group, you open yourself up for trouble. So I think an important thing here is education. You need to get really good at educating tenants: teaching them about security, teaching them how to architect for the cloud, giving them good documentation around your cloud, and doing as much hand-holding as you can. Because really, without the tenants, you're kind of nothing. You've got this awesome cloud that nobody's using. It's really important to teach them.

The other solution that I like is PaaS. I love the idea of Cloud Foundry, Heroku. I use it all the time. You get rid of a lot of those problems, because you can make the infrastructure decisions. You can decide how your database is secured. Hey, there's no SSH shell to log into if I don't want one in my PaaS solution. And you can help people just deliver applications. PaaS is one of the fastest ways to get there. And I'm sure there's a hell of a lot more stress involved in actually managing one, but hey, it's fun.

Anyway, OpenStack is a lot of work. My team has spent three years working on it, and it's only getting bigger. It's getting better in a lot of ways, but there's always new stuff. There's a lot of great stuff coming along that I'm really excited about in OpenStack. And I think almost the most important thing when you're working on OpenStack is having a really great team. In working with the OpenStack community, one of the things I've enjoyed most is being with a group of people who really care about open source and care about contributing, and the community is very welcoming of other people. That's really helped the journey we've had. And there's my cloud. Thanks. Any questions?