All right, morning, everyone. Welcome to the OpenStack Summit. I want to apologize in advance: I had a small mishap with my laptop last night. It served me well for two and a half years and chose to die on me, so I'm up here with no slides, just some notes I wrote down. Bear with me; this might be a little shorter than planned and I might tend to ramble a bit. Hopefully not. So, a brief introduction. My name is Russell Haering and I work at Rackspace. Rackspace, as I'm sure most of you are aware, is a hosting company. What most of you probably know us for is our OpenStack-based public cloud, the largest OpenStack-based public cloud in the world. You probably saw Troy's keynote. We launched that in 2012. Back in 2008 we acquired Slicehost, which was doing VPS hosting, so we've been in the cloud business for around six years now. But prior to that, actually since the late 90s, a lot of our business has been dedicated hosting. The way dedicated hosting works is that you need some computers, so you call up Rackspace and speak to a Rackspace employee on the phone who helps you design an infrastructure to meet your needs. Then some folks go out into the data center, put servers in, plug them in, and turn them on. We put an operating system on there for you and then manage it for you on an ongoing basis. Over 16 years, we've built up a lot of servers and a lot of tooling around that. As you might imagine, at first it's a person going out, putting a CD in the drive, booting it up, and installing Red Hat. Over time, we developed automation around this. It turns out there are actually lots of servers in the world; estimates suggest there are between 50 and 100 million servers running right now, and only a very, very small percentage of those are at Rackspace. So we're not the only ones with this tooling, right? Anyone with servers is building tooling for provisioning servers; it just makes sense. And there's lots of tooling: open source tooling, proprietary tooling, and a lot of do-it-yourself tooling. It turns out that putting code onto a server isn't, in the end, all that difficult. If you're Google, it only makes sense to build your own tooling, right? But when you think about all this tooling being built by all these different people, it doesn't really make sense. So about six months ago, we set out to rethink how we could make our tooling better. What we quickly realized is that we're doing this dedicated stuff and we're doing this cloud stuff: how can we use OpenStack to make better tooling for dedicated hosting? We went off on our own for a few months and built some stuff, and at the end of those few months we realized there's this OpenStack project, Ironic, and it makes a lot of sense for us to use it. Since then, we've been working on Ironic. Now, I want to share a dream I have with you. You probably saw in the keynote this morning some mention of the global cloud, the software-defined data center. I want to talk about the software-defined data center. This is the idea that today we're using cloud: I can call up, I can get a virtual machine via an API.
The fundamental thing happening there is that we're using virtualization to dice up individual computers. We can achieve much, much higher efficiency by handing out slices of computers to whoever needs a slice of a computer. But we're only doing that at the level of individual computers today. We want to get to the point where we have a whole data center, just rows of computers, and anyone who needs a computer can call up and get that computer on demand. Maybe you want a generic computer that just runs code; that's what we think of as a server. Maybe you want some storage. Maybe you want a switch. Maybe you want a load balancer. These are all just types of computers. The dream is of having this data center full of different kinds of computers, generic computers and specialized computers, and being able to provision all of it via an API. To make this happen, we need software-defined networking to mature and become real. We need better provisioning on servers. There's a lot we need to do between now and then, but the dream is a data center that's all driven via an API, and we want that API to be part of OpenStack. The idea is to take Ironic and drive it in that direction. When we first started looking at Ironic around three months ago, we discovered there were really two fundamental limitations. Maybe fundamental is the wrong word; they're limitations of Ironic that existed three months ago, and our goal is to eliminate them. The first limitation is that it doesn't integrate at the network layer. Ironic is all about putting code on the servers. And if you look back, at one point OpenStack bare metal, for example, was able to configure switches on the fly, and we want to get there. So what we've been doing since then is modifying Ironic so that it can control these switches; we have a team of four developers at Rackspace working on that. The idea is that a customer calls up, they provision a server, and we configure the switch on the fly, coordinating with the server on which VLANs they want, all on the fly. The second limitation is around how provisioning works. Ironic actually has a really great design around this concept of interfaces. There's a power management interface. There's what's called a management interface, which you can use to modify how a server boots. But the really critical interface is the deploy interface. The deploy interface is where the magic happens: it's what puts code onto a server and makes it run. Today, if you deploy Ironic as it exists in OpenStack, what happens when you boot a server is that the server is powered on using the power interface, it's told to PXE boot, and it PXE boots what they call a ramdisk agent. That agent connects up to Ironic, says, hey, I'm here, give me some code, and then it exports an iSCSI target. Ironic mounts that target, writes an image out to the volume, reboots, and you're good to go. Now, there are really two problems there. One is that we don't feel it scales, because Ironic itself has to do a lot of work; it's constantly mounting volumes and writing out code. The other problem is that it doesn't give us the flexibility to do the things we need.
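Just to make the interface idea concrete, here's roughly the shape of a deploy interface driving the flow I just described. The class and method names are made up for illustration; this is not the actual Ironic code, just a sketch of the concept.

```python
# Rough, illustrative sketch of Ironic's "interface" idea: power, management,
# and deploy are separate pluggable pieces, and the deploy interface is where
# an image actually gets onto the machine. Names here are invented.

class FakePowerInterface(object):
    def set_power_state(self, node, state):
        print("setting power state of %s to %s" % (node, state))


class ExampleDeployInterface(object):
    """Puts code onto a server and makes it run."""

    def __init__(self, power):
        self.power = power

    def deploy(self, node, image_url):
        # 1. Power the node on via the separate power interface.
        self.power.set_power_state(node, "power on")
        # 2. Arrange for the node to network boot a ramdisk (PXE in today's driver).
        print("configuring %s to PXE boot the deploy ramdisk" % node)
        # 3. The ramdisk calls back; write the image onto the node's disk.
        print("writing %s onto %s" % (image_url, node))
        # 4. Reboot into the freshly written image.
        self.power.set_power_state(node, "rebooting")


if __name__ == "__main__":
    deploy = ExampleDeployInterface(FakePowerInterface())
    deploy.deploy("node-01", "http://glance.example.com/images/my-image")
```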
Like I mentioned, in order for this vision of the software-defined data center to come true, we need the ability to do multi-tenancy. And multi-tenancy requires more than just putting code on computers. An obvious example of this is securely wiping disks. When a customer is done with a computer, we need to be able to securely wipe their disks. Today we have tooling for that, and we've been trying to figure out how we can get Ironic to do it for us, and we can't really do that via iSCSI. There are things you can do, but it's not pretty. So this led us to a new model. We took the code we were writing before we started working on Ironic and modified it, and it has become what's called the Ironic Python Agent, which is now part of OpenStack. The idea is that instead of booting a simple ramdisk, we boot a slightly more intelligent ramdisk which runs an agent with a REST API. That REST API allows Ironic to do interesting things, like make a call that says, securely erase your disks, and the server just goes ahead and does that. It can also make a call that says, download image XYZ and write it out to disk, and the agent can do that. The agent will connect up to Glance, or in our case we're using Swift as a Glance image backend, download the image, and write it out. The final interesting thing the agent can do is write out a config drive partition. Who here is familiar with config drive? Config drive is a mechanism that OpenStack, specifically Nova, uses to put user-specific information onto an instance when a customer creates it. For example, if I want to boot a web server, I can inject some JSON in there and say roles equals web. When the server boots up, I can have code on the server that introspects that JSON, pulls out the roles of the server, and causes that server to become a web server; it tells the server what its identity is, and any other metadata it might need to know about itself. By injecting this config drive partition, we can actually make a bare metal server look more like a cloud server, which is, again, another step toward that dream of the software-defined data center. So here's what I want from the people in this room. Today there are effectively five of us plus the Ironic community. We've gotten a lot of great support from the Ironic community, but the Ironic community itself is small, so we want more help. I want you to go check out Ironic, check out the Ironic Python Agent, check out the changes we have open to make this dream a reality, and then contribute. Again, there are four developers here. Who here writes code? Yeah, that's a lot more than four. We could have hundreds of you writing code and making the software-defined data center a reality, and that's a big deal. How much can we accelerate when we go from four developers to 200? And then I want you to tell your friends. Who here has friends? Wow, that's kind of sad. I'm suspecting that more of you have friends, and of your friends, probably some of them write code. So we could have literally thousands of developers that we can network with right here, all contributing to Ironic, so we can build the software-defined data center.
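To make the config drive piece concrete, here's a minimal sketch of the kind of code I mean running on the instance at boot: mount the config drive, read the JSON the user injected, and decide what the server should become. I'm assuming the JSON went in as user data and the partition is already mounted at /mnt/config with the usual openstack/latest/ layout; the paths and the "roles" key are just for illustration.

```python
# Minimal sketch: introspect the config drive at boot and act on the
# user-injected JSON (e.g. {"roles": ["web"]}). Paths are assumptions.

import json
import os

CONFIG_DRIVE_MOUNT = "/mnt/config"                      # assumed mount point
USER_DATA = os.path.join(CONFIG_DRIVE_MOUNT, "openstack/latest/user_data")


def get_roles():
    """Return the list of roles the user injected, e.g. ["web"]."""
    try:
        with open(USER_DATA) as f:
            metadata = json.load(f)
    except (IOError, ValueError):
        return []
    roles = metadata.get("roles", [])
    return roles if isinstance(roles, list) else [roles]


if __name__ == "__main__":
    if "web" in get_roles():
        print("configuring this machine as a web server")
    else:
        print("no recognized role; leaving this machine generic")
```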
Now, a lot of you think I'm crazy, right? Because I'm up here talking about this thing you probably haven't heard that much about. Who cares, we're working on OpenStack. But OpenStack runs on servers. It's kind of a fundamental thing: we think the cloud is the future, but what does the cloud run on? It runs on servers. And we're focusing more on the software we're putting on the servers than on how we're going to make the servers run that software. Fundamentally, OpenStack is just software. This is why it's so important that Ironic succeed as part of OpenStack: Ironic is how OpenStack is going to run on servers. You can see this in the TripleO project, for example. It's a project to basically use Ironic to build out what they call the undercloud, and on that undercloud you run the overcloud. That's a really powerful idea: what OpenStack is about is exposing infrastructure, and then that infrastructure can be used to expose infrastructure at different levels of granularity. But we shouldn't be tied to the idea that a cloud is a virtual machine. A cloud is infrastructure. I live in San Francisco. If you go to the Bay Area and you find a late-stage startup, what you'll actually find is that more and more they're not using the cloud. They might be using the cloud as a way of getting infrastructure, but many of them aren't; many have their own data centers. And whether they're in the cloud or in their own data center, they're using different management primitives to boot different kinds of servers. Three years ago, you would have found people creating a web node over here, a database node over here, a cache node over here. That idea that a server equates to a service is kind of dying. You're not going to walk into Twitter and find them using virtual machines to divide up a computer into web nodes and cache nodes; you're going to find them using containers. And I don't want to make a bet on containers (actually, I think containers are great), but the point is that by having these well-defined levels of abstraction around infrastructure, we don't have to make a bet. We make a bet on servers because we know that containers are going to run on servers. We make a bet on servers because we know that virtual machines run on servers. Whatever the future holds, it probably holds code running on servers. So this idea that bare metal is dead is wrong. Dedicated servers are the past and the future, because they're just computers. The other part of this is: why are people actually using, say, a virtual machine? There are really two things. Like I mentioned, there's this idea of different management abstractions, the web node and the cache node. The other reason is that they want to purchase infrastructure in smaller increments. Why buy a whole server when you only need half of one? And that makes sense for people who only need half of a server. But Google is not buying half of a server to save money, right? They're rolling in thousands of servers a day. As you scale, as the web scales, people are buying servers in larger increments. And the amount of engineering that has gone into building virtualization technology is really incredible.
And all of that engineering fundamentally goes toward enabling people to buy things in smaller increments. That's great. But what if we put that much engineering into allowing people to buy software-defined data centers? I keep coming back to the idea of the software-defined data center because, again, it's really the dream. But it's a ways out. So here's what we want to do by the Juno summit: we want to get the Ironic Python Agent and its associated deploy driver into Ironic. And again, for that to happen, we need your help. We need to make changes to Neutron. We're using Neutron to control switches, and we're using Neutron in, honestly, a terrible way; I hope we can improve upon that. And Neutron itself has some issues, so we need your help to make Neutron better. We have one person working on Neutron today, and we're not going to build a software-defined data center with one person working on Neutron. So by Juno, we want Neutron to be better, we want the Ironic Python Agent and its associated driver merged, and we want Ironic to be able to use those improvements to Neutron to drive better control of the networks. Longer term, we can take advantage of software-defined networking and begin enabling more dynamic use cases: grabbing a switch over here, an F5 over here, and some servers over here. But again, that's a ways out. Even if we just get this into Juno, we will, by the end of the year, be able to enable these use cases where, if I want to deploy a cloud on the cloud, that's easily doable. I just call up, I get some servers, I put a cloud onto those servers, and then I have ways of dicing up that infrastructure if I believe in the virtual machine as a management abstraction. If I want to install Mesos or Docker onto those servers, I can do that too. That way we can run TripleO, Twitter can run their thing (they're probably not going to use my software, but maybe they will), and it enables all this choice. You can do whatever you want, because when you can get computers, when you have this infrastructure abstraction, you can do whatever you want. You can use computers the way you've been used to using them, but you don't have to worry about a crash cart in a data center or PXE booting. You just make a call to an API and you get back a computer. And that's kind of the dream. So again, sorry, this was a little briefer than I intended. Who has questions? Yes. So, specifically on Ironic. Yes, the question is about the state of Neutron in Icehouse as it relates to Ironic. Ironic today doesn't actually use Neutron for much. It uses Neutron for one thing in particular, which is controlling DHCP. That's very important for the Ironic use case, because in order to PXE boot a server you need to be able to control DHCP; that's the assumption it's built on. That's really the only thing it's used for today. That's not actually how we intend to run Ironic, so I don't know that much about how functional it is. We looked a little bit at Neutron and we weren't super impressed. It uses dnsmasq for its DHCP agent, and we were not blown away by that. The Rackspace Cloud doesn't actually support DHCP today, and that was the model we were looking at: if the Rackspace Cloud can't run DHCP, can we? I think the answer is that this can obviously be improved upon, but today it's just not a problem that I wanted to solve.
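Just to show what "controlling DHCP" means in practice, here's a rough sketch of pushing PXE boot options onto a Neutron port using the extra_dhcp_opts extension, which is the kind of thing Ironic relies on today. The credentials, addresses, and port ID are placeholders, and treat the exact option values as illustrative.

```python
# Rough sketch: set DHCP/PXE boot options on a Neutron port so the bare metal
# node network-boots the deploy ramdisk. Credentials and IDs are placeholders.

from neutronclient.v2_0 import client

neutron = client.Client(
    username="ironic", password="secret", tenant_name="service",  # assumed
    auth_url="http://keystone.example.com:5000/v2.0")

pxe_opts = [
    {"opt_name": "bootfile-name", "opt_value": "pxelinux.0"},
    {"opt_name": "server-ip-address", "opt_value": "192.0.2.10"},
    {"opt_name": "tftp-server", "opt_value": "192.0.2.10"},
]

# PORT_ID would be the Neutron port wired to the node's NIC.
neutron.update_port("PORT_ID", {"port": {"extra_dhcp_opts": pxe_opts}})
```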
In the future, the idea is actually that instead of Ironic itself talking so much to Neutron, it's really about having the virt driver in Nova talk to Neutron, so that we can maintain these levels of abstraction around creating what appear to be virtual ports, and then later Ironic associates those virtual ports with a physical switch. Does that answer your question? Yes. Yes, and that's actually our objective: we can use Neutron to effectively move servers between our dedicated provisioning network, which will be an isolated network that we use to PXE boot the server and talk to the agent across, and then later, when we put the customer's code on there, we move that server to customer-facing networks. Yeah, completely automatic, but that is what's happening behind the scenes. Over here. Yeah, good question. The question is how much work we're doing to abstract away PXE so that we can do other bootstrapping mechanisms. Ironic supports, again, that deploy interface, and today they have what they call the PXE deploy driver. That's not actually all that specific to PXE; it uses PXE as a mechanism for exporting iSCSI targets. But if you have another thing that does the same job as PXE, it's fairly well abstracted, so you can easily replace it. Our goal with the Ironic Python Agent is actually to get away from what we view as old technologies, like PXE, and get into things like HTTP as quickly as possible. So what we're actually doing is PXE booting iPXE, and then using iPXE so we can download a real image from Swift and run that. But again, the idea is to bootstrap out of the old and into the new as quickly as possible, and to the user it should never be visible that we're using PXE, or even a dedicated server, behind the scenes. Anything over here? Yeah, good question. The question is how we're validating firmware after a tenant releases a machine. This is actually, I think, one of the biggest challenges with trying to run Ironic in a multi-tenant fashion. The answer is basically firmware signing. Instead of trying to validate the firmware, we're relying on signing to prevent customers from manipulating it. In our dedicated business today, we're already recycling servers; what changes with Ironic is that, if we're able to deploy it, we could recycle servers more quickly. So we believe that firmware signing is critical, and we would like to see it get better. This is, I think, one of the maybe two-year kind of challenges: hardware is frankly terrible, firmware is terrible, the way signing works is terrible, PXE booting itself is terrible. We want to fix that, but to fix that I think we need to be bigger, right? So contribute to Ironic. For example, there's a SeaMicro driver that works with SeaMicro's interesting hardware. We would love to see others building interesting hardware that integrates with this idea of the software-defined data center. The SeaMicro thing is a box of computers that exposes a REST API; it's like a little mini software-defined data center. I'd love to see hardware move in that direction. But another part of that is that if we want this virtual data center, we need better security.
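Going back to the network move I described a moment ago, here's a rough sketch of the flip from the provisioning network to a customer-facing network. The network ID, credentials, and the simple delete-and-recreate approach are illustrative, not our actual plugin; behind the scenes the plugin also reconfigures the physical switch port.

```python
# Rough sketch: flip a node from the isolated provisioning network to the
# customer-facing network by recreating its Neutron port. IDs and credentials
# are placeholders.

from neutronclient.v2_0 import client

neutron = client.Client(
    username="ironic", password="secret", tenant_name="service",
    auth_url="http://keystone.example.com:5000/v2.0")

CUSTOMER_NET_ID = "REPLACE-WITH-CUSTOMER-NETWORK-UUID"


def move_to_customer_network(provisioning_port_id, node_mac):
    # Drop the port on the provisioning network we used to PXE boot the node
    # and talk to the agent.
    neutron.delete_port(provisioning_port_id)
    # Create a new port on the network the customer actually sees.
    port = neutron.create_port({"port": {
        "network_id": CUSTOMER_NET_ID,
        "mac_address": node_mac,
    }})
    return port["port"]["id"]
```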
Yes, back there. Right, so the question is: I mentioned that we can erase disks using the Ironic Python Agent; is there an intention to be able to manage firmware as well? The answer is yes, absolutely. First of all, we consider that core functionality that must exist. If Ironic is going to be successfully multi-tenanted, we must be able to securely upgrade and manage firmware. But more generically, there are many things that might happen on a host that I can't stand up here and anticipate; I don't know what kind of hardware you might decide to put into a server tomorrow. So the idea of the Ironic Python Agent is that it's extensible, so that you can discover the hardware in your machine and then manage it however you think it needs to be managed. If firmware upgrades are necessary, and they probably are, you can do those, and that's core functionality. But if you have a Bitcoin-mining ASIC in there and you need to reset it between customers, for example, you could extend the Ironic Python Agent to do that as well. Back there. So the question is whether we're working with hardware vendors to tie into their lifecycle controllers. At Rackspace we work with many hardware vendors, but more specifically in the Ironic community, HP is actually one of the largest contributors to Ironic, and SeaMicro, again, was working on their SeaMicro driver. One of the visions of Ironic is this idea of vendors contributing code that works with their hardware. I actually think that's really powerful, because it means we can get things like these really cool SeaMicro boxes, and it enables hardware vendors to innovate more quickly because they can expose this functionality to customers. If you're running Ironic and you just roll one of these things in and plug it in, suddenly that functionality is exposed, instead of you having to figure out how to work the REST API. So I think that's really powerful. Yeah? Right, so the idea here is that, the same way you build an image for a virtual machine, you could build an image for a bare metal server. Obviously there are some differences; for example, one of the things we've been discussing is how we support hardware versus software RAID and things like that. But fundamentally, what we want to get toward is this idea that images are images, and those images might run on a virtual machine or they might run on bare metal. Whatever tools you can use to build an image for a virtual machine today, you should be able to use to build an image that runs on bare metal hardware. Yeah? Yes, absolutely. The agent today speaks HTTP and you can point it at any URL, so if you can put an image on HTTP, it can grab it. If you want to extend the agent to do more, to speak other protocols or to do interesting things with images or some other data once it has them, to provision the server somehow, that kind of makes sense to me.
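Since a couple of these questions were about extending the agent, here's a very rough sketch of the shape of that extensibility, using the Bitcoin ASIC example from a moment ago. The class and method names are hypothetical, to show the idea rather than the agent's exact plugin API.

```python
# Hypothetical sketch of the agent's extensibility idea: the agent discovers
# the hardware in the box and hands vendor- or device-specific work to
# pluggable managers. Names here are invented for illustration.

class ExampleAsicManager(object):
    """Handles a hypothetical Bitcoin-mining ASIC between customers."""

    def evaluate_hardware_support(self):
        # Return a nonzero score if this manager's hardware is present,
        # so the agent knows to dispatch work here.
        return 1 if self._asic_present() else 0

    def erase_devices(self):
        # Core functionality: securely wipe whatever this manager owns.
        print("securely erasing ASIC state")

    def update_firmware(self, firmware_url):
        # "Here's a firmware, please flash it," in whatever way makes sense
        # for this particular device.
        print("flashing ASIC firmware from %s" % firmware_url)

    def _asic_present(self):
        return False  # real detection logic would go here
```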
Over here? Yeah. So that is how Ironic works today: you have servers, and Ironic manages the servers, but it doesn't really touch the switches. That makes a lot of sense in the TripleO use case, right? Because you effectively have one trusted user who's going to install OpenStack on the undercloud and then expose that OpenStack to other users. In that use case it makes sense. But if we want to have multiple users using this, most users are going to want some sort of network isolation, or at least we want to be able to move servers between the provisioning network and the user-facing network, even in a TripleO world. We don't want TripleO on the provisioning network once the undercloud is running, right? So I think it's critical that we be able to manage the network, both for the multi-tenant use case and for the way the Python Agent works. In a very trusted scenario, I suppose you don't have to do that. Over here, yeah. Yes, so that was actually the intention for the Icehouse release. Unfortunately, there were two things that blocked it. One is that there needs to be a migration path for users of the bare metal project today. The other is that we need better continuous integration so that we can test Ironic in order to actually land the virt driver in Nova. Right, did I answer your question? So, he's the PTL, so. Over here? Right, so what we've done is we've actually developed our own thing, and I'm not very proud of it. I think what we need to do is go back to the drawing board and figure out whether we should be using ML2. The way our thing works is, I guess, vaguely similar to ML2, but it's a lot simpler, and ML2 didn't actually solve the problems we had, so we went off and built our own thing. The way any plugin to Neutron works, in my mind, isn't that pretty and needs to be reworked a little bit, but I think ML2 is clearly a step in the right direction. Right, and that's certainly critical, because you want to be able to have different kinds of network equipment or even different kinds of controllers. So it's certainly critical. Can you elaborate a little? Right, so what we've done today is that the plugin we've written for Neutron specifically targets NX-OS: we basically just SSH in and run commands. This is another area where, like I mentioned, hardware is terrible. Switches are basically in the same boat, and they're getting a little bit better; people are trying to expose REST APIs, Cisco has some kind of Python client, maybe, in theory. Again, hardware is terrible. So we would love to make that pluggable, because we recognize that in the near future there's obviously not going to be a standard interface for configuring switches. The other thing is that, as we look forward, there are going to be SDN controllers that we need to use to accomplish the same use case. So it certainly needs to be pluggable; today it's not. Right, and that's one of the reasons, that's kind of what ML2 does: it abstracts that away a little bit, and we'd like to leverage that to accomplish this.
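And just to be concrete about what "SSH in and run commands" looks like, here's a rough sketch. The NX-OS-style commands, the credentials, and the paramiko usage are illustrative, not our actual plugin code; real switches vary in how they want to receive configuration over SSH.

```python
# Rough sketch: SSH into a top-of-rack switch and put a port on a VLAN.
# Commands and credentials are placeholders for illustration only.

import paramiko


def set_access_vlan(switch_ip, username, password, interface, vlan_id):
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(switch_ip, username=username, password=password)
    commands = [
        "configure terminal",
        "interface %s" % interface,
        "switchport access vlan %d" % vlan_id,
        "end",
    ]
    # Send the configuration as one command string; some switches want an
    # interactive session instead.
    stdin, stdout, stderr = ssh.exec_command("\n".join(commands))
    output = stdout.read()
    ssh.close()
    return output


# Example: set_access_vlan("192.0.2.20", "admin", "secret", "Ethernet1/10", 301)
```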
I have one last question right here. Right, so there are standards, and again, hardware is terrible and the standards aren't good. Realistically, hardware that claims to support those standards in many cases doesn't, and most hardware doesn't claim to support a standard at all. So what we've done, in recognition of that, is have the Python Agent abstract that away a little bit: the idea is that you say, here's a firmware, please flash it, and you can write code that detects your hardware and then flashes it in whatever way makes sense for that hardware. There are real challenges there. Some hardware supports out-of-band firmware upgrades, and the agent is explicitly in-band, so the agent can't make a reasonable decision around that. There are other things too; for example, some upgrades require multiple reboots as part of the flash cycle, and it's very difficult to keep track of state when that state lives on the machine you're rebooting, entirely in RAM. So there's work to be done there. We would love to see more standardization, but we recognize that in real life that's just not going to happen. Exactly, yes, it's a driver model. One more in the back, yes. Right, so the question is about peripheral firmware. There's not actually just one firmware on a server: there's firmware in the network card, firmware in the disks, firmware all over the place. That's one of the real challenges, again. We think the solution is signing; that happens today for some firmware, but the actual signing could be better. So that's another place we would love to see some iteration from hardware manufacturers. It's not impossible today, it's just not great. It can't be validated, but it can be signed, right? So, I will be around the summit all week. We're having a series of design sessions related to this, and to Ironic in general, so stop by those design sessions if you're free. Track me down, I'll be around. And if you've got questions, it shouldn't be hard to find me in the hallway after this. Thank you, everybody.