Looks like it's 1:50, so I'd like to get started. My name is Joe Gordon, and I'd like to talk about Pimp My Cloud. It's a bit of a silly name, I know, but more importantly it's about Nova configuration hints and tricks. I'd like to make this session a bit interactive, so if you have any questions throughout, feel free to ask.

A bit about me: I'm an engineer at Cloudscaling and an OpenStack contributor, and I've been working a lot on Folsom, mostly in Nova.

I know you've seen lots of stats out there about Folsom, and Nova in particular, about who wrote what and all that. I'd like to give another view of that with some different numbers. This is a chart of the number of lines of Python code across the past three releases: Diablo, Essex, and Folsom. You can see it's gotten much bigger over time. But on the bottom here you see the code churn, measured by the number of lines inserted and deleted according to git log, has actually gone down. Essex had enough inserts to rewrite the code base 1.2 times, and Folsom went down to half of that, only 60% as many inserts as lines of code. So I think that's a sign of a bit more maturity, I hope: the code is changing less rapidly, which I think is a good sign.

I'd also like to take a minute here and thank all the other developers who worked on Nova. This is a list of everybody with over 10 commits in Nova during Folsom; there are 28 of them. I know about 200 people committed overall, but it's this smaller group that was involved throughout the whole cycle, and we wouldn't be here today without everybody up here.

So hopefully you've all seen this; Ken Pepple is around here somewhere. This is the big scary diagram of what OpenStack looks like today, and we're going to talk about the big component in the middle, Nova. This diagram itself is also pretty big and a little scary, and we're going to talk about how you can change it, customize it, and do more than just the standard basic things that everybody else does.

So, vanilla OpenStack: everybody runs DevStack. I'm sure everybody here has run it; if not, it's great and you should try it. It's used all the time for testing as part of the gating framework, so every day it's run hundreds, if not thousands, of times. It's all-in-one; it uses RabbitMQ, KVM, and MySQL for the back ends, and mostly default config options. And it works great, but it's focused mainly on a development environment that doesn't do that much. You can customize Nova a lot further. You can choose any hypervisor, any database, any message queue, any network model, with 500 config options to play with, or to try not to play with if they scare you. And you can also swap out any of the services we see here with anything you write. We'll talk about how you can do that and what the implications are.

So what makes OpenStack open? There are three RPC back ends, three-plus database back ends, six-plus virtualization back ends, and 500 config options. If that isn't open, I don't know what is. You can run anything on the back end; anybody can put another driver in, and you've seen that happen time and time again.

There have been a few new back ends for Folsom. There's a new ZeroMQ back end for messaging RPC. There are also two virtualization drivers that have been added: Hyper-V, which has a bit of infamy since it was cut out in Essex, but it's back in, it's working great, and it's actually being maintained now in Folsom; and PowerVM, which was put in by IBM. On top of that, 20% of the config options are new. That's a bit overwhelming at times, but a lot of them break down into categories, and we'll talk about that.
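Since the ZeroMQ back end just came up: as a rough sketch, not from the slides, switching RPC back ends is a one-line change in nova.conf. This assumes the Folsom module paths, and the RabbitMQ host is a placeholder.

    # nova.conf -- picking an RPC back end
    # Default: Kombu, speaking AMQP 0.9.1 to RabbitMQ
    rpc_backend = nova.openstack.common.rpc.impl_kombu
    rabbit_host = 10.0.0.5

    # Qpid (AMQP 0.10) instead:
    # rpc_backend = nova.openstack.common.rpc.impl_qpid

    # Or the new brokerless ZeroMQ back end:
    # rpc_backend = nova.openstack.common.rpc.impl_zmq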
While we're talking about new things in Folsom, this is a breakdown of the main features by blueprint. We use Launchpad to track progress on major features, and a new feature is called a blueprint in Launchpad.

The one that I actually worked on myself is general host aggregates. That allows you to assign metadata to compute nodes that can be used in scheduling. So you could say, for example, this compute node should be running Windows VMs, and schedule all of your Windows VMs onto one or two compute nodes instead of spreading them out all over the place.

Another nice feature that came in was being able to disable API extensions. There's a core Nova API, and there are a lot of extensions on top of it. Until now it was very hard to disable those extra API extensions if you didn't want to use or support them. Now it's as easy as saying: I don't want to support that one, I do support this one, and disabling whatever you don't want.

Another new one is the RootWrap pluggable filters. RootWrap is how we get away with running Nova as a non-root user while still needing to run some commands that require root: it's an extra module that runs the root commands on your behalf. And it's gotten better than before. Previously, to add a new command to RootWrap you had to go into the Python code and modify it; now it's all in a config file that's nice and easy for any operator to modify. It's also much easier for a distributor to package.
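For a sense of what those pluggable filters look like, here's a sketch of the Folsom-style rootwrap files; the mydeploy.filters file name and the kpartx entry are just illustrative.

    # /etc/nova/rootwrap.conf -- points at directories of filter definitions
    [DEFAULT]
    filters_path = /etc/nova/rootwrap.d,/usr/share/nova/rootwrap

    # /etc/nova/rootwrap.d/mydeploy.filters -- an operator-added filter file
    [Filters]
    # name : FilterClass, path to the executable, user to run it as
    kpartx: CommandFilter, /sbin/kpartx, root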
Multi-scheduler support. There was a big nasty bug in Essex where you couldn't run more than one scheduler: there was a race condition where, if you had two of them, they wouldn't know two were running, and they could potentially oversubscribe a single compute node. That's been fixed in Folsom, so now you can scale out as many schedulers as you want in your cloud.

Flavor extra specs. This is also used for scheduling. It means that in a flavor, or instance type, which is another name for the same thing, you can attach arbitrary extra information that can be used for scheduling. For example, you can have extra specs that are matched against host aggregates, so you can say: if it has this extra spec, it has to go on this machine. Or you can attach extra information for the hypervisor back end, like: this must be a KVM instance, things like that.

LVM ephemeral disk images. In the past we've been using QCOW2 for the ephemeral disk back end, and that's very slow. There was a big push in Folsom to support LVM as the back end instead, which gives you roughly up to a 50% speedup.

Project-specific flavors. As everybody who's been using the cloud knows, flavors have been global. Now you can have flavors per tenant, which makes multi-tenancy easier: different flavors per tenant, as you choose. And this can be combined, once again, with flavor extra specs and other things for advanced scheduling.

Multi-process API services. I don't know if any of you have seen this, but the API services in the past would all run on a single core. That was because, even though there's parallelization in the API service, it was all running inside one process. Now you can have one API service endpoint running across as many cores, and as many OS processes, as you choose.
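A quick sketch of what turning that on might look like, assuming the Folsom option name for the compute API workers; leaving it unset keeps the old single-process behavior.

    # nova.conf -- fan the compute API out across OS processes
    osapi_compute_workers = 8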
So let's talk about some of the back-end choices you have. The basics everybody uses are Rabbit, KVM, and MySQL, but there are a lot more options than that.

There are three supported RPC back ends. There's Kombu, which by default uses RabbitMQ, although, and I've never tried this, Kombu can actually do Redis and other things that I don't think anybody's tried or wants to try. But it does support all kinds of advanced back ends if somebody wants to try something crazy. RabbitMQ is built on AMQP 0.9.1, was contributed to OpenStack by Rackspace, is backed by VMware now, and is written in Erlang. The next one that came after that is Qpid, contributed by Red Hat and supported by the Apache Foundation. It's a different version of AMQP; from what I hear, Rabbit is not going to support AMQP 0.10, so that's why there's a separate back end for Qpid, which speaks that protocol. It's written in Java and C++. The newest back end, which my company helped push in, is ZeroMQ, and it's brokerless, written in Python and C++. iMatix is a strange company name, I'd never heard of it myself, to be honest, but it's the company that's been pushing ZeroMQ, and they're the guys who also worked on the AMQP protocol.

Once you pick your RPC back end, you have a bunch of database options. SQLite is commonly used for testing. It doesn't really support that many things, but it's simple and easy to bring up, and that's why everybody loves it. No simultaneous writes, it doesn't really scale out, and there are no high-availability options. The common one is MySQL; that's what everybody uses, as far as I know. It supports simultaneous writes, and all kinds of high-availability options are available for it. And lastly, there's Postgres. I'm not sure if anybody's using Postgres in production, but because all the database access goes through SQLAlchemy, you can use any database SQLAlchemy supports, and Postgres is a nice example of that. There have actually been some code fixes for Postgres in particular. Yeah? Thank you. So he uses Postgres; did you have any problems with it? That's the benefit of using something like SQLAlchemy: it abstracts out which database you're on.

The last column here, the native Python client, is a bit of a strange case, and I'll get into it a little later, but it's related to some problems with eventlet and MySQL. Eventlet is the coroutine library that OpenStack uses. A coroutine is a way of running a function asynchronously: you start the function, you don't block on its return value, and you continue through the code. This allows, for example, the API server to take an incoming request and spin off a coroutine to handle it while it keeps listening on the main loop. Unfortunately, the standard MySQL client doesn't work with eventlet, because it's C, not Python, and eventlet can't monkey-patch a C module. So it actually breaks the coroutines, and every MySQL call becomes a serialization point. That's why there's a native Python client: instead of calling into the C back end, you use pure Python, and that fixes the coroutine/eventlet problem, though the Python driver brings some unpleasant performance trade-offs of its own.
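Concretely, the database choice is just a SQLAlchemy URL in nova.conf. A minimal sketch, assuming the Folsom option name; the pure-Python example presumes a driver such as PyMySQL is installed, and the credentials are placeholders.

    # nova.conf -- the database back end is a SQLAlchemy URL
    # Default C-based MySQL driver (fast per call, but blocks eventlet):
    sql_connection = mysql://nova:secret@10.0.0.5/nova

    # A pure-Python driver avoids the eventlet serialization point:
    # sql_connection = mysql+pymysql://nova:secret@10.0.0.5/nova

    # Postgres works too, since SQLAlchemy abstracts the database:
    # sql_connection = postgresql://nova:secret@10.0.0.5/nova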
Once you pick that, you have the big one, which is the virtualization options, and you have all kinds of them here. I'm going to go through them starting from the lightest weight.

You have the bare metal driver, which is in Folsom and is getting better in Grizzly; there are some talks about it going on today and yesterday, I believe. It's not really a hypervisor, but it sort of counts as one in this list, so it's a bit strange. It's fast, it's open source, and it's obviously native speed.

You have two lightweight ones. UML, which is User-Mode Linux, is paravirtualization only, and it's slow. It's maintained in the Linux kernel, and I'm not sure anybody actually uses it. I know people do use LXC, which is Linux containers. This is not full virtualization either, and it has some major limitations: you can't run a Windows OS inside a Linux container, and I believe you can only run the same kernel as the host OS. But it's very fast, and it's used, I believe, on the ARM version of OpenStack. Once again, it's in the Linux kernel.

Then you have the two big ones that everybody uses and is always arguing about which one is better; I think they're both great. Xen and KVM are both primarily used in full virtualization mode, they're both very fast and open source, and many, many people use them. You have QEMU, which is commonly used for running VMs inside of VMs. So if you have, say, a DevStack VM for testing and you want to run a VM inside of it, then unless you're on a kernel with nested KVM, I think it's the 3.2 or the 3.6 kernel, I don't remember which, you're unable to run KVM inside of KVM, so people use QEMU. It's very slow, but it makes testing very easy. And the last three, some of which are new: Hyper-V, PowerVM, and VMware. These are the proprietary, very fast, full virtualization drivers that have been added recently, and there are plenty of vendors out there who will support you on them.

So once you pick your back ends, you have a bigger challenge: 500 config options. How do you deal with that? By default, the defaults work very well, so a lot of this is more for if you want to see what's going on or try different things. This is a breakdown of the config options by type. Booleans: true or false, enabling a service, disabling a service, turning a feature on or off. Floats: primarily used for tuning a parameter very finely. An example is how much oversubscription you want per CPU core; that's actually a float, so you can do very strange things and tune very precisely. Integers: primarily related to time, how often should this run, how long should the interval between operations be, that kind of thing. Lists, which are used for all kinds of things. Multi-string, which is an interesting one: if you specify the config option twice in the config file, both strings are read, and that's used, as I'll show shortly, in a few examples. And the big one, which is strings.

A lot of config options are new in Folsom, and we're not going to talk about all of them; this slide is actually only a subset, grouped together, but it's big and scary. Thankfully, a lot of this is broken down by the file each option is used in, so if you look at the sample nova.conf file, it's easier to see what's used where.

So let's say you want to do some crazy things with the virtualization options. The defaults work very well, as I said, but you can modify anything you could possibly want. The first thing you do is pick the RPC topic. There's a default topic for each of the RPC components, and the standard topic for this one is compute, but you can name it whatever you want. I'm not sure why you'd want to do that, but if you have something else on your RPC back end, you may want to switch it. Once you pick the RPC topic, you pick the compute manager. There's one compute manager in the tree, but this is an example of where you could replace the whole compute manager and put your own code in. You can see it's just a full Python path, nova.compute.manager; you could say custom.NovaManager, whatever you wrote, and that's a place where you can inject your own components. We'll talk about that shortly. Once you pick the compute manager, there are a bunch of options, which we'll see on the next slide, but the big one is the compute driver, which by default is libvirt; this is where you say KVM, Hyper-V, that kind of thing. And because we're using libvirt here, we actually have a bunch more options, because libvirt supports KVM, LXC, QEMU, and a bunch of others, so within the compute driver options you also have to specify KVM. And you have a few other advanced options like the libvirt CPU mode and CPU model. This is about what kind of CPU the VM should see: should it be the native one, or a specific model you want to present? So you're on x86, but you can present a slightly different CPU version to the VM. I actually haven't tried this option personally, so I'm not sure how well it works. A lot of these advanced options are use-at-your-own-risk kinds of things; not all of them have been tested very well. The defaults have been tested very, very well, but some of these others haven't been tested nearly as well.
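Putting those together, here's a hedged sketch of the options just described, using the Folsom names as I understand them; the CPU model value is illustrative.

    # nova.conf -- compute topic, manager, and libvirt driver options
    compute_topic = compute                       # default RPC topic
    compute_manager = nova.compute.manager.ComputeManager
    compute_driver = libvirt.LibvirtDriver
    libvirt_type = kvm                            # kvm, qemu, lxc, uml, or xen
    # Use-at-your-own-risk CPU presentation options:
    # libvirt_cpu_mode = custom
    # libvirt_cpu_model = Nehalem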
Here's an example of what the config option code actually looks like in source; this is taken off the GitHub page. We're looking at three options in particular. reboot_timeout lets you act on an instance that times out during reboot, say, by putting it in an error state. instance_build_timeout is similar: if you're building an instance and it takes too long, you can put it in an error state instead of letting it sit in building forever. This is an example of something you'd potentially want to set per deployment: if you care about cleaning up VMs that get into a strange state, it's nice to move them to an error state when they get stuck in building.

The one we're going to talk about on the next slide is running_deleted_instance_action. This is a nice one for when you have a compute node running a VM that should have been deleted: according to the database, it's gone. What do you want to do about it? That depends on your specific use case. If you're running a private cloud, you probably don't care about this. You probably don't want to delete the VM, because it's probably important, something broke, and you don't want to lose it. But if you're billing customers for it, you may not want these VMs hanging around taking up precious resources.

So let's say you want to actually delete those VMs instead of just getting a warning, which is what happens by default. There's one flag you'd change primarily, and another you can adjust alongside it. There's a flag for the running deleted instance timeout: after an instance is deleted, how soon do you check to clean it up? And on the bottom here, running_deleted_instance_action: what do you do when you find an instance like that? What's happened is the VM is deleted in the database but somehow still running on the compute node. Every so often a periodic task compares the database against the actual state, and when it finds a discrepancy, it acts on this flag. So if you set it to reap, it'll actually go ahead and delete the VM instead of just putting a line in the log file for you.
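As a sketch, here's how those cleanup options fit together in nova.conf, assuming the Folsom option names; the interval values are illustrative.

    # nova.conf -- cleaning up VMs the database says are deleted
    running_deleted_instance_action = reap       # default just logs a warning
    running_deleted_instance_poll_interval = 1800
    running_deleted_instance_timeout = 0         # grace period after deletion
    # Related stuck-state cleanup:
    # instance_build_timeout = 600               # seconds before error state
    # reboot_timeout = 120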
You do a similar kind of thing for the scheduler options, where once again you pick an RPC topic, and once again the default, as far as I know, is what everybody uses. You can pick a manager: if for some reason you want to swap out the entire scheduler component, you can put your own manager in instead. Then you pick a scheduler driver. The standard driver is the multi-scheduler, and that's because in Folsom, and in the past, Cinder's code was inside Nova too, so you actually had to have a scheduler for volumes and a scheduler for compute. That's what the multi-scheduler does: it allows you to have different scheduler back ends for different kinds of requests. And this should be going away in Grizzly, it looks like, as the volume code gets pulled out.

So once you pick the multi-scheduler, you pick the actual three schedulers you want. The compute scheduler is the main one, and the one people will be changing the most. The volume scheduler only matters if you're using nova-volume instead of Cinder. And the default scheduler is in case you have some other kind of request the scheduler should handle: if it's not a volume or a compute command, it goes to the default one.

The default compute scheduler is the filter scheduler, and this is a filter-and-weigh process. You have a list of compute nodes. You filter them by various parameters, such as how much RAM is available, what availability zone they're in, their capabilities, image properties, and such. Then you weigh the remaining compute nodes. The default is to weigh by free RAM, and you can have dispersion or affinity: either you spread all your VMs out across nodes first instead of putting them all in one box, or you fill up one compute node at a time. That's another option you can set as well.

Once you pick your filter scheduler, you have to specify your filters, and some of the filters take further parameters. Here's an example: the RAM allocation ratio, which is how much you want to oversubscribe RAM. If you want no oversubscription, you set this to 1. If you want high oversubscription, you set it to 20. If you want infinite oversubscription, you just disable the filter altogether.

And then lastly, we have the resource tracker. The resource tracker was added in Folsom to prevent the race condition of having multiple schedulers. Here you can specify a few parameters that were originally in some of the filters themselves, like the reserved host memory: how much memory do you want to reserve for the host itself? This is used by the RAM filter, so you can say, keep this RAM separate from the pool of RAM that I'm scheduling VMs into.

So here's an example, if you want to add a custom filter. You can write your own: look at the examples that already exist in the Nova code and modify one. And if you don't want to push it upstream, for whatever reason, you can run your own and add it in very easily. The first thing you do is modify the multi-string option, scheduler_available_filters. This is the list of all the places to look for filters, and because it's a multi-string you can use it twice: if you add a second line, it won't overwrite the default, it'll add to it. So you state the location of your filter, and that gets added to the available options. Then you update the scheduler_default_filters list, keeping or changing the defaults you want to use, and at the end you add your own custom filter.
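Here's a hedged sketch of the whole thing, against the Folsom filter API as I understand it; the RackAwareFilter class, the my_filters module, and the rack metadata are all made up for illustration.

    # my_filters.py -- a hypothetical out-of-tree scheduler filter
    from nova.scheduler import filters

    class RackAwareFilter(filters.BaseHostFilter):
        """Only pass hosts whose capabilities advertise the requested rack."""

        def host_passes(self, host_state, filter_properties):
            # Scheduler hints arrive in the request's filter_properties.
            hints = filter_properties.get('scheduler_hints') or {}
            wanted = hints.get('rack')
            if not wanted:
                return True  # no hint given, don't filter anything out
            return host_state.capabilities.get('rack') == wanted

And the matching nova.conf, showing the scheduler options from the last few slides plus the multi-string trick:

    # nova.conf -- scheduler driver, filters, and a custom filter
    scheduler_driver = nova.scheduler.multi.MultiScheduler
    compute_scheduler_driver = nova.scheduler.filter_scheduler.FilterScheduler
    # Multi-string: the second line adds to the first instead of replacing it
    scheduler_available_filters = nova.scheduler.filters.standard_filters
    scheduler_available_filters = my_filters.RackAwareFilter
    scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,RackAwareFilter
    ram_allocation_ratio = 1.5          # 1.0 = no RAM oversubscription
    reserved_host_memory_mb = 512       # kept out of the schedulable pool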
So here we have the list of all the manager and driver options. There are a lot of them, and what this shows you is that every service has a manager. The manager is what's used when you start a Nova service: when you start nova-api or nova-scheduler, it looks up the API manager, the scheduler manager, et cetera. And you can actually replace this for any component you want. As an example, we've done this internally: we have our own Nova networking component, so we specify our own network manager. It's built on top of the existing manager, but it's slightly different, and we keep it outside the tree. Additionally, you can also specify the back ends. We saw the example of the compute back ends and the scheduler back ends, but there's a bare metal back end, a quota back end, a network back end, a database back end, for which there's only one option right now, volume driver back ends, et cetera.

To make this a little clearer, this is where the managers and drivers fit into the picture. The manager actually runs the service: the compute manager, for example, is what's running inside the nova-compute service. And the nova-compute service has some sort of back end, so you specify that back end, in this case libvirt, the Xen API, et cetera, and that's how you specify which hypervisor you want to use.

And here are some resources where you can find further information. The Folsom docs, I don't think, are officially out yet, so you still have to go to the trunk docs at docs.openstack.org. The list of blueprints for Folsom is on Launchpad. The source is all on GitHub. And there are release notes on the wiki.
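To make the manager and driver swapping concrete, a minimal sketch, assuming the Folsom option names; the ourcompany path is hypothetical.

    # nova.conf -- every service has a pluggable manager
    compute_manager = nova.compute.manager.ComputeManager
    scheduler_manager = nova.scheduler.manager.SchedulerManager
    network_manager = nova.network.manager.FlatDHCPManager
    # Swapping in an out-of-tree component is just a Python path:
    # network_manager = ourcompany.network.manager.CustomNetworkManager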
Yeah, I could post it online somewhere. Any other questions? I hope that wasn't too much information, but it's hard to say. The reap one's actually a pretty nice one: if something gets into a strange state, you can fix it nicely. In many ways there are actually too many options, and one of the pushes in Grizzly is to categorize them: this one's for tweaking, this one's experimental, this one's only for Xen, this one's only for KVM, this is for RPC, that kind of thing. So hopefully it'll get a little clearer which ones mean what.

Sorry, could you repeat that? This one? Oh, it says maintained there because there's no other code outside the tree. I only describe it that way because, for example, the KVM code is inside the OpenStack tree, so that's what this means here: maintained by OpenStack, I guess, is a better way to put it. And there's currently a lot of progress in Grizzly on improving the bare metal driver and making it better. And it's not listed as a virtualization type because some of the further features are very different per driver. For example, I know Xen has some interesting ones; they put code in, I think, for BitTorrent, so you can download images over BitTorrent now. The actual back-end drivers can be very, very different, and they don't necessarily have to support all the features.

We use KVM ourselves, although we haven't tested the others much. We use KVM partially because we wanted to stick with one of the big open source ones; that leaves Xen and KVM, and we decided to go with KVM. Have you worked with live migration? No. I know it works, and that's another example of something that's different per back end: live migration under KVM is different from live migration under Xen, and it's very specific to a lot of the implementation details of your particular cloud.

Right. So it goes through this when you... so you're saying the scheduler never reschedules? That's once again an example of where there's a lot of room for improvement. A lot of these options are very raw, and it's not very clear which options work with which other options; hopefully they'll get better over time. So you don't live migrate to the wrong place.

A question related to Postgres: could you go back to the database slide? You were talking about the need for the native driver option, and I was wondering... Sure, which slide? Sorry? Which slide? The database slide: you were talking about the native Python client, and with the MySQL driver, what the performance considerations are when you're using the non-native client. So I was wondering if you could talk a little more about that and tell us what parts of the system would have scalability issues. Sure, that's actually part of my next talk. What type of scenario is it a problem in? Right. So we haven't fully fleshed out all the details of this, but we switched it out, and it's actually very easy to switch: you open up the SQL config option and, instead of MySQL, you specify a different driver there, I think Python MySQL, and it's very easy to swap. What we found is that we measured the time things took on nova-compute and nova-api, and one was faster and one was slower. Part of that is because the pure-Python MySQL driver is slower than the binary C one, but you also no longer have the problem of eventlet being serialized. So, for example, you're able to launch instances on the compute node faster now, because there are still a lot of database calls on the compute node and they can overlap. More of a work in progress? Right. Exactly. But it's also still pretty slow, so if you have a lot of calls it won't necessarily be faster. It's not that big of a difference; it's a slight performance change, and everything works whether you use the native client or not.

Yeah, so can you elaborate a bit more on the aggregate feature, what it does and what a typical use case would be? Sorry, which feature? The aggregate feature. Yeah. So that's mainly for operators. The idea is you can assign metadata to compute nodes; before, there was no way of attaching extra information to a compute node. All the information came up from the compute node, things like the amount of free memory, reported to the central database. But you may want to do the opposite and have an operator push information out, assigning it to a compute node. So you could say: this compute node should only run this instance type. You could add a tag to that compute node saying it should only be running m1.tinys, and then you could have a filter in the scheduler saying: for m1.tinys, you have to look for that metadata on the compute node. That's aggregates. An aggregate can hold an arbitrary amount of metadata, where metadata is key-value pairs, and you can have an arbitrary number of aggregates assigned to each node.

Anything else? There are a few filters available, some sample filters, but I think there are only two or three right now. So if you want to do something that's not supported by the default filters, you have to create your own, but the sample code is in there to do something more advanced. Custom, sorry, what? Timer? Timeouts, for what? So a lot of the timeouts are very customizable; the example earlier was putting a timeout on builds. I don't know the full list of all the timeouts, and it'll be a little harder to extend if there's no option available, but with 500 options, the one you want is probably in there. That's the hope.

Yes. So the example is the reap option. The periodic tasks run on the compute nodes, and they occasionally do different things. One of them reports data for the scheduler: how many resources are available. It makes sure the database and the compute nodes are synced up. There's a whole bunch of other ones. You can actually disable a periodic task if you don't want it running, or make it run once every two hours. They do different things; some you can't disable, some you can. If you disable the one that reports the compute node's usage, how much free memory and things like that, then your scheduler may not work, if that's what you're using to schedule. But if you're using, say, the chance scheduler, you don't really need any information about the compute nodes, because you're just sort of randomly placing things. So it depends on your use case, and you actually have to go through and evaluate every single one separately: does this affect me or not? And it's not necessarily clear how to do that.
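A rough sketch of the periodic-task knobs mentioned here; I'm least certain of these names, so treat them as assumptions about the Folsom-era options.

    # nova.conf -- periodic task cadence
    periodic_interval = 60        # seconds between periodic task runs
    periodic_fuzzy_delay = 60     # random startup jitter so hosts don't sync up
    # Individual tasks often have their own interval, e.g. run cleanup
    # once every two hours:
    # running_deleted_instance_poll_interval = 7200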
I believe almost all of them do. I don't think bare metal would, for obvious reasons. I don't think UML and LXC do, and I'm not sure if QEMU does, and I don't think you'd want to anyway. But KVM does, and as for Hyper-V, PowerVM, and VMware ESX, I believe they all do: VMware does, Hyper-V does. PowerVM, anybody? So I don't know about PowerVM, but everything else you're going to use in production will, unless it's LXC, and I'm not sure about that one. Great, thank you.