Okay. So I guess we can start. My name is Ricardo. I work for the OpenStack team at CERN. Today I'll be describing a bit of the work we've done to integrate Neutron at CERN and how we migrated from Nova Network. We are a big team, so I'm just representing a lot of people here. A bit about what CERN is — there was a good description in the keynote today. CERN is the European Organization for Nuclear Research. It was founded in 1954. It has 22 member states, and many others collaborate with the different experiments running at CERN and the services that we provide. Our mission is fundamental research, mostly in the area of high-energy physics. A description of what we do: right now the main machine we have is called the Large Hadron Collider — this one here on the bottom left. It's a circular particle accelerator in a tunnel 100 meters underground. It's located between the Jura Mountains and Lake Geneva, near the French Alps, and it spans two countries: part of it is in Switzerland, part of it is in France. We accelerate two beams of protons inside the accelerator, and at specific places we make them collide, so that we have proton collisions happening inside these detectors. A detector looks a bit like this — but this is a very old picture from a long time ago, during the construction; you can see the scale of it from the small man at the bottom. Now the cavern, some 60 meters high, is full. This is where the collisions happen. With these collisions we try to detect new particles, new physics, and for this we track particles and measure energies. The detector is made of several layers of hardware and electronics, and from this we get a lot of data out. Inside the detector, something like a petabyte per second is being generated.
Of course we cannot store all of this; we very quickly filter most of it out and only store what we think will be relevant later. We still get something like two and a half gigabytes per second going to our data center. At CERN we are currently running two data centers, one in Geneva and one at Wigner, close to Budapest in Hungary. It's an extension of our main data center, and we have very fast network links between the two, so it mostly feels like a single data center — you can feel the latency a bit, but it's not that far. We currently have around 190,000 cores, 17,000 physical boxes, around 170 petabytes of raw disk — not all used for storage; a lot of that disk is in compute nodes — and around 200 petabytes on tape. The reason for this is that all the data we collect is stored on tape for archival and for the later analysis needed. So, OpenStack at CERN: it's been in production since July 2013 — again, 190,000 cores today. In total we've had around 4 million VMs created since the start, at a rate of 300 VMs being created and deleted every hour; a lot of people have workloads where they constantly trigger new VMs. This is one dashboard we have where you can see usage per project: almost 3,000 different projects in our cloud, two and a half thousand users, 22,000 VMs running. We use Nova cells — and I will talk a lot about this part — with a bit more than 7,000 hypervisors. Almost 3,000 volumes, storing around one petabyte of data in Cinder; we use Ceph as a backend. Recently we added Magnum as another service in OpenStack — we have around 20 clusters deployed, still in pilot but almost moving to production — and we'll be adding Manila soon. Briefly — I won't go into the details of the whole architecture, as we'll focus on Neutron — there are some things we do to scale that are important. We run dedicated controllers per service.
In many cases we have to split the RabbitMQ instances, so we don't have one big RabbitMQ cluster; in some cases we have separate ones for each of the services. We had Nova Network, which was the option at the start, and there were a lot of patches being done to integrate with CERN networking. It's a heterogeneous environment in the sense that we have two different hypervisors: most of the workloads go to KVM, but we also provide Hyper-V, and the Windows boxes mostly go to Hyper-V, although now we've started deploying them on top of KVM as well. For Cinder we have two different Ceph instances, one in Geneva and one in Wigner, mostly because of latency. Keystone is fully integrated with the CERN account and project lifecycle, so when a new user comes to CERN, they also get an account in OpenStack; the same is true when they leave — their OpenStack resources are removed. All the infrastructure is deployed using Puppet, mostly with the upstream Puppet OpenStack modules and a few local changes. The main thing is that we use Nova Cells to be able to scale — and, as we discovered, not only to scale, but also to be more flexible, to be able to test new features without affecting the whole infrastructure. That's where I will start: describing a bit our Nova Cells architecture. In Nova Cells, you have a top cell which receives all the API requests, and then you partition your infrastructure into child cells. This is good because it puts less load on the scheduler and all the Nova services, but it's also good because, for example, if tomorrow we wanted to introduce GPUs, we could dedicate one cell and just enable GPUs there. Also, as I will describe today, when we wanted to add Neutron, we could deploy Neutron as a service on the whole cloud but only enable it in one of the cells, which meant we could scale gradually and learn a bit about the service with production workloads before opening it globally.
In the top cell, we run the Nova Cells service, which communicates with the child cells, and the Nova API. Then each individual cell runs the Nova controller — the conductor, the scheduler. From the start we had Nova Network there, and again the Nova Cells service to communicate with the parent top cell. Each individual cell serves a large set of hypervisors, where we run Nova Compute. This is basically how we structure our infrastructure to be able to scale horizontally. We started with a couple of cells; with time, we've been making them smaller and more manageable, and learning a bit about them, so we ended up having more and more. Right now, we have 44. Every time we get a bunch of new hardware, we put it in a new cell; in some cases it also replaces hardware being retired. Now, networking — this is really relevant, because it's what most of this talk will be about. At CERN, every connected device has an entry in a network database which is managed by another group. We provide IPv4 and IPv6 connectivity, and most of the devices have public IP addresses — something specific to us, because we have enough of them. One very important bit is that we do isolation via IP services. To protect the infrastructure from misbehaving bits of the network, we decided to segment the network into what we call clusters, or IP services. In summary, these are just different broadcast domains. So how did we do the integration with OpenStack? This is not the traditional OpenStack network deployment. IP services are broadcast domains, so what we needed was a new notion in Neutron that would provide this segmentation at Layer 2. To describe the problem: the primary IP services are where all the hypervisors get an IP, but when we schedule a VM, the VM will also get an IP and be registered in the network database — and if that IP were in a different IP service than the hypervisor's, the VM would never get connectivity.
We have to make sure that when we schedule a VM to a specific hypervisor, it ends up in the same IP service, so that it has connectivity to the outside and the inside. That means we broke the IP services into primary ones, where hypervisors get IPs, and secondary ones, where the VMs get IPs. Each secondary has to belong to one single primary. It also means that if we need to extend the infrastructure, we can easily do so by adding more secondary IP services — these are just additional subnets attached to an existing primary IP service. Initially, the mapping was not tied to the cell. With time, we learned that tying it simplifies management a lot, so what we've been doing is: when we create a new cell, we dedicate a primary IP service to it. That also means the number of nodes in that cell is limited, but that's good for manageability. Nova Network was there from the start. Again, in most deployments, I guess, Neutron won't be configured like ours: we have a flat network, so we don't do VLANs, VXLANs, or GRE tunnels — just flat. We don't have tenant networks; we do provider networks. Everything runs on the physical infrastructure, and we don't do overlays on top. Some patches we had to do: obviously, we have this network database that all devices have to be registered in, which means we have to hook something into Neutron. From the start with Nova Network, what we did was, every time we got a new IP service assigned to OpenStack, we would pre-register all the entries of that subnet range in the network database with fake names. Then, when we actually scheduled a VM with one of those IPs, we would rename the entry by querying Nova and filling in the actual information. This all worked, and we were kind of happy with it — we could have done something fancier, but it worked.
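The primary/secondary constraint just described — a VM may only draw an IP from the secondary IP services attached to its hypervisor's primary service — can be sketched as a simple lookup. This is an illustrative sketch, not CERN's actual code; the service names, hostnames, and subnet ranges are made up.

```python
import ipaddress

# Hypothetical mapping of primary IP services (where hypervisors get IPs)
# to their secondary IP services (subnets that VMs draw addresses from).
PRIMARY_TO_SECONDARY = {
    "S513-V-PRIMARY": ["188.184.10.0/24", "188.184.11.0/24"],
}

# Hypothetical mapping of each hypervisor to its primary IP service.
HYPERVISOR_PRIMARY = {
    "hv001.cern.ch": "S513-V-PRIMARY",
}

def candidate_subnets(hypervisor):
    """Return the subnets a VM on this hypervisor may take an IP from:
    only the secondaries of the hypervisor's own primary IP service,
    so the VM stays in the same broadcast domain and keeps connectivity."""
    primary = HYPERVISOR_PRIMARY[hypervisor]
    return [ipaddress.ip_network(s) for s in PRIMARY_TO_SECONDARY[primary]]
```

Extending a cell then just means appending another CIDR to the primary's list — which matches the point above that growing the infrastructure only requires adding secondary subnets.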
But Nova Network was being phased out, so we started looking into Neutron: how we could migrate, and also whether we could provide new services by deploying it. The initial goal was just to start with the exact setup we had for Nova Network — again, flat networks and provider networks. This means no virtual routers, no floating IPs, no firewalls, no load balancers: a very basic OpenStack network setup. But we wanted to make it better, in the sense that instead of pre-registering the whole subnet range in the network database, we would register each new device as it was created: when a VM gets created, we add a new device to the network database. By moving to Neutron, we would also simplify upgrades. A lot of the CERN-specific code was patches in Nova Network, and it was making it hard to upgrade from one version to the next — doable, but harder than it should be. So, a timeline of how we did this. We tried a lot of different approaches to how this could be implemented in Neutron. We started with some small test beds, and around October last year we deployed Neutron. We added the first Neutron cell immediately after. Neutron runs not partitioned like Nova with cells or anything similar, but as a normal OpenStack service, so there's only one single entry point for Neutron. Then we deployed a cell and configured Nova in that cell to use Neutron, while keeping Nova Network in all the others. So it's kind of a mixed environment, but it all works. This still required a couple of small patches in Neutron — I'll give details later. Around the same time, at the Tokyo Summit, there was a very good talk about moving this kind of plugin code out of tree from Neutron, and that's what we did a very short time after. Around May, we had the first production cell, and since then we've become very confident with the service.
We enabled it by default for all new cells from June this year, and now we've started migrating the old cells to use Neutron too. So, some details on the Neutron plugin implementation. This is an overview of how the Neutron code is organized and how you can plug things in — how the components work. It's not the Neutron deployment diagram, which is a bit more complicated. Neutron has a set of core APIs, and all the rest are extensions — even things that a lot of people use are still extensions. It means it's actually quite easy to add new things to Neutron. Then there are plugins, which implement these APIs: the core plugins, and individual plugins for each of the extensions. And then there are the agents that run on the boxes and use the code from a specific plugin. You can configure which one you want to use, and you can even use more than one. Taking this to CERN, we realized we needed two things. The first was our own plugin to do the interaction with the network database — when a port is created, we need to register it, things like this. But we also needed extensions, because the functionality provided by Neutron at the time didn't give us this segmentation of L2 networks, so we had to create the notion ourselves. That's what we did here. The first part was done by writing a custom driver for an existing plugin — in the ML2 plugin, you can hook your own driver into some of the functionality. Then, for IP services and clusters, we added an extension we called the subnet cluster extension at the time. IP restrictions answer the question: when you create a VM, which IP should it get? There are restrictions we have to apply so that the VM ends up on a hypervisor that provides connectivity to it — that's the one we added here. And then for metadata — I'll talk a bit more about this — we use the magic IP redirection to Nova. I'll give details.
The first deployment required patches; then we converted it to an out-of-tree plugin. I really recommend that you watch this talk — it's very well done, and very entertaining. It was about writing a plugin for Neutron; he called it human-defined networking, where every time a packet was sent, there would be an email to a person who would reply to the email, and things like this. So it's very entertaining, but it makes very clear how to do it. And you can check the code for our plugin — if you want to do something similar, it can be a good starting point. So, the details. This CERN mechanism driver is the driver we hooked into the ML2 plugin: there's a mechanism manager, and we just put our own driver there. What does it do? It acts on create port post-commit: once a Neutron port is created, we additionally add an entry to the network database. The same is true for deletion: when a Neutron port is deleted, we delete it from the network database too. The extensions are done exactly the same way. So what's the subnet cluster? Again, the subnet cluster takes your existing network in Neutron — in our case there's only one, because we use provider networks — and breaks it into pieces: not subnets, but an intermediate level. We call them clusters. In the meantime, Neutron has been working on routed networks, which is very, very similar; I put a link to the spec here, and there's some code already. I don't know exactly the state of the implementation, but the notion is very, very similar. This example probably makes the notion of primary and secondary IP services a bit simpler. If you do a neutron net-list, you see the Neutron network. If you do a neutron cluster-list, you see all the clusters we have. These VM pools are literally pools of VMs that we can assign IPs to, and these are the clusters.
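The post-commit hooks just described can be sketched as follows. A real ML2 mechanism driver subclasses Neutron's `MechanismDriver` base class and receives a `PortContext` object; here, a plain class and an in-memory stand-in for the external network database keep the sketch self-contained and runnable. Only the method names (`create_port_postcommit`, `delete_port_postcommit`) mirror the actual ML2 hook names; everything else is illustrative.

```python
class NetworkDB:
    """Stand-in for CERN's external network database."""
    def __init__(self):
        self.devices = {}

    def register(self, port_id, ip, hostname):
        self.devices[port_id] = {"ip": ip, "hostname": hostname}

    def unregister(self, port_id):
        self.devices.pop(port_id, None)


class CernMechanismDriver:
    """Sketch of a site-specific ML2 mechanism driver: mirror every
    Neutron port into an external network database."""
    def __init__(self, netdb):
        self.netdb = netdb

    def create_port_postcommit(self, context):
        # Called after the port is committed to Neutron's own database:
        # add the new device to the site network database as well.
        port = context["current"]
        self.netdb.register(port["id"],
                            port["fixed_ips"][0]["ip_address"],
                            port["name"])

    def delete_port_postcommit(self, context):
        # Keep the external database consistent on deletion too.
        self.netdb.unregister(context["current"]["id"])
```

Because the work happens post-commit, a failure in the external registration does not roll back the Neutron transaction — which is why such drivers typically need their own cleanup or reconciliation path.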
So, for the primary IP services, all the hypervisors have an IP that belongs to that cluster. Then, for the assignment of the VM IPs, we have actual subnets — Neutron subnets with an ID, shown as three dots here; I removed them just for simplicity — each with a subnet range, and you can even have different sizes in there. When you request an IP, what Neutron will do is look at the hypervisor, see which cluster it belongs to, and assign an IP from one of the eligible subnets. Now, the part that was still missing is the reverse: give me a host, and I'll tell you which IPs or subnets can be used. So we added another extension, host restrictions. This is again an API call to Neutron: you pass a host — I give an example here with the hostname of a hypervisor in the system — and it gives you the list of subnets you can use. In this case they are similar, but the idea is that you get one field with all the subnets you can use, and then additional fields that can be useful if we want to optimize the scheduling — for example, the most available subnet, to keep things balanced. We also have a couple of monitoring tools that give us the usage of the subnets, which is also useful. The implementation of these two extensions can probably be moved to routed networks once Neutron has them. Even the missing piece, these host restrictions, we could probably implement as an IPAM driver in Neutron — we haven't investigated yet. So we invested some code in these extra extensions, and we'll be looking at this next. The last bit is the instance metadata. We had an issue here: if we wanted to move to the Neutron metadata service, there's a dependency on L3 and DHCP, so in our case we couldn't use it. The solution was to rely on the Nova magic IP and just add a NAT rule to forward it to the Nova metadata host. And this works too.
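The host-restrictions lookup described above can be sketched as a pure function: host in, usable subnets out, plus a "most available" hint for balanced scheduling. The response shape, hostname, cluster name, and numbers here are all illustrative, not the real extension's API.

```python
# Illustrative data: a cluster and the subnets assigned to it,
# with a free-IP count such as CERN's monitoring tools might provide.
SUBNETS_BY_CLUSTER = {
    "S513-C-IP20": [
        {"cidr": "137.138.0.0/24", "free_ips": 41},
        {"cidr": "137.138.1.0/24", "free_ips": 203},
    ],
}

# Illustrative mapping of hypervisor hostname to its cluster.
HOST_CLUSTER = {"hv002.cern.ch": "S513-C-IP20"}

def host_restrictions(host):
    """Given a hypervisor, return the subnets a VM scheduled there may
    use, and the subnet with the most free IPs to keep usage balanced."""
    subnets = SUBNETS_BY_CLUSTER[HOST_CLUSTER[host]]
    most_available = max(subnets, key=lambda s: s["free_ips"])
    return {
        "available_subnets": [s["cidr"] for s in subnets],
        "most_available_subnet": most_available["cidr"],
    }
```

This is also roughly the shape an IPAM driver would need: the same host-to-subnet decision, just invoked from inside Neutron's IP allocation path instead of via a separate API call.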
It's a simple configuration on the hypervisors, and we could still rely on the Nova metadata server. Now I'll finish with the last bit, which is what we are currently doing: migrating off Nova Network. Officially, there's no way to do this, but there are people who have done it before — eBay and NeCTAR are two examples — and we took their work as a basis. The code from NeCTAR is actually available: a set of Python scripts to do this. What we've done is test a lot, because our problem is that the accelerator is running right now. If we affected a large fraction of the infrastructure while it's running, people would be very, very unhappy — so this kind of migration is not trivial. We have the ability to do it per cell, which limits the damage a lot, but still, we wanted to test well. Internally, we have a mock production environment — not an actual deployment, but a mock of all our infrastructure in a local Kubernetes environment. It's Kubernetes just because it's easy to orchestrate; the main thing is that we run Docker containers to simulate the whole infrastructure on a laptop or in our CI builds. It includes things like Puppet masters, the network database, a mock of our secret storage, and so on, and we can try different setups for the migration. We've been doing this for a while now, and we've built confidence in it. The procedure is a series of steps. Very quickly, per cell: we take the old IP service that was in our network and add it to Neutron, into a specific cluster. Then we disable the Nova conductor, making the cell read-only. We reconfigure Nova in the controller of that cell to use Neutron. Then, for each hypervisor, we reconfigure Nova Compute to use the fake driver — this kind of no-op driver, which is useful for this kind of situation.
Then we deploy the Linux bridge agent, which will look at what it should be configuring locally. Then, for each VM on the hypervisor, we create the Neutron port with exactly the same information and attach the port to the VM. Again, this results in a no-op on the hypervisor, because we have the fake driver, so it's safe to do. We bring the tap interface down, rename it to match the Neutron settings, and then update libvirt — we use KVM, so we update the libvirt definition to match the new interface. This is the procedure we've been trying. It depends on the cell: some cells have very consistent workloads, like the compute or batch cells, where the VMs are very similar; in other cases it's different and has evolved over time, so there are a lot of corner cases to try out, and this is iterative work. The goal is to have minimal impact on VMs. For now, the only impact we see with this procedure is that the Nova API for that specific cell is not available during it, which is not very bad, because the VMs keep running — we just cannot schedule new ones there. The impact on the VMs themselves is very minimal: all we see is a couple of seconds of lost connectivity while we rename the tap interface, and that's pretty much it. So, the status of Neutron at CERN. Right now, it's the default: all the new hardware we're getting — and we've been getting quite a lot — is configured using Neutron in those cells. It still coexists with Nova Network, and it will probably stay like this for a couple more months while we migrate cell by cell; there are a couple of cells with more critical services where we have to be careful. For Neutron we are running Mitaka, and for Nova we are running Liberty, upgrading soon. This is one thing we do: we don't upgrade all the services at the same time, and we try to keep the separation between the Neutron and Nova versions to no more than one release, just to be safe.
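The tap rename in the steps above is the one moment of actual network disruption, and it can be sketched with standard `ip link` commands. One assumption here: the Linux bridge agent expects tap devices named `tap` plus the first 11 characters of the Neutron port UUID, which is the convention this sketch encodes; the `dry_run` flag, the helper itself, and the interface names are illustrative, not CERN's actual tooling.

```python
import subprocess

def rename_tap(old_tap, neutron_port_id, dry_run=True):
    """Rename a VM's tap interface to the name the Neutron Linux bridge
    agent expects ('tap' + first 11 chars of the port UUID).
    With dry_run=True, just return the commands instead of running them,
    as one would in a mock environment."""
    new_tap = "tap" + neutron_port_id[:11]
    cmds = [
        ["ip", "link", "set", old_tap, "down"],
        ["ip", "link", "set", old_tap, "name", new_tap],
        ["ip", "link", "set", new_tap, "up"],
    ]
    if not dry_run:
        for cmd in cmds:
            # The brief down/rename/up window is the few seconds of
            # lost connectivity mentioned above.
            subprocess.check_call(cmd)
    return new_tap, cmds
```

After this, the libvirt domain definition would be updated to reference the new interface name, so that the configuration survives a VM restart.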
We now have thousands of nodes deployed in Neutron cells. This summer we had a set of, I think, 1,800 new compute nodes coming, and all of them went to Neutron. We saw the load starting to go up, but it's been behaving quite well so far, apart from a couple of issues I'll mention. We also deployed Neutron LBaaS — the first new service we can offer thanks to Neutron. We are using the Octavia driver; we've been testing it internally and are about to expose it to users, and we're looking forward to adding more services like this. So, a summary of the main issues we've had. The first is that the DHCP setup at CERN has a quirk: when you create a new device in the network database, it initially gets an IP which is not exactly its final IP — this is mostly to support PXE boots. This is a problem for us, because if the VM keeps that IP, it will never get connectivity. We looked at providing a DHCP alternative using the Neutron DHCP agent, but there were two issues with that agent. One is that it requires a port on the network, which is not very easy to do when you have provider networks. The second is that by default it runs with a controller, but we have this broadcast domain isolation, so the DHCP packets would never get there — we would have to deploy DHCP agents anyway, at least one instance per broadcast domain. So we ended up writing our own, based on a DHCP library written in Go. The code is there and it works, but if we could replace it with the Neutron DHCP agent, that would be good. I know there is some work going on upstream to remove this requirement, and we might then be able to do it. Again, because of the isolated IP services and isolated broadcast domains, we actually deploy one per hypervisor. It doesn't hurt; we configure the rules to play nice and not just broadcast DHCP everywhere, and it seems to work.
Then, one of the main issues we've been having as we scale up the service is RabbitMQ instability. This is not specific to Neutron, but we've seen it more often than with other services. If we run clusters and get network partitions, we go into split brain; there are ways to try to avoid it, but we still get some inconsistencies with those modes. In non-cluster mode, from time to time we get memory leaks and the RabbitMQ server crashes. This happens mostly when we have a massive restart of the Neutron Linux bridge agents: if we do a small reconfiguration in any Neutron file, the Puppet modules trigger a restart of the agent. This is very easy to fix by disabling the automatic restart, but then you have to manage the restarts manually when you need them. We actually downgraded all the way to RabbitMQ 3.3.5, and it still happens. There was a very good session earlier today about RabbitMQ tuning, with a lot of hints we might try. The other issue is the agent being co-located with Nova Compute. It means that on the hypervisor we have to upgrade both at the same time. Upgrading Nova is sometimes a bit tricky because we use cells — v1 — so there's a lot of work to do for an upgrade. With Neutron we could upgrade faster, but that means we would have to isolate it inside the hypervisor. So we might move the Neutron agent to a Docker container and run it alongside the rest of the Nova services on the hypervisor. Looking much further: partitioning Neutron. We don't have scaling issues right now, but we do see growing load on things like RabbitMQ, and partitioning worked well for Nova with cells, so maybe something similar for Neutron — but it's not clear this will be needed, so I put it as a question mark.
To finish, what we are doing right now is enabling security groups. We didn't have security groups with Nova Network — not because of Nova Network, but because of cells: they were not supported in cells, so we couldn't provide them. Now, by migrating all the networking functionality to Neutron, which is a centralized endpoint and not affected by cells, we can actually enable security groups easily on the new infrastructure. I also mentioned that we are deploying Neutron LBaaS. We don't have virtual IPs, because we don't do overlay networks or anything fancy, so we can provide an easy way for people to deploy an LBaaS instance with an HAProxy — but if we want HA, we need two instances, and we don't have a virtual IP. What we will do instead is try to integrate Neutron LBaaS with the CERN DNS load balancing, which has been there for quite a while. This means we can easily have two HAProxy instances automatically registered under a single DNS entry and do DNS round robin or something similar, which gives us similar HA functionality. Future work: right now we only have provider networks, and there are a few groups at CERN asking for more isolation — for private networks. In theory we can do this in Neutron already, so that's what we'll be looking at during the next year. Things like floating IPs would then be very useful: machines would have their private subnets, and it would be nice to expose them to the outside world, which requires floating IPs. Then, moving to routed networks — the new Neutron spec, which should provide everything we require — and after that, some kind of software-defined networking. Again, we have a big machine running.
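The DNS-based HA scheme described above boils down to round-robin rotation over the A records of a single DNS entry. A minimal sketch of that rotation, with made-up backend IPs standing in for two HAProxy instances:

```python
import itertools

def round_robin(ips):
    """Return a callable that cycles through backend IPs the way
    round-robin DNS spreads clients across the A records of one entry."""
    pool = itertools.cycle(ips)
    return lambda: next(pool)
```

Unlike a virtual IP, this gives load distribution but only coarse failover: a dead backend keeps receiving its share of lookups until it is deregistered from the DNS entry, which is why the DNS load balancing service also has to health-check and update the entry.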
A lot of people are involved in operating the machine, analyzing the data, and using all the resources we provide for this, so massive changes in something as critical as networking are complicated — this has to be introduced slowly. Hopefully there will be plenty of other work coming. That's it. If you have any questions or comments, I'm happy to answer now, or you can catch me later. Thank you. Any questions? I think there's a microphone here. Sorry. [Audience] Have you been following generic resource providers? The cluster-to-hypervisor mapping you have is sort of one of its use cases. [Ricardo] Yeah, sorry — give it a go. Try again. Now it's on. The question was how much we've been following generic resource providers; one of the use cases is routed networks with Neutron. [Audience] I was just wondering, because the cluster mapping stuff you have will be tied in with how the placement service and resource providers work, whether you'd eventually move to that. None of it's required right now in Nova, but we're starting on it in Newton. And then the cells v1 limitations with security groups and floating IPs and all that: you're getting security groups through Neutron, but with cells v2 — not complete yet, but by the time you get full feature parity, we'll have cells v2, and that'd be awesome. [Ricardo] So these bits in Neutron — this work started more than a year ago, and at the time there was nothing like routed networks. So we wrote some code there, and it works, and it's really good that we can replace it eventually; it shouldn't be hard to migrate, actually. And the Nova resource providers — I guess it's something like the host restrictions we have, where we can do the mapping. [Audience] Based on routed networks, you map your allocation pools to your aggregates. [Ricardo] Right. So that would be awesome. Yeah, we can provide it. Definitely.
Any other questions? Cool. Well, thank you then. Thank you.