Are we ready? Okay, let's start. Hello everybody, welcome. It's great to see so many people interested in the L2 and L3 agents. I'm Rossella Sblendido, I work for SUSE, and I've been contributing to Neutron for almost two years now.

And I'm Carl Baldwin. I've been working with HP, and I've been contributing to Neutron for about the same amount of time, about two years. I've been a core reviewer on Neutron since about a year ago, and I also run a meeting and lead a sub-team that focuses on L3 subjects within Neutron. So that's what I've been having fun doing for the last two years, and it has been a lot of fun.

We wanted to talk today about some of the work we've been busy doing for the Kilo release, work that is also spilling over into the Liberty release. We've been heads down working on the agents, looking specifically at both the performance and the scalability of the code base. We wanted a code base that we could take forward into Liberty and beyond, one where we could more easily understand the code, dive in, and actually implement new features. And along the way, we added some excellent functional testing and improved the unit test coverage of both agents as well.

We'll start with a simple, typical OpenStack deployment diagram. We're going to focus today on the network node and the compute node, and we're going to throw everything else into what we call a cloud controller node. You may have these split up; you may have HA clusters for every service. We're going to focus on the other two: the network node and the compute node. Everyone knows what the compute node is: that's where the VMs run, where the computing happens. There are typically fewer network nodes in a deployment, but there can be multiple, and they're the nodes that move data between networks and to the external world.
And the network node and the compute node are where the L3 agent and the L2 agent run. This is where they do their work, and this is where we're going to be talking about them. Rossella is going to start by talking about the L2 agent and tell you how it works.

Okay, so the L2 agent is the agent that runs on the compute node, because its main responsibility is to configure the software bridges on that node. Usually there are two bridges, br-int and br-tun. br-int is the integration bridge; it's the bridge that takes care of tagging and untagging the traffic that is coming from the VMs and going to the VMs. To tag it, it uses a local VLAN ID that is assigned to the network. The other bridge is br-tun, the tunneling bridge. This bridge takes care of translating the tagged traffic: it translates the local VLAN ID into the segmentation ID that is then used for tunneling. For example, if you're using GRE tunnels, the segmentation ID is used as the tunnel ID.

The main task of the L2 agent is to wire new devices. By new devices, we mean the tap interfaces that are created by Nova and connected to VMs. We'll see in more detail what wiring means. The L2 agent is also in charge of applying the security group rules, which are firewall rules implemented in Neutron using iptables and ipset. The L2 agent communicates constantly with the Neutron server, and it uses RPC to do that.

Now let's look step by step at what happens when a VM is created. I tried to highlight the role of the L2 agent in this graph. When nova-compute gets the request to create a VM, it issues an allocate-network call to the Neutron server, which you see at point one. The Neutron server will create the Neutron port and write the proper data into the database. At this point, the Nova VIF driver will issue the plug operation.
The VIF driver, in case you don't know, is the driver that Nova uses to plug and unplug the virtual interfaces into the integration bridge. It needs to be configured properly: if you use a specific Neutron plugin, you need to configure the VIF driver accordingly. For example, if you're using Open vSwitch for Neutron, you need to use the Open vSwitch VIF driver.

So when the VIF driver plugs the interface, the L2 agent will notice it, because the L2 agent scans to see if a new device has appeared on the host. You see that at point three. Since the L2 agent doesn't know anything about the device, it needs to request the details of the device from the Neutron server, and you see that at point four, get_device_details. With this call, the L2 agent also passes the host ID of the host where it's running, so the Neutron server can now bind the port, point five. That means writing in the database the association between the port ID and the host ID. Then the last step of the process is that the L2 agent notifies the Neutron server that the device is up, with the update_device_up message.

Let's look now at the workflow of the L2 agent. The L2 agent has a loop, and some specific events trigger reprocessing, so the loop is entered only when some conditions are met. There are three kinds of events. The first is that the OVSDB monitor has an update; the OVSDB monitor is the tool the L2 agent uses to know if something has changed on the compute host, specifically if a port was added or deleted. Another event that triggers processing is the agent getting a message from Neutron. It can get two kinds of messages: messages related to a security group change, which could be rule updated or member updated, and messages related to a port update. The last event is Open vSwitch restarting, which of course needs some reprocessing by the L2 agent.

So let's see now how the L2 agent detects that there was a port change.
Before our work, the OVSDB monitor didn't really pass any specific information to the agent. It just passed a flag to indicate whether something had changed on the host or not. So the L2 agent needed to keep an internal dictionary, registered_ports, of the ports it had already processed. Then, when the OVSDB monitor reported an update, the OVS agent needed to do a diff between the registered ports and the current ports on the host to infer which port was added or deleted. For example, if a port is in the current ports but not in the registered ports, it means the port was added. And the other way around: if it's not in the current ports but it is in the registered ports, it means the port was deleted.

So now let's see in more detail what kind of processing is needed when there's a change on the host. When a port is added, the first thing the L2 agent does is request the device details, as you saw in the graph before. Once it gets the device details, it knows how to wire it. By wiring, we mean assigning a local VLAN ID to the device, if it's the first device of that network the agent sees, and installing the proper flows, like the flows for tagging and untagging the traffic. The other operation that is performed is setting up the port filters, so installing the proper iptables rules. The last step is notifying the Neutron server that the device is up.

For a deleted port, the operations are just the opposite: the filters are removed, the Neutron server is notified that the device is down, and the local VLAN ID is reclaimed if it was the last device of that network on the host.

For Neutron server messages: for a port update message, the processing is the same as for added ports, so updated ports are processed exactly the same way. For a security group change, the filters are reapplied for all the devices involved in the change.

The last event is the case where Open vSwitch restarted.
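The diff between registered and current ports described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the actual agent code, and the names are only approximate:

```python
def scan_ports(registered_ports, current_ports):
    """Infer which ports appeared or disappeared by diffing the set the
    agent has already processed against what is on the host right now."""
    registered = set(registered_ports)
    current = set(current_ports)
    return {
        # on the host now but never processed: the port was added
        "added": current - registered,
        # processed before but gone from the host: the port was deleted
        "removed": registered - current,
    }

# example: tapB appeared on the host, tapC disappeared
events = scan_ports({"tapA", "tapC"}, {"tapA", "tapB"})
```

In this picture, a full resync simply clears registered_ports, so on the next pass every current port shows up as added and gets reprocessed.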
So how does the agent know that Open vSwitch restarted? At startup, the agent installs a flow that we call the canary flow, and once in every loop it checks if the flow is still there. If it's there, Open vSwitch is fine. If it's not there, it means Open vSwitch was restarted. This, of course, requires reconfiguring all the virtual bridges and reprocessing all the ports on the host; we need to reinstall all the flows again.

If an exception is thrown during the agent loop, the behavior of the agent before our work was simply to do a full resync, because the agent doesn't know which device has problems, so it needs to resync everything. registered_ports is cleared and all the ports are processed again.

Now, Carl. Okay, thank you. While the L3 agent shares a lot of the same architectural aspects as the L2 agent, it's completely different. Where the L2 agent gets your VMs attached to their networks, the L3 agent gets your routers attached to each other. It gets things moving from network to network and from network to the external world. Same deployment diagram here.

I wanted to go through the evolution of the Neutron router. Before Juno, there was only one kind of router in this implementation: the legacy router. There was one router, hosted on one network node. There's a bottleneck there, and a single point of failure, too. To address both of those issues, in Juno we added HA and DVR modes. Now, six months have passed, but it's still kind of new. The HA mode takes that one legacy router and replicates it across more than one network node, three, four, possibly all of the network nodes that you run in your deployment. It uses the Virtual Router Redundancy Protocol to monitor the routers and name which one is the active one, and if that one fails, it moves the active role. DVR took a little bit of a different take on addressing both the single point of failure problem and the bottleneck problem.
What DVR does is it takes the router, breaks it into pieces, and pushes it out to the compute nodes, so that as soon as a VM produces traffic that needs to be routed to a different network or to the external world, that traffic is routed right on the same compute host before it even leaves the NIC.

But DVR did leave a centralized part. The centralized part takes care of giving internet access to instances that don't have a floating IP, instances that share that one external address with other instances behind the router. There are also things like VPN as a Service, which is hosted in the router, and Firewall as a Service, where it either wasn't possible or wasn't obvious how to distribute them out to the compute hosts. For the centralized part, there's an API for administrators to manage where these routers live, so you can move them around; you can do maintenance on a node and evacuate a router.

Now I wanted to dive a little more into the L3 agent. The function of the agent is really pretty simple. It listens for notifications. When anything changes in a router, the Neutron server sends a notification to the L3 agent. The notifications can be missed, they can arrive out of order, the L3 agent may not even be there. But the notification kicks off the process; it tells the L3 agent, hey, there's some work to be done immediately. The agent takes that and sticks it into a processing queue, because it may actually have more work than it can get done at one time. Normally it pops right up to the top of the processing queue and is pulled off. The agent has the capacity to work on a few routers at a time, four or eight, depending on how you configure it. But if it worked on every router at the same time, things would sometimes actually get a little slower. So there's one more thing about the router processing queue.
There are two types of events that get put into the router processing queue. There's the event that comes from something changing in the router. This is ultimately sourced from a user action: a user or a script does something to change how the router is configured. The other type is just like in the L2 agent: when something goes wrong or when the L3 agent restarts, it goes through a full sync. It's a little bit of an inefficient operation to just throw your hands up and say, well, everything could have changed, so let's process everything. But that's what it does, and on a restart that's kind of what it has to do, because it may have missed any number of events while it was down. The events that come from user actions are prioritized over the maintenance, full-sync type of events, so that the L3 agent stays responsive to changes that come from users. The last thing it does is send status updates, so that when you plug ports or add floating IPs to the router, you can see the status of that.

Now, diving even deeper: router internals can get a little complex. We use network namespaces in Linux. If you want an analogy, a network namespace is kind of like a VM or a container for a network device. It's not really like a VM; it doesn't have filesystem virtualization or anything like that. It's really just virtualizing the network stack part of a VM. And the L3 agent relies on L2. Just like Nova relies on the L2 agent to plug the ports and do all the L2 wiring, the L3 agent relies on the L2 agent for that. But once those ports exist, whether it's an OVS port or a veth pair, which is basically like a wire with two ends that exists in the kernel, they can be moved into a namespace, just like taking a wire and plugging it into your device.
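Coming back to the processing queue for a moment: the prioritization described above, user-triggered events before full-sync maintenance, can be sketched with a plain heap. The class name and priority values here are illustrative, not the agent's real API:

```python
import heapq
import itertools

PRIORITY_RPC = 0   # user-triggered notification: jumps the queue
PRIORITY_SYNC = 1  # full-sync / maintenance work: can wait

class RouterProcessingQueue:
    """Toy version of the agent's queue: a lower priority value wins,
    and a counter keeps FIFO order among equal priorities."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def add(self, router_id, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), router_id))

    def pop(self):
        _priority, _seq, router_id = heapq.heappop(self._heap)
        return router_id
```

If you queue full-syncs of r1 and r3 and then a user update of r2, r2 comes out first even though it was queued after r1.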
And then the L3 agent configures the IP addresses on the interfaces, and it configures the routing, whether that's the simple network-to-network routing or extra routes that have been configured on the router. It configures all of that. It uses iptables to implement the floating IP functionality. So you allocate a floating IP and associate it with an instance; Neutron figures out which router that instance needs to go through to get to the external network, and it uses NAT to implement that floating IP on the router. There's metadata access, there's the shared access for instances without a floating IP, and then some of the advanced services also get into the router and are integrated right in there, Firewall as a Service and VPN as a Service most notably.

I also wanted to go into how, starting in Juno, we moved the L3 agent out to the compute hosts, and show you how that takes the network node out of the data flow. This is for DVR only, so it only makes sense to bind a DVR router to an L3 agent running on a compute host. But once it's there, it can host floating IPs for instances that reside on that compute host, and it can also do what we call east-west routing, which is routing from Neutron network to Neutron network through a router. I'll take questions later.

So these two boxes are compute hosts, and we have a VM on what we call the red network and another VM on the green network. Let's say VM1 wants to talk to VM2 on the green network. It starts pumping out packets with VM2's address on them. Now, there's a little bit of L2 agent integration here at br-int, where it says, okay, I know this packet needs to be routed, and I'm going to send it straight into the router that's running on the same compute host to be routed. There it goes through the routing tables, it comes out,
and travels from one compute host to the other, just as if the two VMs were on the same network. It comes in and goes straight to the VM. Now, the replies originate from VM2. They come out, the same logic in br-int sends them straight into the router on the compute host on the right, and they get routed and sent back the other way. So the return path is the same, but it's on a different host, and that's good to note.

Now, we wanted to take some time to talk about the work we've done during Kilo on restructuring. Rossella is going to start by talking about the L2 agent restructuring.

Thank you. So, the restructuring work for the L2 agent has three main areas. I have to say that this work didn't make it into Kilo, so everything I'm saying now is like a tech preview for Liberty. The three areas are: get more information from the OVSDB monitor, improve the RPC calls that the L2 agent is using, and improve the resync behavior.

As I was explaining before, the OVSDB monitor only notified the agent that an update had occurred on the host; it didn't really specify what happened, which is not good, since the OVSDB monitor has that information. So what we did was create a new method for the OVSDB monitor, get_events, so that the L2 agent can get the events directly from the OVSDB monitor instead of scanning all the ports all the time. It gets events like "port x was added" or "port y was deleted", so it's much better.

The L2 agent uses a few RPC calls, and we also wanted to improve them. For example, update_device_up and update_device_down were only accepting one device. So if the agent had to send those messages for a list of devices, it had to send several calls, which is expensive. So we added a new RPC call that takes a list of devices.
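The effect of the batched call can be shown with a fake RPC client that just counts round trips. The method names follow the talk, but the client itself is made up for illustration:

```python
class FakeRpcClient:
    """Stand-in for the agent's RPC plumbing: it only counts calls."""

    def __init__(self):
        self.calls = 0

    def call(self, method, **kwargs):
        self.calls += 1

def report_one_by_one(rpc, devices):
    # old style: one update_device_up round trip per device
    for device in devices:
        rpc.call("update_device_up", device=device)

def report_batched(rpc, devices):
    # new style: a single call carrying the whole list
    rpc.call("update_device_list", devices_up=list(devices), devices_down=[])
```

With three devices, the old style costs three round trips over the message queue and the new style costs one.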
For get_device_details and update_device_up/down, we also added a new parameter, failed_devices. This is because before, when the plugin hit an exception, for example while getting the device details, it would throw that exception, and this caused a full resync in the L2 agent. To avoid that, the behavior is now different: if an exception is thrown, the plugin catches it and puts the device that caused the exception in the list of failed devices.

We also improved the security group provider_updated call. This call is sent when the provider rule needs to be updated. The provider rule is the rule that allows the traffic from the DHCP server, so when the DHCP server IP changes, for example, this message is sent. Before, this caused a full refresh of the firewall, meaning the filters of all the ports were refreshed. Now we added a parameter to provide the list of the devices that need to be refreshed, that is, the devices that belong to the network whose DHCP server changed IP. So now we pass this list to the agent, and the agent refreshes only those devices.

The last one is the port update message. We added the attributes that were modified during the update, so that the agent can choose how to process the port. Not all changes require full processing. For example, if the IP of the port changed, then only the filters need to be refreshed, so no full processing is needed. Now the agent can make use of that.

We also wanted to improve the resync, because resyncing all the ports is quite expensive. As I was saying before, we added this failed_devices parameter to the RPC calls, which were one of the main sources of exceptions. So the agent now, instead of issuing a full resync, takes the list of failed devices and tries to resync only those. And in some cases it can simply ignore the failure, if it's not important. So, I've been talking a lot.
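The failed-devices pattern can be sketched server-side like this; `lookup` stands in for whatever per-device work might raise, and the shape of the response is approximate:

```python
def get_devices_details_list(devices, lookup):
    """Instead of letting one bad device raise and force a full resync
    on the agent, collect per-device failures and keep going."""
    details = []
    failed_devices = []
    for device in devices:
        try:
            details.append(lookup(device))
        except Exception:
            failed_devices.append(device)
    return {"devices": details, "failed_devices": failed_devices}
```

The agent can then retry only the failed devices instead of clearing registered_ports and reprocessing the whole host.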
I just wanted to give you some real data to show that this really improved the situation. I ran a small test on a VM that was running devstack. I created a rally scenario, a very simple one, that boots 20 VMs with a concurrency of two. These are the results before our work, and these are the results after our work. I did the math for you: it really worked. The minimum time for spawning a VM is 0.6% better, the average time is 4% better, and the 95th percentile is 5.9% better. This test was run on a single host and the load was very low, only 20 VMs, because I was using a small VM. I'm pretty confident that in a multi-host scenario with a higher load, these results are going to be much better.

There's still a lot of work to do, so if any of you is interested, please ping me; I'd be very happy to work with somebody to improve all these items. The first one would be to improve the OVSDB monitor: instead of using the CLI, use the OVS Python library, and then create a queue of events that the L2 agent can process so that we can use multiple workers, and also assign a priority to those events so that higher-priority events can be processed earlier. In general, we need to improve the state convergence and also the resilience in case of failure. And now Carl will talk about the restructuring of the L3 agent.

Thank you. The restructuring here was less targeted at performance and more targeted at giving us a code base that we can move forward with. When I looked at the L3 agent, thinking about some of the things I wanted to do with it, I saw it as a handyman. A handyman that does lots of things. This is after HA and DVR were added in Juno. It wasn't a very good handyman; it was kind of forgetful. All throughout the code, there were little checks: if we're working on a DVR router, then do this. If we're working on a DVR router and we're on a compute host, then do this. Or if we're not doing an HA router, then do this.
It got hard to read, it got long, and it got hard to maintain, too. It was a jack of all trades that didn't really get good at any one of them. So I wanted to move to a different model that I call the contractor model. There's one contractor for the network node and one for the compute host. What they do is pull out their plans. They say, okay, I'm on a network node and I need to build an HA router. I know who can do that: I'm going to pull in the HA router specialist, load him up, and he's going to do his job. So that was the model I went for.

And I broke up the code so that we could compartmentalize. We have the L3 agent, which previous to Kilo was a very long file that did everything, 2,000 to 2,500 lines of Python code. We shrunk that down. We said, okay, the L3 agent file is just going to have the L3 agent in it. It's the processing queue, it gets notifications, it's the contractor: it loads the right class and runs it, and then it sends status updates. A very simple, very well-defined job.

Actually, there was one place I wanted to go before the routers. In the L3 agent there were also VPN as a Service and Firewall as a Service. One of them inherited from the L3 agent and one of them was a base class of the L3 agent; don't ask me which one was which, I can never remember. We started by pulling those out and creating a signal mechanism where the router could signal the VPN as a Service agent or the firewall agent to do its job. We took it out of the code so that there was a more generic interface there.

And then with the routers, what we did was use inheritance. It's not a new technique. We created three kinds of routers: a legacy router, an HA router, and a DVR router. And there was some common code, so we created a common base class. And this worked very well.
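A minimal sketch of that contractor-plus-specialists shape, with made-up class internals (the real classes live in Neutron's L3 agent code and do far more):

```python
class RouterBase:
    """Common code shared by every router flavor."""

    def __init__(self, router_id):
        self.router_id = router_id

    def process(self):
        # plumbing every flavor needs: interfaces, routes, NAT, ...
        return ["configure interfaces", "configure routes"]

class LegacyRouter(RouterBase):
    pass

class HaRouter(RouterBase):
    def process(self):
        # an HA router additionally manages its VRRP instance
        return super().process() + ["spawn keepalived"]

class DvrRouter(RouterBase):
    def process(self):
        return super().process() + ["install distributed routing rules"]

def create_router(router_id, distributed=False, ha=False):
    """The 'contractor': pick the right specialist instead of
    sprinkling if-dvr / if-ha checks through one giant class."""
    if distributed:
        return DvrRouter(router_id)
    if ha:
        return HaRouter(router_id)
    return LegacyRouter(router_id)
```

The point of the shape is that the DVR- and HA-specific branches live in their own subclasses, so the agent just instantiates the right one and calls process().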
This took all of those if-then-else statements, if DVR this, if compute host that, and took them all out. It made the code a lot more straightforward, using polymorphism to get the job done. I think of it as loading specialists that know how to do one job, and they do it well.

There are also a few more things. We didn't get to this in Kilo, but we also want to create what we call an HA DVR router. It seems like an oxymoron, but it's not. The DVR router has that central component, and that is still a single point of failure for the use case it solves. Combining that with the HA router functionality that we have could give us a fully redundant solution. There's also still a little bit of code where there's DVR for the central part and DVR for the distributed part, and they actually don't mix that much, but currently they're in the same class. I think breaking those out would make it even simpler.

And why did we do all this? Well, we looked at the L3 agent, and we looked at doing this stuff. We wanted L3 VPN; we still want that. We wanted to eliminate some of the wasted IPv4 addresses in cases where we can. You may know that a Neutron router, as soon as you create it, consumes an IPv4 address. That may be okay for you, but for a lot of people that's difficult to cope with. We may want to look into doing DVR for IPv6. There are even some things I didn't put on this slide, like connecting Neutron to BGP-routed networks instead of just statically routed networks, and beyond that, possibly even other routing protocols. We also want to improve the routing situation with IPv6, and I'll be giving a talk here tomorrow about L3 and IPv6 with Sean Collins. It was very difficult to think about getting those done with the code base that we had. Splitting it out into smaller, simpler parts, with a topology that we could understand, made all of that look feasible.
And in Liberty, we're going to get to work adding some of those features. Also, just like in the L2 agent, we wanted to eliminate the full sync. It's kind of a wasted operation, and we don't always have to sync everything. And I think that's it. Did you have anything to add? I guess we can take questions. Yeah, thank you. I think we've got about five minutes for questions. Right here.

The L2 agent is running on the compute node, kind of scanning, and then you have thousands of nodes which get overloaded? Yeah, so the question is, when you said the L2 agent is constantly scanning, you mean the L2 agent on the compute host? Yes. I have to say, when I say constantly scanning, it's not that it's scanning all the time. It only starts the scan when it gets the "has updates" flag from the OVSDB monitor. So when the L2 agent knows that something has changed on the host, it scans all the ports that are currently on the host. And this is done on the compute host.

Okay. Yes, but I couldn't tell you where off the top of my head. Yeah, there's documentation on setting up the message queue; I think there's something in the dev ref for Neutron that was added recently. I mean, if you mean what kind of messages are exchanged, how to create a new one and raise the version, that kind of stuff, it's in the dev ref.

Yeah, let's see. In Kilo, we can get metadata service high availability using an HA router. With VRRP, as the active router switches between the multiple copies of the router, it will actually start and maintain that metadata service in that router.

So the question is about port binding failure, which rate of port creation would cause a failure? No? Okay, so the question is which log to look at in case of some bug related to the L2 agent. The L2 agent also has a log, and you can just check that.
And usually when there's a failure you should see a warning or something of that kind. It's in /var/log/neutron; there are several logs in that folder, and one of them is for the OVS agent. I don't remember the exact name, but it's there.

Yes? So the question is, if we want to get our hands dirty playing around with this code, can we use a Kilo environment, or do we need something more recent? Kilo is pretty recent. You can start with Kilo and get all of the L3 restructuring improvements that we've made. For L2, some of them did not merge for Kilo, so you may be looking for patches to pull down.

Okay, so the question, restated: he wants to use master for Neutron; are there special versions of Nova and the other services? No, I don't know of any requirement to use any special version. Yeah.

Okay, the question is referring to the distributed routing slide and the routing between hosts. If the VMs are on the same host, the traffic is routed on that same host and it never leaves the host.

So the question is how VRRP is achieved, does it use keepalived? Yes, it does, it uses keepalived. Are the IP addresses migrated inside of the router namespace? Yes, they are.

And I think we're out of time, so if you have more questions, I'll be up here for a little while. Okay, thank you everyone.