All right, everybody, thank you for coming to our talk, Under the Trench Coat: Neutron Agent Extensions. My name is Nate Johnston and I work for Comcast. This is my colleague Margaret Francis, who also works for Comcast, and David Shaughnessy, who works for Intel. A little bit about who we are, which you can read later when you download the slides: all three of us are Neutron contributors, and Margaret and I have contributed to Neutron Firewall as a Service as well.
So I'd like to start by talking about the history of Neutron agent extensions. They have their origins in the quality of service effort in the Kilo and Liberty cycles. It took a lot of engineering and it took a while. They were implemented in a separate feature branch, the QoS or quality of service feature branch in Neutron, by these folks, who I want to give credit to: Moshe Levi, Irena Berezovsky, Miguel Angel Ajo Pelayo and Ihar Hrachyshka.
The problems that agent extensions were intended to solve were these. Whenever a new feature needed to be implemented in Neutron and required some agent-side implementation code, that code was just getting put into the Neutron agent in general, which led to some bloat in the agent, and that was not a sustainable path over the long term. Also, external projects like the advanced services were unable to extend agent functionality without overriding the agent. By that I mean taking the main agent class, subclassing it with a subclass that implemented some additional service-specific logic, and then having that be the main execution point for the agent. We saw this especially in the L3 agent, but there were some instances of it in the L2 agent as well. And even with that inheritance-based extending of the agent, when you had more than one external service it became extremely difficult to have those work at the same time.
For example, I think at some point somebody got Firewall as a Service and VPN as a Service working simultaneously with the L3 agent, but it required cartwheels in the code, and that's not the way you want to build a general framework. So Neutron agent extensions arose to meet those challenges.
The extensions manager is a new class that was implemented as a subclass of stevedore's NamedExtensionManager, a pattern that was proven to work well with the Neutron plugins and the ML2 mechanism drivers. It kept with that same pattern: the agent sends messages to the extensions manager, which then forwards them to all extensions.
As I said, the first implementation of this was in quality of service, which is an L2 concept that extends the idea of the network port by adding regulation of data flows. The way this was implemented, a new QoS policy object that specifies the quality of service configuration was added to the port as an attribute of the port. So any update to the port data that goes over RPC includes the relevant QoS policy information. If I update the port, then the QoS policy gets sent to the agent. But what if I just update the QoS policy? Because we took something new, the QoS policy, and added it to something existing, the port, we needed a method to synthesize a port update just for the case where the QoS policy was updated. And so a QoS notification driver, which is part of the QoS plugin in the Neutron server, was created to make sure that those events would be properly synthesized and sent down the wire.
So that was the first version of agent extensions, which worked well for the first feature we implemented in QoS, which was bandwidth limiting. With the second feature we implemented in QoS, we discovered we needed a little more sophistication in the mechanism. That was the DSCP feature. DSCP marking manages the tagging of traffic using OVS flow entries.
OVS flow entries are managed in the Neutron OVS L2 agent using cookies, so that if, say, the agent restarts, it can identify flows that correspond to the previous iteration and clear those out, and the only flows that remain are the ones it has inserted itself. So what we did is we added an agent extension API: a programming API, not an inter-process communication API. Basically, that's an object that has visibility into this information that's in the agent, and that object is handed to the agent extensions, so they have visibility into this piece of data that only exists in the agent process.
At the beginning of the Newton cycle we attacked the problem of the L3 agent, which was suffering some of the same issues that the L2 agent did, and the pattern of agent extensions looked like it would solve the problems we were having. So what we did is we took the agent extension code and made it generic, moving it partially out of the L2 tree into a series of abstract classes. We adjusted the L2 agent extensions to inherit from those, and then we could also set up a parallel L3 agent extension that would serve the layer 3 needs. The difference between them really comes down to this: the layer 2 agent is concerned with port events, so that's the information it forwards on. When it receives a port update or whatever, it forwards that to its extensions. The layer 3 agent is concerned with router events, so when a router create, delete or update is received over RPC, that's what it forwards to the agent extensions.
So the second implementation that we worked on with agent extensions is Firewall as a Service. Firewall as a Service had a complicated inheritance-based relationship where it was overriding, or hijacking, the Neutron agent at the start of Newton, but that was just too complicated and that relationship was severed. So we turned the Firewall as a Service agent into a layer 3 extension using this framework.
That drove the creation of an agent API similar to the L2 one, except that instead of the OVS cookie, what it allows agent extensions to have visibility into is the router info. In the case of Firewall as a Service, it needs the router info to associate routers with namespaces, so it can take action in the correct namespace. So that's where agent extensions came from. To go into a deep dive, I'm going to pass it to my colleague, Margaret.
Oh, crap. It's off. Hello. Great. I did that. Here comes this guy. Okay, I have things that I can say without the screen. As I was thinking about what I wanted to say in these slides, I was imagining myself in the audience and what I would want to get out of this talk. And what I would want to get out of this talk (thank you) is information that would enable me to write an agent extension. It was with that in mind that I wrote what I wrote. What I'm covering specifically is the architectural details that I think somebody would need to know in order to write their own agent extension. In a later section of the talk we'll get more into the weeds and talk about details.
Maybe I'll give this whole thing without slides, which brings me to my next point, which is that this is the first presentation that I've done, and I'm nervous. A colleague of mine told me that I should just come out and say this. But you needn't worry, as long as these come back, because my colleague Nate is prepared, if I faint, to step over me and continue with section two.
I have one more thing I can say before I need the slides. Another thing I was thinking about, imagining myself in the audience (thank you) if I hadn't yet contributed to Neutron, is what it would be like psychologically. My own story is that I'm relatively new to Neutron and to OpenStack.
And actually, I'm relatively new to networking too, at least at the L2 and L3 layers. But the cores and other frequent reviewers in Neutron, in addition to being super smart (one of the cores just left, so he couldn't hear me say this), are also unbelievably helpful to new people. And most importantly to me, they're really nice. For those reasons I was able to come up to speed, I think, pretty quickly, and in fact to work on some of the things that you saw in that earlier slide, for instance the generalization of the L2 and L3 agent extension code. So if you're out there and you're worried that this would be too daunting for you, I encourage you to just jump in, and it'll be fine.
Okay, now, if I don't mess this up, I can talk about actual content. As Nate mentioned, earlier agent extension implementations highlighted some issues that we need to be concerned about. In addition to wanting to run multiple extensions simultaneously, we also want to be able to implement an extension and change it, extend it, refactor it, without touching the agent. We want to be able to load an agent extension without affecting the agent. And most important, I think, we want to be able to give the extension access to agent resources.
These concerns are seemingly in tension with one another. It seems like the natural way to address the last issue, anyway, is by some sort of subclassing in one direction or the other. But if you subclass in such a way that the agent is a subclass of the extension, then you have thereby touched the agent. And if you go the other direction and the extension subclasses the agent, well, then it becomes really difficult, as Nate mentioned, not impossible but very, very difficult, to run more than one extension at a time.
So we have two tools in our toolkit that were recently pulled out in order to address all of these concerns: the stevedore package's NamedExtensionManager and Neutron's agent API implementation. We'll start with the stevedore package.
There are three components of the framework, three classes really, that have specific responsibilities in order to enable agent extensions, at least the new form of them. The agent needs to instantiate the manager, and it needs to send the manager the messages that are ultimately intended for the extensions. The extensions manager is going to load the extensions, of course, and it's going to broadcast agent requests to the extensions. There's an abstract base class that provides the interface for the extensions. If you're writing an extension, it needs to implement that interface. And then you, the person, need to create an entry point and register it with the caller (I'll show you an example of that), and define a unique namespace for the collection of extensions where your extension will live. We're going to see an example of what I just talked about momentarily.
So the NamedExtensionManager handles several, but not all, of the concerns that I talked about earlier in this section. It enables you to load multiple extensions at runtime, and it enables you to implement an extension without touching the agent. So we have this last concern remaining, which is that we want to make sure the extension gets access to agent resources. To do that, we've implemented this notion of the agent API.
I'm resurrecting this diagram from Nate's portion of the talk, because I think it captures quite well what's happening with the agent API. In this case we're looking at an L3 scenario. We have this router info, which is the piece of agent information that we want to make accessible to the extensions.
What we do with the agent API is that we have a single copy of this router info, which is what you would want, and it's made visible to the agent extensions. You see these windows that sit between the extensions and the router info; that should indicate that it's read-only access. We really don't want the extensions to modify this agent information in any way.
Okay, so now we're looking at this framework again, except augmented, and the additional pieces are the ones required to bring about this agent API feature. The first addition is the agent API itself. That class is going to be instantiated by the agent, and the agent is going to provide to that instance the agent data that it ultimately wants the extensions to have. The agent API also defines several methods that the extensions will call in order to access the agent information. Now, in addition to instantiating the manager, the agent needs to instantiate the agent API, of course giving it the data it needs, and it sends this loaded-up API to the manager. An additional responsibility of the manager is to forward that agent API along to the extensions so they can use it. And the abstract base class has two additional responsibilities, which are to define a consume_api method and an initialize method, and I'll get to those in just a minute.
Okay, so the interface for the extension has changed with the introduction of the agent API concept, and the extension now needs to implement these two further methods. The consume_api method does what you would think it does: it enables the extension to receive and keep the agent API.
It's not keeping the agent information; it's keeping a mechanism by which it can access the agent information. And then it needs to implement this initialize method, which allows it to do anything it wants, actually, but often what it's doing is something with the agent API.
So now I'm going to show you some code, first on the L3 side, and I'm going to be looking at Firewall as a Service. It's a service that I recently did some work in. There are both L2 and L3 sides of Firewall as a Service, and I'm going to be showing you code from the L3 side of the v2 version, which specifically applies firewalls to router ports. I wanted to show you where the components of this new architecture live in the Neutron tree, and I've also shown here where in the neutron-fwaas code base the extension I'm going to show you lives.
This is the L3 agent extension API. And this is real code. All of these are real code, except that anything extraneous, anything that doesn't have to do with this talk, has been removed. So nothing runs, really, and it's not necessarily PEP 8 compliant. So don't look to be testing this out directly. But you can see that in its __init__, the API takes hold of the incoming router info (did I say agent API? I meant the router info) and assigns it to self as a private variable. Then these other methods are going to be called by the extensions, and they enable the extensions to access the appropriate parts of the router info.
The agent. The agent now needs to do several things, right? It needs to instantiate the manager. It needs to instantiate the agent API. It needs to send that agent API to the manager.
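As a rough illustration of the agent API idea just described, here is a minimal sketch rather than the Neutron source. It assumes the agent keeps its router info in a dictionary keyed by router ID; the `RouterInfo` fields, the accessor names and the sample data are all hypothetical. Note that Python does not enforce the read-only intent, so the "window" is a convention that extensions are expected to respect.

```python
from dataclasses import dataclass, field


@dataclass
class RouterInfo:
    """Hypothetical stand-in for the agent's per-router state."""
    router_id: str
    project_id: str
    ns_name: str
    port_ids: list = field(default_factory=list)


class L3AgentExtensionAPI:
    """Sketch of an agent API: a window onto the agent's router info."""

    def __init__(self, router_info):
        # Keep a reference to the agent's single copy; nothing is duplicated.
        self._router_info = router_info

    def get_routers_in_project(self, project_id):
        """Return the RouterInfo objects that belong to one project."""
        return [ri for ri in self._router_info.values()
                if ri.project_id == project_id]

    def get_router_hosting_port(self, port_id):
        """Return the RouterInfo that owns the port, or None if unknown."""
        for ri in self._router_info.values():
            if port_id in ri.port_ids:
                return ri
        return None


# The agent owns this dict; extensions only ever see the API object.
agent_router_info = {
    "r1": RouterInfo("r1", "tenant-a", "qrouter-r1", ["p1", "p2"]),
    "r2": RouterInfo("r2", "tenant-b", "qrouter-r2", ["p3"]),
}
agent_api = L3AgentExtensionAPI(agent_router_info)
```

Because the API object holds a reference rather than a copy, a router that the agent adds later is visible to extensions on their next call, which matches the "single copy" point made in the talk.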
And here is a representative method on the agent, where a router gets added and the agent gets notified (ignore that this is a private method). In turn, the agent is going to notify the manager that the router has been added, and then the manager, as you might expect, is going to notify the extensions.
The extensions manager, as Nate mentioned, has both a general side and an L3-specific or L2-specific side. The general side is to load the extensions. It's going to load the extensions using a list of extensions taken from a config file; this is the runtime loading of extensions. And it's going to tell its superclass, here's the namespace for these extensions. The initialize method is going to tell each of its extensions: I want you to consume this agent API, and now do some initialization with it, whatever that might be. You'll notice that consume_api comes before initialize, and it's important that it happens in that order, because on the extension side the initialize method, and other methods as well, may and really will assume that the agent API is accessible at that point. So consume_api really needs to be just about the first thing that happens to the extension.
Then a subclass of this generalized class is the L3-specific part, and that handles router-specific behavior. When I said that I'm showing you code and I've removed stuff that's extraneous to this talk, I should have also said that I have not included everything that's relevant to this talk. So there's more than just add_router, in case you're wondering.
And here is the agent extension interface. There's, again, both a general and an L3-specific set of classes. The general one defines the two methods that I've been talking about, and the specific one defines these router-specific methods. Now, both of these classes are abc abstract classes, and some but not all of the methods in them are abc abstract methods.
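The ordering and broadcast behavior of the manager described above can be sketched in plain Python. This is an illustration only: the real manager subclasses stevedore's NamedExtensionManager and loads extensions by name from configuration, while here the extensions are simply passed in, and the class names and simplified method signatures are assumptions made for the example.

```python
class RecordingExtension:
    """Hypothetical extension that records the calls it receives."""

    def __init__(self, name):
        self.name = name
        self.agent_api = None
        self.calls = []

    def consume_api(self, agent_api):
        # Keep the access mechanism, not the agent data itself.
        self.agent_api = agent_api
        self.calls.append("consume_api")

    def initialize(self):
        # Safe only because the manager called consume_api first.
        assert self.agent_api is not None
        self.calls.append("initialize")

    def add_router(self, context, data):
        self.calls.append(("add_router", data["id"]))


class SketchExtensionsManager:
    """Plain-Python stand-in for the stevedore-backed manager."""

    def __init__(self, extensions):
        # The real manager resolves these from entry points at runtime.
        self.extensions = list(extensions)

    def initialize(self, agent_api):
        for ext in self.extensions:
            ext.consume_api(agent_api)  # must come first
            ext.initialize()

    def add_router(self, context, data):
        # Broadcast the agent's event to every loaded extension.
        for ext in self.extensions:
            ext.add_router(context, data)


fwaas = RecordingExtension("fwaas")
other = RecordingExtension("other")
manager = SketchExtensionsManager([fwaas, other])
manager.initialize(agent_api=object())
manager.add_router(context=None, data={"id": "r1"})
```

Both extensions end up with the same call history, which is the point of the design: the agent talks to one manager and never needs to know how many extensions are loaded.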
What this means is that any subclass of one of these classes must implement the abstract methods. If there's a subclass that doesn't implement all of the abstract methods, designated as such with this decorator, then there will be a runtime error, which is nice. You'll see here that consume_api, probably the most critical method in this game, is not designated an abc abstract method, and that's intentional. The reason is that by the time this architecture came into place, there were already extensions in play that did not have that method defined, and we didn't want to break them. And it turns out that, well, they don't need it yet, or maybe at all. So it's a historical reason.
Here's the extension. It kind of does exactly what you would expect at this point, right? It implements the interface. It takes in the agent API and assigns that to self. It implements these router-specific methods. And then elsewhere in the code you'll see that it's calling methods of the agent API in order to receive information about that router info object.
And finally, I promised that I'd show you how to set up an entry point, and here it is. You've got to create it in your setup.cfg, and then you've got to tell the agent that it's there. Part of that was cut off, so you'll have to figure it out.
Okay. So you can see that with the combination of these two things we've satisfied all these concerns, right? I'm not going to read them over again, but we are good to go. I'm going to talk now very briefly about the L2 side of things. The reason I'm going to talk only briefly about it is that David has really great things to say, but also that the patterns you see on the L2 side match exactly what we've seen on L3, with some exceptions that I'm going to call out now.
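The abstract-method behavior described above can be seen with the standard `abc` module. This is a minimal sketch with simplified signatures; the class names mirror the ideas in the talk but are not copied from Neutron, and `LegacyExtension` and `IncompleteExtension` are hypothetical. In Python the "runtime error" is a `TypeError` raised when the incomplete subclass is instantiated.

```python
import abc


class AgentExtension(abc.ABC):
    """General interface: every extension must be able to initialize."""

    @abc.abstractmethod
    def initialize(self):
        """Perform extension setup."""

    def consume_api(self, agent_api):
        # Deliberately NOT abstract, so extensions written before the
        # agent API existed keep working without defining this method.
        pass


class L3AgentExtension(AgentExtension):
    """L3-specific interface: router events."""

    @abc.abstractmethod
    def add_router(self, context, data):
        """Handle a router being added."""


class LegacyExtension(L3AgentExtension):
    """Hypothetical older extension that never defined consume_api."""

    def initialize(self):
        self.ready = True

    def add_router(self, context, data):
        return data["id"]


class IncompleteExtension(L3AgentExtension):
    """Hypothetical extension that forgot to implement add_router."""

    def initialize(self):
        pass


legacy = LegacyExtension()
legacy.consume_api(agent_api=None)  # inherited no-op, nothing breaks

try:
    IncompleteExtension()
    error = None
except TypeError as exc:
    error = exc  # abc refuses to instantiate the incomplete class
```

The trade-off is the one the talk names: leaving `consume_api` non-abstract keeps older extensions loadable, at the cost of not forcing new extensions to think about the agent API.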
Again, here's information about where these different components live; well, it's all in the Neutron tree. I want to point out the agent side of things, because on L2 (excuse me, I said L3) there's not a single monolithic agent. There are multiple: OVS, Linux Bridge and SR-IOV. And they live pretty deeply buried in the Neutron code. That first path name you see took me a long time to find, and now I've saved you some time if you've been looking. The QoS agent extension is the extension that I'm going to be talking about briefly here, using an OVS driver, and the relevant files are in the locations listed there.
Okay, there's one difference between the L2 and the L3 agent APIs. On the L3 side we have a single class that constitutes the API, and here we have two, and you have to remember that this is specific to the OVS agent. The OVS agent has a special demand, which is that any flow entries and flow tables need to be designated as essentially owned by that agent or agent extension. Nate described the cookie mechanism that performs that role. So we need to allow the agent to provide a cookie to the agent extension, actually to the driver itself, and we have two classes here that perform that work in conjunction.
I'm going to skip through this and the next couple of slides to get to, yep, the extension. I just want to briefly point out that in its initialize method it's in turn calling both a consume_api method and an initialize method on the driver. That's because the driver is the thing that needs to access agent information, namely by getting the cookie. The common method names between the extension on the one hand and the driver on the other are incidental, in the sense that they don't need to be the same.
They happen to be the same, and if you're like me, that would confuse you. But it's incidental. And I guess I'm done, and I'm handing it over to David. Thank you very much.
Okay. Next slide. There we go. In section three I'll be running over the use cases for agent extensions, as well as talking about some of the future work that's currently in progress.
So the use cases for the L2 extensions include QoS, TAP and firewalls. The L2 extension is updated every time a port is updated. That makes it very good for QoS, where you can put limits on the bandwidth for each port, and for TAP, where you can mirror traffic from ports to an external port so you can debug traffic coming from those ports. I'll just move on quickly to the L3 ones.
The use cases for the L3 extensions include Firewall as a Service, load balancing and VPNs. The L3 extension is updated on router updates, so it'll send you all the gateway information for the routers. That makes it very good for Firewall as a Service if you want to do access control lists on the traffic coming in and out of the gateways. For example, maybe you want to restrict traffic coming into this particular subnet from another subnet, so that would be a good use case for that. And load balancing, so you can distribute traffic as it comes in across different VMs to stop the VMs from getting overloaded with traffic.
So, the future work as well. Of the work we have planned so far for agent extensions, one item is the L2 OVS Flow Manager. The L2 OVS Flow Manager tackles a problem that came up when consume_api was introduced to the L2 OVS agent. What basically happens is that consume_api was provided to give extensions access to the OpenFlow tables. But this is a shared resource between all the extensions, and the extensions don't know what other extensions are loaded, nor should they. This basically means that there's contention over how it can be used.
So if you have multiple extensions working on the same flow table, there's no guarantee they'll put their flows in so that they don't interfere with each other's operations.
The other one, then, is the Neutron Common Classification Framework, the Neutron common classifier. Basically, this is about introducing a common framework for Neutron and its extensions to define traffic classifications within Neutron. This is to stop every extension from defining its own classification framework and effectively ending up with a horrifically inconsistent API across all of Neutron.
So the next thing is the bonus round: making your own agent extension. I'm just going to go over some parts of it as quickly as I can to leave time for questions. These are some of the key parts, with the last two being optional, but I think they're very important to know if you want to try to develop one yourself.
The first one is the extension descriptor. This defines the abstract plugin: it gives it the name, the alias, the description, the updated timestamp, and you see it gets the plugin interface. What that does is return an abstract class for your service plugin, so it effectively defines your REST API.
The next one is the resource attribute map. What this does is let you define custom types. For example, with QoS that would be QoS policies and QoS rules. So it allows you to create your own commands and apply them selectively to ports.
This is the service plugin. This is what actually extends the REST API. What I have here is the service plugin base, the aptly named skeleton port plugin base. This isn't exactly what you'd use for a plugin base; I've included some things you would use in the actual plugin, just to show that they're necessary. But as you can see on the left-hand side of the screen as well, there's something in a very, very tiny font.
The setup.cfg file and the entry point are defined right there, showing how to get the service plugin loaded. You can see it just says neutron.service_plugins, equal to, and then skeleton port, the name of the extension, equal to, and then the path to where this class is actually defined. That path is to where the actual plugin is defined; this is just the abstract one. So, next.
Next: the L3 agent plugin. This is the actual plugin to the agent. The last one attaches to q-svc for your REST API; this one is the actual plugin to the L3 agent in this case. This is the bare minimum. In setup.cfg you define another entry point, in neutron.agent.l3.extensions. This is the namespace that Margaret described earlier in the talk. You can see there that this is for the L3 one. It's called the skeleton router, which is the name of the extension, and it's equal to the path where you would find this extension inside the skeleton extension project, which is out of tree to Neutron.
If you look at the actual code on the right-hand side of the screen, you'll see that it just prints log information every time one of the methods is called. You can see as well there's a consume_api, so it'll absorb that, and then you have add_router, update_router and delete_router. Without a service plugin or an RPC endpoint, these will be called every time a router is updated. If you wanted it to update when, for example, something changes in your service plugin, what you'd need to do is create RPC endpoints on either side: one on the service plugin to let the agents know, hey, your special variable has been updated, and another endpoint here that listens for it and actually triggers all the calls once that particular RPC topic has been called.
So I'll just move on to the L2 extension. This clicker is very counterintuitive: you pull back to go forward and push forward to go back.
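To make the entry point description above concrete, here is a small sketch of how a `name = module:attr` line in setup.cfg maps onto something Python can load. The `neutron.agent.l3.extensions` namespace and the skeleton router name come from the talk; the module path and class name are hypothetical, since the slide's actual path is not in the transcript. The sketch uses the standard library's `importlib.metadata.EntryPoint` (the `module` and `attr` properties need Python 3.9 or later) only to inspect the entry point, not to load it.

```python
from importlib.metadata import EntryPoint

# Roughly equivalent setup.cfg stanza (module path is hypothetical):
#
# [entry_points]
# neutron.agent.l3.extensions =
#     skeleton_router = skeleton_ext.agent.l3.router_ext:SkeletonRouterExtension
entry_point = EntryPoint(
    name="skeleton_router",            # name the agent config refers to
    value="skeleton_ext.agent.l3.router_ext:SkeletonRouterExtension",
    group="neutron.agent.l3.extensions",  # namespace the manager searches
)


def describe(ep):
    """Summarize what loading this entry point would import."""
    return f"[{ep.group}] {ep.name} -> import {ep.module}, take {ep.attr}"


summary = describe(entry_point)
```

The name on the left of the equals sign is what you would then list in the agent's configuration so the extensions manager knows to load it; the namespace is how the manager finds it among everything installed.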
But yes, the L2 agent plugin. It's loaded by Neutron's L2 agent, q-agt. Same thing for the L2 one: you define an entry point in setup.cfg, and this one has more information on that. The L2 agent doesn't have add, update and delete port; instead what it has is handle_port and delete_port. What I put in here was a bit of fluff: basically, it keeps an internal list of the ports that come in. If one isn't in the list, it'll call create on the driver, if one is in the list, it'll call update on the driver, and then it'll pop it when it's deleted. And yes, you'd define the RPC endpoint here as well if you were tracking your own custom variables and data types.
The next thing is the L2 agent drivers. This isn't as important for L3, because there's only one backend at the moment, but with L2 you have OVS, Linux Bridge and SR-IOV, just to name a few. What's important, then, is that when your extension is called it has the right backend, because using the Linux Bridge one when you're running OVS isn't the best. As you can see at the top, I've defined entry points for OVS, Linux Bridge and SR-IOV, and these all have paths to different drivers that are inside the custom out-of-tree extension.
As you can see on the left, that's the initialize method from the previous extension, and the first call after the log is to assign the skeleton port driver; as you can see, it makes a call to the Neutron manager. It'll use the driver type to associate that with a driver backend, and then it'll put that there. On the right, then, I've defined the abstract class for the driver. As you can see, it's all abc, so if you don't implement one of these it'll throw an exception at runtime. But as you can see here as well, I haven't put that on consume_api, and the reason is that there is no agent API for Linux Bridge and SR-IOV.
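The port-tracking "fluff" and the driver split described above can be sketched as follows. This is an illustration under assumptions, not the skeleton project's code: the class names, the `port_id` key and the recording driver are all hypothetical, and the driver is passed in directly instead of being looked up by driver type through entry points.

```python
import abc


class SkeletonPortDriver(abc.ABC):
    """Hypothetical driver interface; one subclass per L2 backend."""

    @abc.abstractmethod
    def create(self, port):
        """Apply the feature to a port seen for the first time."""

    @abc.abstractmethod
    def update(self, port):
        """Re-apply the feature to an already known port."""

    @abc.abstractmethod
    def delete(self, port):
        """Remove the feature from a port."""

    def consume_api(self, agent_api):
        # Not abstract: only some backends provide an agent API.
        pass


class RecordingOvsDriver(SkeletonPortDriver):
    """Stand-in OVS backend that just records what it was asked to do."""

    def __init__(self):
        self.events = []

    def create(self, port):
        self.events.append(("create", port["port_id"]))

    def update(self, port):
        self.events.append(("update", port["port_id"]))

    def delete(self, port):
        self.events.append(("delete", port["port_id"]))


class SkeletonPortExtension:
    """L2-style extension: handle_port and delete_port, no add/update."""

    def __init__(self, driver):
        self.driver = driver
        self.known_ports = {}

    def handle_port(self, context, port):
        port_id = port["port_id"]
        if port_id in self.known_ports:
            self.driver.update(port)
        else:
            self.driver.create(port)
        self.known_ports[port_id] = port

    def delete_port(self, context, port):
        # Pop the port; only call the driver if we were tracking it.
        if self.known_ports.pop(port["port_id"], None) is not None:
            self.driver.delete(port)


driver = RecordingOvsDriver()
extension = SkeletonPortExtension(driver)
extension.handle_port(None, {"port_id": "p1"})
extension.handle_port(None, {"port_id": "p1"})
extension.delete_port(None, {"port_id": "p1"})
```

Keeping the create-or-update decision in the extension means each backend driver only has to implement three simple operations, which is what makes swapping OVS for another backend practical.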
Also, actually, if you have any errors inside the driver, or while trying to load the driver, it won't tell you what the error is. It'll just say it failed to load the driver. So that's something to keep in mind when you're trying to debug any problems you might have loading backend drivers.
The next one is extending the Neutron command line. The Neutron command line is deprecated, but I felt this was important just for debugging purposes, if you wanted to make sure everything was set up right. You define the entry point as well in setup.cfg, and then to the right you can see you just define a typical Neutron command file and link it with the extension name.
The last thing I wanted to cover was a DevStack plugin. To create a DevStack plugin for an out-of-tree project, all you do is create a devstack directory and put inside it a settings file and a plugin.sh file. In the settings file you assign variables, and these can be overridden inside your local.conf when you're stacking. The plugin.sh file is a script that runs at the different stages of DevStack, which passes in different arguments. For example, you see here there are two: one for when it's stacking and installing, and one for when it's stacking and in post-config. When it's installing, it'll just put the project in the Neutron skeleton extension directory that you assign, or, if you don't assign one, it defaults to the destination directory with the Neutron skeleton extension name. And in post-config it writes the extension names into all the agent extension fields.
So these are just some resources that we found very useful in writing and making this talk, and some legal notes and disclaimers and logos, logos. And here are some QR codes.
The one on the left is for a repo with this actual agent extension, if you want to have a further look at it and see if there's anything of interest in there. The one on the right is for this talk, so if you want to look at it later or get the references from the talk, you're very welcome to do that. And yes, are there any questions for any of us? Were we good? Good. Thank you very much. Thank you very much.