All set? Hello, everybody. We'll be talking about networking acceleration today and how we can enable that with the Cyborg service in OpenStack, OK?

So we'll talk about why we need hardware acceleration in the first place. We'll talk about some usage models for how we can use networking acceleration, et cetera. We'll talk about where Cyborg is today. We don't have networking support yet. We'll talk about how to enable networking support in Cyborg. And in 10 minutes, that's pretty much all we can do. We had a more detailed discussion of Cyborg in a project update. We can talk about that also later.

So why do we need to accelerate networking at all, right? If you look at where we are with OVS switching and other forms of network switching, they consume a lot of CPU, especially as we move towards 100 gigabits per second, where the packet rates go up. So there's an increasing need to offload them to some kind of hardware accelerator. That's also needed for improving latency and jitter, et cetera. And this is one primary reason.

As we move towards 5G, that's another important trend. We are seeing strict requirements for latency and jitter, and we are not quite there yet with software-only implementations. Furthermore, the 5G standard is also evolving. It's not some fixed thing. So we not only need accelerators, we want them to be programmable. So it's important not to be tied to some fixed-function ASIC or ASSP; we want something that can be programmed repeatedly as the standards evolve.

And the third motivation is edge computing. This actually adds on to the previous two. As we enable new applications like augmented and virtual reality, they have stringent latency needs of their own, and that's going to complicate things further. So we want accelerators that are programmable and that can be deployed either at the edge or in the data center, possibly a combination of both. So these are the reasons which are driving us towards this.

So within OpenStack, as you may know, we've got the Cyborg project for enabling accelerators. What it essentially aims at is lifecycle management for all accelerators. It's not only for GPUs or SmartNICs or FPGAs; it's any one of them or all of them. We already have support for many of these device types. We don't have networking support yet, but we are increasingly hearing from various partners and customers that we should be supporting some form of SmartNICs and programmable accelerators and so on, okay?

So here's a quick overview of Cyborg. It's only 10 minutes, so I won't spend too much time on this, but basically Cyborg is a service. It's vendor-neutral like pretty much any other service. It's also supposed to be hypervisor-neutral. The way it works is that we've got an API server on the controller along with Nova, Neutron, and everybody else. We also have our database and a conductor. So it looks like a regular project in that sense. We also have an agent running on every compute node, and that agent has a bunch of libraries to call the drivers. The drivers in this context are basically device-specific libraries which the agent can load. So for example, we've got a Cyborg driver for NVIDIA GPUs, we've got a Cyborg driver for Intel FPGAs, and so on, okay? Plus, for FPGAs, we also use the Glance store for storing the bitstreams, alongside the VM images, okay? Please stop me if you have any questions.
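As a rough illustration of what one of those device-specific drivers might look like from the agent's point of view, here's a minimal sketch. The class, method names, and returned fields are hypothetical placeholders, not the actual Cyborg driver API:

```python
# Hypothetical sketch of the kind of interface a device-specific Cyborg
# driver could expose to the Cyborg agent. Names here are illustrative only.

class FakeFPGADriver:
    """Illustrative device driver loaded by the Cyborg agent."""

    def discover(self):
        # Enumerate devices of this type on the compute node and report them,
        # so they can be reflected as (nested) resource providers in placement.
        return [
            {
                "vendor": "0x8086",            # PCI vendor ID
                "pci_address": "0000:3b:00.0",  # PCI BDF of the device
                "type": "FPGA",
                "traits": ["CUSTOM_FPGA_REGION_XYZ"],
            }
        ]

    def program(self, pci_address, bitstream_id):
        # Fetch the bitstream image (stored in Glance alongside VM images)
        # and flash it onto the device identified by pci_address.
        raise NotImplementedError("device-specific programming goes here")
```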
So why do we need Cyborg in the first place? If you look at GPUs today, we typically use them via PCI whitelists, and they're typically difficult to use. It's basically a configuration file that you need to keep updating. We had a forum session this morning with several operators present, and they also pretty much concurred that it's a difficult thing to use. But when it comes to programmable devices, it doesn't even work anymore, because you could have one FPGA with a certain PCI ID with, let's say, Gzip inside it, and another FPGA with something else, like a vRAN workload, inside it. So looking at the PCI ID does not tell you what's inside the device. With a whitelist you necessarily have to go by PCI ID, so you can't ask for a VM with a Gzip offload or a VM with a vRAN offload. So with programmable components, you can't really use PCI whitelists anymore.

Cyborg solves those problems by doing two things. You don't need a config file, you don't need host aggregates. We use the drivers to discover the devices and populate them in placement automatically, so Nova can query it and get everything in a holistic fashion. Plus, we also find out what's inside the devices. The Cyborg drivers can find out what kind of functions are implemented, what bitstream is programmed, and so on. So all the information is available to enable use cases like: I want an accelerator with a vRAN offload. We can make requests like that, okay?

So the key idea here is that we're going to leverage placement, right? We're going to create resource providers in placement for every accelerator type. A SmartNIC, a GPU, or an FPGA, we'll create a resource provider for that. And the device properties will become traits in placement. And we've got a whole scheme worked out.

So the way you would use this today is that you would configure the drivers. If you've got an NVIDIA GPU, for example, you'll enable the GPU driver. If you've got a SmartNIC, you'll enable the corresponding driver in the future, et cetera. Then you'd load up any bitstreams you may have for FPGAs. Then you'd define something called a device profile. Normally, you know, Nova has got the notion of a flavor, right? In the flavor, you define extra specs for additional resources. We just took the same concept and factored it out, because if we put everything in flavors, the number of flavors would explode. We don't want a flavor explosion. Typically, operators do use flavors for billing and accounting, so we don't want too many flavors lying about. So we put them in a separate thing called a device profile. There's an example of a device profile at the bottom of the slide. Essentially it looks like extra specs: we've got the notion of resources and traits, except that instead of plain key-value pairs, we make it into kind of a JSON format, which is easier to manipulate, okay? So you do all that stuff, and finally you can just create a VM with a particular flavor. And we've got a whole workflow worked out with Nova for how we can launch a VM with this flow, okay?

So this is what we have today. And the question then becomes, how are we going to adapt this for networking? To understand that, we first need to understand how it works today without networking. The very broad idea is that when Nova sees a device profile name in the flavor, it's going to query Cyborg first and say, give me your device profile information. So it gets the resource and trait requirements from Cyborg and merges them with other request groups from the flavor and other sources. It makes a consolidated list and then queries placement with that.
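To make the device profile idea concrete, here is a rough sketch of what one could look like, expressed as the kind of JSON-like body the Cyborg API accepts. The profile name, trait, and resource class below are illustrative placeholders, not taken from the talk:

```python
# Rough sketch of a device profile: resources and traits grouped in a
# JSON-style structure rather than flat key-value extra specs.

device_profile = {
    "name": "vran-offload-profile",          # hypothetical profile name
    "groups": [
        {
            # Ask placement for one FPGA-backed accelerator...
            "resources:FPGA": "1",
            # ...that carries a vRAN offload function.
            "trait:CUSTOM_VRAN_OFFLOAD": "required",
        }
    ],
}

# A flavor then only needs to reference the profile by name, for example
# via an extra spec such as accel:device_profile=vran-offload-profile,
# instead of carrying all the resource and trait details itself.
```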
So when placement returns the results, it returns not only the compute hosts, but also the devices. It's going to say, here's the compute node as a resource provider, and the device as a nested resource provider. So essentially it's returning a tree of candidates. So the various candidates are available to Nova, and the Nova scheduler is going to pick one of those. When it picks one, it's essentially picking a host and a device, in contrast to today's flow where only a host is involved, right?

So once it does that, it's going to call back into Cyborg. For those of you familiar with Neutron, we've got a notion of port binding there. We have a similar idea here in Cyborg called accelerator request binding. Essentially you take an accelerator request and say, bind this to this host and this device. So once the selection is done, you make a call to Cyborg for binding, and binding in this case is asynchronous, because it may involve potentially preparing the device or reprogramming it. So Nova is not going to block on that. It's going to issue the bind request and move on. It's going to call into Neutron and so on with everything else. And while that's going on, it's also sending a message to the virt driver. The virt driver will finally query Cyborg to get all the details. So the broad idea is that instead of getting a PCI device from Nova's whitelist, we're going to get the PCI device from Cyborg after the device is prepared. That's the overall idea.

And this is what we're currently implementing. We already have this in Train. It's been implemented on the Cyborg side. On the Nova side, we've got a bunch of patches. They are still being reviewed, so it's not fully merged. But this is what's been implemented on the Cyborg side already. The question now is, where do we go from here? First, we need to complete the Nova integration. And we already have many basic VM operations working with accelerators today. That includes creation and deletion of VMs. You can pause, unpause, stop, start, resume, all of those things. But we don't have networking today. So how do we get networking?

So here's the meat of the idea. This is more like a proposal. We are kind of working on some ideas right now. It's not yet proposed in the community. But the basic idea is: instead of putting a device profile name in the flavor, what if I put it in a Neutron port instead? So you may want to create a VM with one management port and, let's say, one offload port. The management port stays the same. You don't need anything new for that. There's no offload there. So do it the old-fashioned way. But for the one which requires offload, you put in this device profile which says, I want this trait and this kind of device. So if you put that in the Neutron port, Nova should be able to query Neutron, get those device profiles, combine them with other request groups, and do the placement query as before. So when it's choosing a candidate, it's actually taking the Neutron needs into account. Neutron today does not influence scheduling, but with this proposal we are hoping that it can influence the Nova scheduler itself.

So we do that, and the net result is that, like in the flow I described before, instead of getting a PCI BDF from the Nova whitelist, we're getting it from Cyborg. Cyborg would have taken into account what Nova requested in the binding process. Once the Cyborg binding completes, you get the PCI BDF and you pass that to Neutron for port binding. So Neutron ports today get the PCI BDF from the whitelist; instead, you get it from Cyborg and then pass it to Neutron.
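As a very rough sketch of what that proposed flow could look like from a user's perspective, here's an illustration using openstacksdk. Nothing here is an accepted API: the binding-profile key name and the overall flow are illustrative only, and the IDs are placeholders:

```python
# Hypothetical sketch of the networking proposal: a device profile name
# attached to a Neutron port instead of (or in addition to) the flavor.

import openstack  # openstacksdk, assumed available

# Placeholder IDs for illustration.
MGMT_NET_ID = "<mgmt-net-uuid>"
DATA_NET_ID = "<data-net-uuid>"
FLAVOR_ID = "<flavor-uuid>"
IMAGE_ID = "<image-uuid>"

conn = openstack.connect(cloud="mycloud")

# Management port: created the old-fashioned way, no offload involved.
mgmt_port = conn.network.create_port(network_id=MGMT_NET_ID)

# Offload port: carries a device profile name so that Nova could fetch the
# resource/trait groups from Cyborg and fold them into the placement query.
offload_port = conn.network.create_port(
    network_id=DATA_NET_ID,
    binding_profile={"device_profile": "vran-offload-profile"},  # proposed key, not merged
)

# The VM request references both ports; after Cyborg's asynchronous bind
# completes, the PCI BDF it hands back would go into the port binding
# instead of coming from Nova's PCI whitelist.
server = conn.compute.create_server(
    name="vran-vm",
    flavor_id=FLAVOR_ID,
    image_id=IMAGE_ID,
    networks=[{"port": mgmt_port.id}, {"port": offload_port.id}],
)
```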
So from Neutron's point of view, it's very little change. The only thing we're proposing is that in the port binding, there should be a device profile name as a key-value pair. That's the only change in Neutron. At least as... Sorry? We're proposing it for the U cycle; whenever it gets merged... it's very difficult to forecast when it's going to get merged. But we want to propose this as a patch in the new cycle. Absolutely, yes. So we propose to submit a specification and the associated patches, and we'll also talk to you guys in the course of the PTG. Yeah, cool, okay. Yeah, cool. Yeah, that's the basic idea, yeah.

Sorry, can you please use the mic? The question was, can you get rid of Nova compute? Can we use this without OpenStack in play? So... Can we use the Cyborg agent and drivers without... Okay. Suppose today I have a bare metal host. Okay. On top of that, I would be having my DPDK drivers and then doing my abstraction with DPDK for the underlying NICs. Right. So DPDK provides me the flexibility to whitelist the NICs and move them to the fast data path. Okay, okay. So can I use the Cyborg agent and drivers there? Like, I don't even have my OpenStack agents coming into play. Sorry, are you talking about DPDK inside a VM or DPDK on the host? DPDK on the host. On the host, okay. Yeah, it's a slightly different use case, yes. Right now I've been focusing mostly on VM-level offloads. Okay. I think I'm thinking about something like OVS offload, or offload from the host. Right, and even if that is the case, can I spawn my VM on bare metal using libvirt, and can I have my Cyborg agents there? Right. Yes. Or is it heavily dependent on Nova compute as of today? The fundamental operation of Cyborg in terms of programming by itself, the discovery and programming, is almost standalone. Okay. It should be possible to list your inventory, do programming, et cetera, without Nova being involved. The reason why we're involving Nova is to bring up VMs. Okay. The act of spawning a VM through virt drivers, scheduling, everything, is in Nova today. So suppose I have an alternative to Nova which spawns my VM. Can Cyborg still be used there? Yes. In principle, yes; you need to have the right way to call into Cyborg to do all the preparation. If it's API-compatible with the way Nova calls it today, it should work. Yeah. Right. Thank you. So you're suggesting API compatibility for taking Nova compute out of the picture? API compatibility for the way you call into Cyborg. Cyborg already defines some APIs in a standard way. Nova simply calls into them. The standard REST API is documented? Correct. It's the version two API, which is published in Train. So just call into them; it should work.

So the subsequent question to that is, the OpenStack controller is there, right, on the left side? Yeah. I don't need Glance, Neutron, Keystone. If I put a Kubernetes controller there, how does it work? Oh, I see. Okay. That's a little different question, because the Cyborg API controller, the server, is actually running on the OpenStack controller. Oh, I see. Right. It's part of the REST API server. So you don't have a standalone version right now? That is correct. Okay. Yeah. It's tied into the OpenStack infrastructure. Yeah. Thank you.
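For the standalone-caller discussion above, here's a minimal sketch of what calling the Cyborg v2 REST API directly might look like. The endpoint URL, port, and token handling are placeholders; in a real deployment the endpoint would come from the Keystone catalog:

```python
# Minimal sketch of talking to the Cyborg v2 REST API directly.
# Endpoint and token below are placeholders, for illustration only.

import requests

CYBORG_ENDPOINT = "http://controller:6666/v2"  # placeholder endpoint
TOKEN = "<keystone-token>"                     # placeholder Keystone token

headers = {"X-Auth-Token": TOKEN, "Content-Type": "application/json"}

# List the device profiles known to Cyborg.
profiles = requests.get(f"{CYBORG_ENDPOINT}/device_profiles", headers=headers)
print(profiles.json())

# Create an accelerator request (ARQ) against a profile; binding it to a
# chosen host and device is a separate, asynchronous step driven by the
# caller (normally Nova) once scheduling has picked a target.
arq = requests.post(
    f"{CYBORG_ENDPOINT}/accelerator_requests",
    headers=headers,
    json={"device_profile_name": "vran-offload-profile"},
)
print(arq.json())
```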