Good morning, folks. Hi, my name is Pooja Akumri. I work as a software engineer at Platform9 Systems, and today I'm going to talk about how you deploy virtual network functions with Kubernetes, pods, and VMs.

So the agenda for this session covers the following items. First, I'll talk about virtual network functions in general: what the term really means and what the advantages of running virtual network functions are. Then we'll look at the application performance enhancements we can get using two technologies, the first being SR-IOV, and after that OVS-DPDK. For each of these technologies I'll explain at a high level what it is, how you deploy it on a Kubernetes cluster in terms of the configuration that needs to be done on the host, and once that is done, how you actually deploy a VNF application using KubeVirt as the virtual machine platform. We'll finish with a quick demo showing VNF applications, or rather VMs and pods, created using SR-IOV and DPDK networks. So let's get right into it.

First, what's a virtual network function? Network functions are all the networking capabilities that you traditionally expect to run on hardware dedicated to such applications, and virtual network functions are a new way of delivering them using a virtualization layer. There are several advantages to that, which we'll talk about next.

Before we get into that: we hear the terms NFV and VNF a lot, so I first want to give an introduction to network function virtualization, NFV for short, and then move on to VNFs. NFV is the architectural concept that lets you abstract network functions away from the physical hardware. Like I mentioned, a few years back you would have dedicated, very specialized hardware capable of running network functions like routing, firewalls, and security. The new way of doing it is with software-based virtual applications. So NFV is the overall architectural paradigm, and it has three basic framework-level components: first, the actual application, which is your virtualized network function; second, the infrastructure those applications run on, the NFV infrastructure or NFVI; and lastly the management, automation, and network orchestration layer, abbreviated MANO. I'll talk briefly about each of them.

With the telco cloud, this is the new way of doing things without having dedicated hardware for each network function. The primary goal is to improve agility and scalability for telco service providers, so that adding new applications on demand doesn't require additional hardware resources. VNFs, then, are the software applications that deliver functions such as file sharing, directory services, routing, firewall mechanisms, and so on. There are many different applications you can run as VNFs, and each can run either in a virtual machine or in a Kubernetes pod. The next thing is the infrastructure, the NFVI components.
The three basic NFVI components are compute, storage, and networking. The hypervisor could be KVM or VMware, or it could be a container management platform like Kubernetes with KubeVirt. Apart from running the network apps, you also need a framework that can manage the NFVI infrastructure itself and, secondly, handle the lifecycle of the VNFs getting deployed on it. That's the third component, the management, automation, and network orchestration layer.

So we've spoken briefly about what VNFs are: primarily they are applications that run on top of the NFVI infrastructure, like I said, and mostly they are deployed as virtual machines; that's the VNF side of it. Corresponding to that, there's also CNF technology, which runs the same functions in containers. The common VNF applications I spoke about are just some examples of what you can run, and the goal is to not run everything as one monolithic VM. Instead you do service chaining: you take different functions as independent network functions and put them together as building blocks in a pipeline. That service chaining of network function components is something you can simplify a great deal using VNF technology.

So what are the benefits of using VNFs? Network functions have always been around, and with proprietary hardware you could get good performance out of these applications. The problem is that the setup becomes monolithic at a certain point, and scaling individual components is hard. With virtual network functions you get improved network scalability, because each network function runs in its own VM with its own resources, and you can enable it, disable it, or manage it as needed, independent of the other components in the service chain. Second, because you're able to run the functions as VMs, you can pack a larger number of VMs onto a given hypervisor, and with that density of VMs you make efficient use of your whole infrastructure; it gets utilized to its maximum potential. Another advantage is that because you rely on less commodity hardware, running everything as containers or VMs, the power utilization of your data center goes down drastically. That's a nice side benefit. You can also implement better security policies with VNFs, because you have fine-grained control over each of the functions. It obviously saves the physical space in your data center that you would otherwise need for hardware appliances for each function. And apart from all that, in terms of the operational and capital expenses of running a data center for a telco cloud, you get significant savings by running network functions virtually. So these are some of the benefits you achieve with the VNF approach.

Now, in terms of optimizing the performance of the VNF applications themselves, the primary question is how you handle heavy network traffic for VNF applications when you have multiple VMs running on a hypervisor in your cloud. In an NFV environment there are multiple individual VNFs joined together to create what is effectively a single meta-service,
so you're looking at efficient memory access and resource allocation for the whole aggregated system while dealing with a high volume of network traffic. The performance you get from the native Linux kernel network stack is not going to suffice here: the more VNFs you pack onto a single host, the higher the aggregate usage climbs, to a level the standard Linux stack cannot meet. SR-IOV and DPDK are two solutions, usable independently of each other or in combination, that give you the faster packet processing these VNF applications need.

So first, let's talk about SR-IOV, which stands for single-root I/O virtualization. The benefit you get with SR-IOV is that you can give each of your virtual machines its own dedicated PCI device. Because you create multiple virtual functions out of a single PCI device, you can allocate them to independent VNFs with no overlap between them. That gives you good isolation of the resources, which is better for both performance and security. It's also easier to manage, because you can deploy one VNF independently of another and add or remove SR-IOV networks as required.

The performance benefit of SR-IOV comes from the basic fact that the network traffic bypasses your hypervisor: without kernel interrupts for data going in or out of the VM, the VM is able to access the NIC directly. That's primarily how it gives you faster packet switching. To actually run SR-IOV on a host, you need support at the BIOS level as well as at the operating system level; those are the two requirements. In terms of how it's presented to a VM, it doesn't really make a difference to the VM, because the slicing of a PCI device into multiple virtual functions happens at the host level. From the guest's point of view, it just sees a NIC, so the application running inside it doesn't change a whole lot.

In terms of the terminology used here: physical functions (PFs) are your full-featured PCI functions, where full-featured means each comes with its own set of configuration capabilities; you can configure the PCI device through each individual PF. When you slice a PF into multiple virtual functions (VFs), those are lightweight PCI functions, in the sense that they don't have the same configuration capabilities you get with the PFs, but they present a uniform device type that matches the physical function they came from.

This is just a pictorial view to explain how SR-IOV helps you bypass the hypervisor. Without SR-IOV, if you had a VM using the OVS bridge, the Open vSwitch bridge on the hypervisor, any data packet going in or out of the VM would create a kernel interrupt, and processing it involves a context switch, which slows down network throughput. With SR-IOV, because you give each virtual function direct access to the NIC, the network throughput increases significantly. That's how your VNFs can benefit from it.
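To make this concrete, here's a minimal sketch of carving VFs out of a PF from the host shell; the interface name eno3 and the VF count are placeholders for whatever your hardware provides:

```sh
# how many VFs this PF supports (assumes an SR-IOV capable NIC named eno3)
cat /sys/class/net/eno3/device/sriov_totalvfs

# carve out 7 virtual functions on that physical function
echo 7 > /sys/class/net/eno3/device/sriov_numvfs

# each VF now shows up as its own lightweight PCI device
lspci | grep -i "virtual function"
```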
Now, when we talk about deploying VNF apps on the virtualization platform, the solution we're looking at is KubeVirt, a virtual machine framework that runs on top of Kubernetes. KubeVirt has built-in support for SR-IOV; you just need to enable a feature gate for it. Once that's done, there are certain plugins you also need on your Kubernetes cluster to actually be able to use SR-IOV.

The first of them is the SR-IOV device plugin. The device plugin is responsible for detecting and discovering any SR-IOV resources available on a host. Once it detects them, it advertises them to Kubernetes, and the Kubernetes resource manager then allocates them as they're requested by pods and VMs. This is a read-only kind of service, in the sense that it won't actually modify anything on the host; the only thing it does is advertise these resources and update the capacity section of each cluster node, so that the Kubernetes scheduler can use that when allocating resources to VMs. For any virtual machine you run on KubeVirt, it allocates a VFIO device to the VM's pod, and VFIO is the only driver supported with KubeVirt today.

Second, there's the SR-IOV CNI plugin. This is the plugin responsible for actually configuring whatever SR-IOV resources are allocated to a specific VM, and configuring here means it modifies host resources to prepare them for use by a virtual machine instance; this is not a read-only plugin in that sense. Under the hood it uses netlink to move SR-IOV virtual functions into specific pod namespaces: depending on the namespace in which you're creating a VM, it runs the netlink operations and moves the virtual function (or physical function) into the respective namespace.

Lastly, Multus. Multus is a meta-plugin. When you create a network attachment definition in Kubernetes, you use Multus together with the SR-IOV CNI plugin whenever you attach a VMI to an SR-IOV interface. This is part of the NetworkAttachmentDefinition CRD object that I'll show shortly. What it helps with is identifying the object annotations: based on the resource name you configure when you set up the device plugin, and whatever reference you put in the annotation, KubeVirt is automatically able to fill that into the virt-launcher pod that runs a specific VM, updating the requests and limits sections of that pod. So by using Multus in combination with the SR-IOV device plugin, the resource-name annotation means the pod receives a device from the pool maintained by the device plugin. This lets us avoid any manual intervention to pass in the required PCI address, which simplifies the process a lot.
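As a reference point, here's roughly what the SR-IOV device plugin configuration looks like as a ConfigMap. The prefix intel.com, the resource name intel_sriov, and the PF names are illustrative values that match the demo later; adjust them for your hardware:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
      "resourceList": [
        {
          "resourcePrefix": "intel.com",
          "resourceName": "intel_sriov",
          "selectors": {
            "pfNames": ["eno3", "eno4"],
            "drivers": ["vfio-pci"]
          }
        }
      ]
    }
```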
In terms of configuration on the SR-IOV host, meaning the cluster node you want to use for SR-IOV VNFs: you basically need a NIC that is SR-IOV capable, since not all hardware supports SR-IOV by default. Once you have that, there are also certain BIOS settings you need to enable to actually use that capability. On some physical servers you'll see a BIOS setting that is a global enable flag for SR-IOV, and there's also a per-NIC setting you can toggle to enable it for specific NICs.

Once it's enabled in the BIOS, you also need to enable IOMMU support at the kernel level. On the kernel command line you pass the following flags: for an Intel processor, intel_iommu=on; for AMD, amd_iommu=on. Similarly, you add the pci=realloc and pci=assign-busses parameters to the kernel command line. If you make that change in the grub configuration, you need to save it, rebuild the initramfs, and reboot the host, if this is the first time you're setting up the host for an SR-IOV cluster. And like I mentioned before, KubeVirt only supports the VFIO userspace driver for passing these PCI devices through to QEMU for running your VMs, so you need to load the vfio-pci kernel module for your VMs to consume them.

Now, how do you actually create a VMI in KubeVirt using SR-IOV? This is what the spec would look like. In the interfaces section for a given VM, by default you always have the Kubernetes pod network listed as the first interface. There are two modes you can put it in; the default network here is in masquerade mode, and I won't get into the details of that here. The second interface is attached to the SR-IOV network; sriov-net, as you can see on the right side, specifies the network type as Multus, and you give it a network name, which corresponds to a network attachment definition.

This is how a network attachment definition for SR-IOV looks. What I was referring to earlier is the resource name that you specify as an annotation in the metadata section, and it carries a custom prefix. In this case I have it set to intel.com/intel_sriov, which means the Kubernetes scheduler will look for nodes that have resources under that name and schedule pods or VMs onto such a node accordingly. In the spec of this network attachment definition, you'll see it has the type sriov with a specific CNI version. You can optionally add a VLAN ID, or leave it out if it's a flat network. For the IPAM section, the example I have here uses the whereabouts plugin, in which case you specify a subnet along with the start and end of the allocation range. The IPAM section isn't really specific to SR-IOV, so the only two distinctive things here are the type field and the CNI version.
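Pulled together, the network attachment definition I'm describing would look something like this; the name, VLAN ID, and IP ranges are illustrative:

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "sriov",
      "vlan": 100,
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.100.0/24",
        "range_start": "192.168.100.10",
        "range_end": "192.168.100.50"
      }
    }'
```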
We'll cover the demo for SR-IOV at the end, together with OVS-DPDK, so before that I'll talk briefly about what that technology means. The first thing to cover is Open vSwitch itself: it's a production-quality, multilayer virtual switch. Its two main components are the forwarding path and ovs-vswitchd. The forwarding path is your data-plane packet-forwarding module, implemented in kernel space to achieve higher performance. The second component, ovs-vswitchd, is the userspace daemon, and it's the one that actually decides the traffic switching: if you have an OVS bridge created on a hypervisor running multiple VMs, it switches packets from one port to another depending on each packet's source and destination. So you'll see there are components in both kernel space and user space, and we'll see why that matters when we talk about DPDK.

So what is DPDK? DPDK stands for Data Plane Development Kit. The goal of a DPDK environment is to enable faster packet processing for telco apps that require higher throughput. Similar to SR-IOV, it achieves that by bypassing the Linux kernel network stack. To implement switching in user space, it relies on poll mode drivers. DPDK is also something you can combine with Open vSwitch: when you have Open vSwitch on a host and combine it with DPDK, you get accelerated performance because packet processing stays entirely in user space.

In terms of comparing network throughput with DPDK versus SR-IOV, there was a study comparing performance in both cases, and what it concluded was that for traffic between VNF apps running on the same server, east-west traffic, OVS-DPDK is the better alternative, whereas SR-IOV is more desirable when you have north-south traffic that actually exits the NIC, in which case you can take full advantage of the virtual functions. Having said that, you can also use the two in combination, so you can have OVS-DPDK alongside SR-IOV.

This diagram shows nicely how the poll mode driver is used in the OVS-DPDK case. On the left-hand side is standard OVS: the forwarding plane runs as a kernel-space module and ovs-vswitchd runs in user space. What you lose there is that every interrupt in kernel space becomes a bottleneck. On the right-hand side, the userspace component includes a DPDK forwarding module which, like I mentioned before, relies on the poll mode drivers to do the packet switching. In both cases the vnet device associated with every VNF goes through OVS, but with DPDK the packet processing happens in the userspace forwarding module instead of the kernel module that handled the forwarding path earlier.

KubeVirt support for DPDK is still pending; I've linked the GitHub PR that adds DPDK support, which is still not fully merged upstream. The demo we do is based on those changes, which we've applied on top of the latest upstream KubeVirt version. The main components you need for running DPDK apps with KubeVirt are the Userspace CNI plugin from Intel, and again the Multus meta-plugin to attach interfaces to KubeVirt VMs. In terms of packages on the host, you need an OVS build that was specifically compiled with DPDK support.

So these are the host configuration steps you perform. First, install the appropriate DPDK and OVS packages on the host for whatever distribution you're using; the OVS-DPDK package is part of the oVirt repo, so you can install it from there. You also typically need to configure a number of hugepages based on your physical hardware capacity; you can set that with sysctl, and once it's persisted you should be able to use hugepages in your VMI spec.
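A minimal sketch of that hugepage setup, assuming 2 MiB pages and a count sized to your RAM:

```sh
# allocate 2 MiB hugepages now, and persist the setting across reboots
sysctl -w vm.nr_hugepages=2048
echo "vm.nr_hugepages = 2048" > /etc/sysctl.d/99-hugepages.conf

# for 1 GiB pages, reserve them on the kernel command line instead, e.g.:
#   default_hugepagesz=1G hugepagesz=1G hugepages=16
```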
For setting up the DPDK devices themselves, you use the vfio-pci driver. Initially your DPDK-capable NICs may be managed by a different driver, but when you want to use them with KubeVirt, you set an override for the given PCI address with driverctl and set the driver name to vfio-pci.

Once the module is loaded and the driver override is in place, you also need to create the OVS bridge and a DPDK port in OVS. For that you use the ovs-vsctl command line. I'll give one example: you first add a bridge with the datapath type set to netdev, so instead of using a host net device it uses the userspace datapath, and then you add a DPDK port to that bridge for every physical DPDK device you want to associate with it. The device is specified using the dpdk-devargs option, where you pass the PCI address matching the device you bound to vfio-pci with driverctl set-override. I'll show a sketch of these commands right after this.

Similar to the KubeVirt VMI spec for SR-IOV, here you pass in the interface name, say vhost-user-net-1 as an example, and the type of the second interface is vhostuser; the first interface is the same default pod network as before. The network name you specify under the multus section on the right is net1 here, and net1 is what gets matched with your network attachment definition name.

So here is a sample network attachment definition for DPDK. In contrast to the earlier YAML file we looked at, the type here is userspace. You specify a CNI version, and then there are host and container sections you need to fill in. Under the host dictionary, the engine type is ovs-dpdk and the interface type is vhostuser. With OVS-DPDK the vhost-user port can act as either a client or a server, and the vhost mode on the host side is client in this case. The bridge name we added with ovs-vsctl add-br is what you specify under the bridgeName section. On the container side, you specify the interface type as vhostuser as well, and this is the server side of it.

What this really means is: you create a DPDK port on the host, which acts as the client, and inside the container where you're running a VM with QEMU and attaching the port as an interface to that VM, QEMU acts as the server. So those are the two modes: either the host acts as the client, or the container, or rather QEMU, acts as the client. All it really means is that one of them is responsible for creating the socket, and the other one establishes a connection to that socket.
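Here's that host-side sequence sketched out; the PCI address 0000:3b:00.0 and the bridge and port names are placeholders. Note the dpdk-init step, which the standard OVS-DPDK setup requires even though I didn't call it out above:

```sh
# load the userspace driver and persistently bind the DPDK NIC to it
modprobe vfio-pci
driverctl set-override 0000:3b:00.0 vfio-pci

# initialize DPDK in OVS and create a bridge with the userspace (netdev) datapath
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl add-br br-dpdk -- set bridge br-dpdk datapath_type=netdev

# add one DPDK port per physical device, matching the PCI address bound above
ovs-vsctl add-port br-dpdk dpdk0 -- set Interface dpdk0 type=dpdk \
  options:dpdk-devargs=0000:3b:00.0
```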
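And a sketch of the userspace network attachment definition just described; the field names follow the Userspace CNI plugin's documented format as best I recall it, so double-check them against the plugin version you deploy:

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: net1
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "userspace",
      "name": "userspace-ovs-dpdk",
      "host": {
        "engine": "ovs-dpdk",
        "iftype": "vhostuser",
        "vhost": { "mode": "client" },
        "bridge": { "bridgeName": "br-dpdk" }
      },
      "container": {
        "engine": "ovs-dpdk",
        "iftype": "vhostuser",
        "vhost": { "mode": "server" }
      },
      "ipam": { "type": "whereabouts", "range": "10.10.10.0/24" }
    }'
```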
After this we'll cover the demo side of KubeVirt and see how we can deploy VNF apps using both SR-IOV and DPDK. So let's get into the demo of running VNF applications with KubeVirt VMs. I'll first cover SR-IOV interfaces attached to VMs and show how you create such a VM with KubeVirt, and then move on to creating a VNF using a VM that has DPDK interfaces.

For SR-IOV I already have a cluster created here; it's a single-node cluster, so the master node is also running workloads. I'll show the pods that run as part of KubeVirt, since I have it installed here already. At the cluster level you'll see the virt-api, virt-controller, and virt-operator pods running, and for every host there is a virt-handler pod. Since I have just one node in the cluster, there's a single virt-handler pod running here, and it's responsible for launching any VMs that get scheduled onto that specific cluster node. I also have CDI, which is for creating data volumes and associating them as disks with your VMs. That's already running too, so you'll see the CDI pods for the API server, deployment, operator, and upload proxy here.

The other component I have here is Luigi, the open-source plugin operator for configuring SR-IOV. In the Luigi namespace you'll see a controller-manager pod running; it monitors any network plugin objects you create, and if you create a network template with SR-IOV, it helps you create virtual functions in an automated way, without any manual intervention. On this host I created virtual functions earlier, so if you look at an interface, under /sys/class/net/<interface>/device there is a file called sriov_numvfs that shows you how many VFs have been created on that interface.

Since I'm using Luigi, I'll also show the YAML file you create for configuring the device plugin we spoke about in the CNI section. The resourceList here is what specifies the resource prefix and resource name. This is what we need to use later in the network attachment definition, the resource prefix in combination with the resource name, to decide which interfaces should be handled by the SR-IOV device plugin. There are different formats in which you can specify the selectors section; the one I've used here is straightforward physical function names, the SR-IOV capable NICs eno3 and eno4, and Luigi configures them with the vfio-pci driver. That's the driver I'm specifying here in the ConfigMap used by the device plugin.

Once that is done, this is the actual HostNetworkTemplate that you apply using Luigi. To decide which nodes to configure this template on, it has a node selector: any node in the cluster with network.sriov.capable set to true gets SR-IOV configured, and only those selected nodes. For each physical function you can specify the number of VFs, the VF driver to bind with, and the MTU size if you want to set that. This is the sample template I used to create virtual functions, which is why you see seven VFs present on this host.

After that is created, you apply the network attachment definition file, and I'll show its contents. The network attachment name here is sriov-network-eno, which is what gets referenced in your VMI spec file. The thing to note is the type: sriov line; optionally you can set a VLAN ID, and for IPAM you could use whereabouts or any other plugin as desired.

So now I'm going to create a VM. This is the VMI spec file, and it's similar to how you create a typical VirtualMachineInstance object in KubeVirt. I'll focus on the parts specific to SR-IOV, which is basically the network interfaces section in the domain spec.
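The relevant portion of that VMI spec would look roughly like this, using the names from this demo:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: vmi-sriov1
spec:
  domain:
    devices:
      interfaces:
        - name: default
          masquerade: {}        # first interface: the Kubernetes pod network
        - name: sriov-net
          sriov: {}             # second interface: the SR-IOV VF
    resources:
      requests:
        memory: 1Gi
  networks:
    - name: default
      pod: {}
    - name: sriov-net
      multus:
        networkName: sriov-network-eno   # matches the NetworkAttachmentDefinition
```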
The first network is the pod network, and the second one is of type sriov; the networks section specifies the Multus network name, and that's what matches the network attachment definition we looked at earlier. In this case I'm just using a Fedora test image; you can replace this image with your specific VNF application and put any associated cloud-init script in the cloud-init data here. So let's go ahead and create that VM.

vmi-sriov2 was already running on this cluster, and vmi-sriov1 is the new VM I just created. Let's give it a minute to start up, then we can connect to its IP address and check that we're able to reach the other VM. If you look at the network interfaces inside this VM, you'll notice eth0, which is attached to the default pod network, and then eth1, which is your SR-IOV interface. And here, this is the IP of the VM that was already running, but on the default pod network. With SR-IOV, the interface doesn't automatically get an IP assigned inside the VM, so you need to configure that via a cloud-init script: the IP allocated by whereabouts gets assigned to the pod itself, not to the actual VM that's running. So that's something that needs to be done either with an external DHCP server or with the cloud-init script here. That's basically how you create a VMI using SR-IOV; it's really straightforward. The only thing that changes is how you create the network attachment definition, and once you have that, you just reference the network name in your VMI spec. That's the only difference.

Moving on to DPDK. Again I have a pre-created cluster here. Let me just export the kubeconfig, and you'll see again that there's a single master node available. It has the exact same deployments: KubeVirt installed and CDI installed. Since Luigi doesn't support DPDK today, that's still a work-in-progress item to add to Luigi, I'll just show the OVS-level info in this case. If you look at the ovs-vsctl show output, you'll see there's an OVS bridge created here with two ports, eno2 and eno4. Again, these are the two NICs being used for DPDK traffic. For adding each port, the option used is dpdk-devargs, to which you assign the PCI address. Since I'm using the entire eno2 and eno4 interfaces as DPDK ports, I specify their respective PCI addresses as part of the port config. So those are the two ports, and you can also run OVS in a bonded mode if that's what you want, or add the ports directly as I've done here.

With the OVS bridge and port configuration done, I'll show the network attachment definition and how it differs from SR-IOV. We spoke about this earlier: the host and container sections are the main things that differ for the userspace type of network attachment definition. The bridge name again needs to match what you created in OVS, the host side is in client mode, and the container side is in server mode here. Same as before, the IPAM section could be anything; whereabouts is just an example. The network name net1 is what gets referenced in the VMI spec.
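Keeping in mind that the vhostuser interface type comes from the unmerged KubeVirt PR mentioned earlier, so the exact field names may differ from what eventually lands upstream, the VMI spec for this part of the demo looks roughly like this:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: test-dpdk
spec:
  domain:
    devices:
      interfaces:
        - name: default
          masquerade: {}        # Kubernetes pod network
        - name: vhost-user-net-1
          vhostuser: {}         # from the pending DPDK PR, not yet upstream
    memory:
      hugepages:
        pageSize: 1Gi           # vhost-user ports need hugepage-backed guest memory
    resources:
      requests:
        memory: 1Gi
  networks:
    - name: default
      pod: {}
    - name: vhost-user-net-1
      multus:
        networkName: net1       # matches the userspace NetworkAttachmentDefinition
```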
Notice that this virtual machine, named test-dpdk, has two interfaces: the first is the Kubernetes default pod network, and the second is an interface of type vhost-user. For the DPDK network, you specify it using the Multus network name config, which is net1 here. Again, I'm using a Fedora image, but this could be replaced with your VNF application's image and the corresponding cloud-init script. So let's go ahead and create a VM from this; as with the SR-IOV one, we're using a VirtualMachineInstance type.

Okay, so once we have the VM running here, we should be able to SSH into it. Inside the VM, running the ip a command, you'll again see two interfaces. eth0 is the default pod network, and since it's using masquerade mode, it shows the internal IP address; at the pod level you would see the IP address that shows up here. The other one is your actual DPDK interface, so when you run a VNF application here using eth1 as the interface for your data traffic, you get the performance benefits of running the VNF over OVS-based DPDK.

So that's really the only change on the DPDK side: the host-level config that was needed, where the only packages you install are Open vSwitch and DPDK. I just want to quickly show the packages you need for DPDK on this host, since I had already configured it: the DPDK package itself, which is for the runtime, and for Open vSwitch a package from the oVirt repo, because that's the one compiled with DPDK enabled. That's the only configuration you need beyond what we did with ovs-vsctl, plus the driverctl command to set the override to the vfio-pci driver, like we spoke about earlier.

So that concludes the demo on running VNF applications using KubeVirt VMs with both SR-IOV and DPDK. Thank you for listening.