Hi, good afternoon. Thank you for coming to this presentation. My name is Leonardo Milleri and I work for Red Hat in the virtualization team. Today we are talking about building a containerized workflow for vDPA. I'm going to provide some background about the technologies, virtio and vDPA, then the work that has been done to integrate them into Kubernetes and OpenShift, and finally a demo, the current status and the next steps.

One of the most popular solutions for accelerating container and VM networking is SR-IOV, which stands for single root input/output virtualization. This solution is dependent on the physical NIC, and this is why vDPA comes into play: it provides containers and VMs with, let's say, a decoupling from the physical NIC. Accelerating means forwarding packets as fast as we can from the container or VM to the physical NIC.

A quick mention of virtio. Virtio is a specification for interfaces for virtual machines, different types of interfaces, especially networking and storage. It also defines the layout of the device and the interaction of the device with its drivers. Part of that interaction is the feature negotiation between the device and the driver in order to establish, for instance, the virtqueues that let you exchange data between the host and the guest, and transport parameters like PCI.

Now, what is vDPA? vDPA stands for virtio data path acceleration. The basic idea is to take the virtio-net interface and push it directly into the physical NIC. There are two main aspects: the data plane follows the virtio specification, so it is standard, while the control plane is vendor-specific. For this reason it is translated by a shim layer, the vDPA framework, which converts it into a generic control plane.

Just a quick mention of the vDPA framework; I'm not an expert on this, but for completeness. From top to bottom we have, for instance, containers and VMs. The container is provided with a virtio-net interface that goes directly to the kernel subsystem, and the VM is instead provided with a character device that goes down to the vhost subsystem. In the middle of the picture we have the main components of the vDPA framework, like the virtio-vDPA bus driver and the vhost-vDPA bus driver, and the vDPA device, which is the abstraction of the physical device. In the bottom part of the picture we have the hardware blocks, so this is a vDPA-compliant card: as said, we have a standard virtio data path and a vendor-specific control path.

Now I'm going to talk about the work that has been done to integrate this into Kubernetes and OpenShift. This is to introduce how a worker node is deployed in terms of the main components and the OVS hardware offload. From top to bottom we have the OVN controller, which complements the capabilities of OVS by providing virtual network abstractions such as Layer 2 overlays. Below that we have OVS, a multilayer virtual switch for forwarding packets between the pods and from the pods to the physical network. Then, in the hardware blocks, the physical NIC is typically configured in switchdev mode, for Mellanox/NVIDIA cards for instance.
So this is a software abstraction that provides open, standard Linux interfaces that can be used by the applications on top of it. These are called port representors, P0, P1 and P2 in the picture, and they are connected to the OVS bridge. Also, with technologies like SR-IOV the physical NIC is partitioned into different virtual NICs called VFs (virtual functions). In the end, when we set up the virtual data path, we have a one-to-one connection between the container or VM and each VF.

So how does it work in terms of packet processing? When the first packet is received, it is handled by the OVS software, so this is called the slow path. But any subsequent packet matches a flow that is installed directly in the NIC card by OVS through TC flower. This is the fast path, which greatly improves the performance of OVS and also makes sure we don't have a high CPU load on the host. OK, that was just an introduction to the hardware offload.

Now I'm going to talk about some Kubernetes internals, the most important components. We have the kubelet, which is the node agent, and the SR-IOV operator, which is a sort of software extension to Kubernetes. It makes use of custom resources, defined through CRDs, which are used for managing application components in Kubernetes. The important operations the SR-IOV operator performs are mainly to configure the NIC in switchdev mode and then to create the VFs with SR-IOV; some vDPA devices will also be created on top of each VF, so there will be a one-to-one relationship between the vDPA device and the VF. The operator will also configure the required drivers in kernel space, like the vDPA drivers and vendor-specific drivers such as the Mellanox mlx5 driver. Finally, on the right-hand side of the picture there is a manifest generated by the operator that is used by the device plugin, which we'll see in a moment.

OK, this is the second slide of the workflow. Here we have the device plugin, which is responsible for discovering the vDPA devices and advertising them to Kubernetes. You can define some resource pools: for instance, you can say that VFs 0 to 3 are pool 1 and VFs 4 to 7 are pool 2, so you can arrange resources into resource pools. Then, when you create your first pod, the pod has to reference this resource pool so that the device plugin can allocate the vDPA device at pod creation time.

OK, the final picture here introduces CNI, which stands for Container Network Interface. It is a specification and a set of libraries for writing network plugins, which are then responsible for configuring the network interfaces of the pods. One of them is Multus, which is a sort of meta-plugin that is able to invoke different other CNI plugins underneath. Its main purpose is to attach multiple interfaces to the same pod, because normally Kubernetes doesn't allow you to have more than one network interface apart from the loopback. OVN-Kubernetes is the other CNI here, and it is delegated to by Multus. Coming back to the workflow, we now have this network attachment definition object for defining your network, and there we have a mapping between the resource pool and the designated CNI plugin, in this specific case OVN-Kubernetes.
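Just to make the resource-pool idea concrete, the configuration consumed by the SR-IOV network device plugin (the manifest that the operator generates for you in OpenShift) might look roughly like the following sketch. The pool name vdpa_pool, the ConfigMap name and namespace, and the vendor and driver values are illustrative assumptions, not values from the talk.

# Minimal sketch of a device-plugin resource pool exposing virtio vDPA
# devices backed by Mellanox/NVIDIA VFs (all names and IDs are assumed).
apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: openshift-sriov-network-operator
data:
  config.json: |
    {
      "resourceList": [
        {
          "resourceName": "vdpa_pool",
          "selectors": {
            "vendors": ["15b3"],
            "drivers": ["mlx5_core"],
            "vdpaType": "virtio"
          }
        }
      ]
    }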
OK, so when we create the pod, we have to specify which network we are going to use, defined by the network attachment definition. Multus CNI will then delegate to OVN-Kubernetes, which will take the vDPA device and move it inside the pod namespace, and the other thing it will do is take the port representor and add it to the OVS bridge. So by the end of this complex interaction between Kubernetes components, we have a standard vDPA interface created in the pod, which is the eth0 in the picture.

OK, we are going to have a demo, so this is just to introduce the setup. We have two bare metal servers. On one we are running three control plane nodes in virtual machines, so it is actually a hybrid cluster, because the worker node is instead running directly on bare metal. The two machines are connected back to back with an NVIDIA dual-port NIC: the first port is used by the default cluster network and the second port is used for the vDPA demo. Here there is a link to the demo.

OK, it's playing. I'm creating two pods on the same machine, on the same worker node, and demonstrating a ping between the two pods using vDPA. Of course we could have tried it out using multiple workers, but just for simplicity I had this setup. First of all, we check the state of the cluster and we create a machine config pool, and then we make sure the worker node joins this pool. This is done by using labels: we add the MCP label to the worker node, together with another label, the SR-IOV capable node label, that is used by the operator in order to select the proper worker nodes. Then we create the SR-IOV network pool configuration, and this will actually reboot the node and configure the OVS hardware offloading on the worker node.
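For reference, the machine config pool and the pool configuration used to enable the offload might look roughly like the following sketch. The pool name mcp-offloading and the node label are illustrative assumptions, not taken from the recording.

# Sketch of a machine config pool that the labelled worker node joins.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: mcp-offloading
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, mcp-offloading]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/mcp-offloading: ""   # label added to the worker node
---
# Sketch of the SR-IOV network pool configuration that turns on OVS
# hardware offloading for the nodes in that machine config pool.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkPoolConfig
metadata:
  name: sriovnetworkpoolconfig-offload
  namespace: openshift-sriov-network-operator
spec:
  ovsHardwareOffloadConfig:
    name: mcp-offloading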
Something I probably haven't mentioned: this is the SR-IOV policy. The cluster administrator instructs the SR-IOV operator with this policy. We give the name of the policy; the node selector is used for selecting the worker nodes we want to configure; we specify the resource pool and the number of virtual functions we want to create for this purpose, in this case 2; the NIC selector is used for filtering the NIC devices based on parameters like the device ID or the vendor, so there are multiple ways of filtering; and finally we set switchdev mode on the NIC card and we select virtio as the type of vDPA device to be created.

Just a check that we have no VFs before applying the policy: we can see there are no VFs created yet. As soon as we create the policy, the node reboots. We check the state of the cluster again after the reboot, all the nodes are in the ready state, and now we have 2 VFs created, VF0 and VF1, and also 2 vDPA devices created on top of them. The NIC card is in switchdev mode with hardware offload enabled. If we look at the interfaces created on the worker node, we have basically 2 port representors, V0 and V1, which are connected to the OVS bridge, and their driver is the Mellanox driver. Then we have 2 vDPA interfaces, V2 and V3, and as you can see the driver is virtio_net, so a standard driver.

We created the network attachment definition; as you can see, there is the binding between the resource pool and the OVN-Kubernetes CNI, and of course the name of the network. Now it's time to create the 2 pods. We create them in the vdpa namespace, so it is enough to put the network we want to use in the pod manifest and the pods automatically get the vDPA interface created. Here there are a few checks on the IP address, and as you can see the driver is the standard vDPA driver. Now we are ready to test the connectivity between the 2 pods: we ping from pod 2 to pod 1 and it works as expected. So yeah, this is the end of the demo; we have successfully demonstrated how to ping between 2 containers using vDPA, and we can go back to the slides.
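For reference, the three objects used in a demo like this might look roughly as follows. This is only a sketch: the resource-pool name vdpa_pool, the namespaces, the interface name ens2f1, the vendor ID, the image and the CNI config values are illustrative assumptions, not values from the recording, and the exact fields should be checked against the SR-IOV network operator documentation.

# Sketch of an SR-IOV node policy requesting two VFs in switchdev mode
# with virtio vDPA devices created on top of them.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-vdpa
  namespace: openshift-sriov-network-operator
spec:
  resourceName: vdpa_pool
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 2
  nicSelector:
    vendor: "15b3"          # Mellanox/NVIDIA PCI vendor ID (assumed)
    pfNames: ["ens2f1"]     # second port of the dual-port NIC (assumed name)
  eSwitchMode: switchdev    # required for OVS hardware offload
  vdpaType: virtio          # create virtio vDPA devices on the VFs
---
# Sketch of a network attachment definition mapping the resource pool
# to the OVN-Kubernetes CNI for the pod's primary interface.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: ovn-vdpa
  namespace: vdpa
  annotations:
    k8s.v1.cni.cncf.io/resourceName: openshift.io/vdpa_pool
spec:
  config: |
    {
      "cniVersion": "0.4.0",
      "name": "ovn-vdpa",
      "type": "ovn-k8s-cni-overlay"
    }
---
# Sketch of a pod that references the network and requests one vDPA
# device from the pool; the device plugin allocates it at pod creation.
apiVersion: v1
kind: Pod
metadata:
  name: vdpa-pod-1
  namespace: vdpa
  annotations:
    v1.multus-cni.io/default-network: vdpa/ovn-vdpa
spec:
  containers:
  - name: test
    image: registry.access.redhat.com/ubi8/ubi   # illustrative image
    command: ["sleep", "infinity"]
    resources:
      requests:
        openshift.io/vdpa_pool: "1"
      limits:
        openshift.io/vdpa_pool: "1"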
OK, a quick reference to the current status. We have implemented this on the primary interface of the pod using OVN-Kubernetes. If you are interested in the source code, the community has a bunch of repositories: the network operator, the device plugin, OVN-Kubernetes and govdpa, which is a Go library. The next steps in the development are to support a secondary interface on the pod. This is for enabling other use cases, like DPDK applications for user-space packet processing, container-native virtualization for running VM workloads alongside container workloads, and also providing accelerated standard interfaces for confidential computing. OK, this is the end of my presentation. I think we have time to take some questions, if you have any.

OK, yeah, the question is: the vendors are implementing the data plane in virtio, but they are not doing the same for the control plane, so you are asking what the reason is. So, we have some vendors that have now implemented the vDPA data plane, like NVIDIA, I think Intel, Pensando, so there are a bunch of vendors. But it seems that the full virtio offloading, so basically implementing the control plane with virtio as well, is more difficult for the vendors. So vDPA is sort of helping the vendors out by giving them more time; it is, let's say, covering their needs and simplifying their life for this transition. Any other questions?

Yes. OK, so the question was: with the network attachment definition we are using the default network, and you are asking whether we are overriding it. Yeah, OK, so you might expect to have two interfaces on the pod because we are using a network attachment definition, but as I said, this is the first step in the implementation, so we are actually using the default interface, the primary interface. That is of course suboptimal, so the next step, as I mentioned, would be to create a secondary interface instead and then use vhost-vDPA, which can also be beneficial for other use cases like KubeVirt, I think, and other things.

OK, the question is whether we are already looking at OVN-Kubernetes, since support for the secondary interface has already landed in OpenShift. The answer is yes: we are now taking the first steps, investigating which bits are missing for that solution, and yeah, I think that is the plan, yes.

Without Multus? So the question is: we are configuring the primary interface, but we are making use of Multus, so why? Well, I think we could probably have avoided using Multus, but it was convenient, let's say, for this implementation, because Multus is brought in by default by OpenShift, so it is a convenient way of doing this. Of course it will be more useful in the future with the secondary interface; in that case we can't avoid having Multus for this purpose. Since we have reached the end, thank you.