All right, good afternoon. We made it through the week. Today we're going to talk about Kubernetes networking, and specifically infrastructure offload for Kubernetes networking. We've assembled a panel of folks from different areas to talk about how we even define what it means to do an infrastructure offload, and how we do that offload in lots of different contexts. Are we doing an infrastructure offload in the public cloud? Are we doing it inside of a cluster that I built on premise? Or are there combinations of those two where we want to be doing this offload? So we're going to be talking about infrastructure offload from lots of different perspectives. Two of our presenters weren't able to make it, so midway through we're going to show a short video of their presentation. But we will introduce each other as we go.

I'm Dan Daly from Intel. I work on software architecture, and I work on a new project called IPDK, which is now also part of the Linux Foundation and is part of a larger umbrella effort called Open Programmable Infrastructure. Open Programmable Infrastructure uses that word "infrastructure," which is a really generic word. What do we mean by it in this context? When we talk about infrastructure here, we mean that we take the business logic, the worker logic that's running, and we provide certain services, another overloaded word, certain functions, to that worker node. In the case of infrastructure, we're thinking of it as a baseline, as infrastructure as a service. We provide a little bit of networking, a little bit of storage, maybe a little bit of acceleration. We're providing the basic abstractions that you would get from a cloud, and we group all of that together and call it infrastructure. So when we talk about software-defined infrastructure, that's what we mean: software-defined networking, software-defined storage, all grouped together as one.

In this space of programmable infrastructure, we're trying to define the dividing line between the business logic, the function running in your pods that is doing the real work at hand, and the logic underneath it that supports it: securing it, connecting it, these types of things. Why are we creating this new separation? Because we think this abstraction, which has already been really useful in infrastructure as a service, gives us a bunch of great features that we can then apply to Kubernetes networking. The first is the idea of having a gap, for security reasons, between the logic that's running and all of the other logic that's supporting it. We call this the air gap, and when it's implemented using hardware functions, we can give hardware isolation to the worker logic running inside a server. The second thing we're looking for is to be able to take tasks that currently run alongside your worker logic, using cycles, creating latency or jitter, just getting in the way, and push them into the infrastructure. Initially that's to save cycles, but then you get to the third and fourth values: once you've pushed that task off the worker CPU and into the infrastructure, you can re-implement it more efficiently. Maybe you implement it in really optimized software.
Maybe you implement it in hardware, which is something we're going to talk about today. And the fourth value is that once you can re-implement the infrastructure on a separate cadence from the software being loaded on the worker side, you can add features to the infrastructure a lot more quickly. Maybe you want to change the way the networking policy works, or add a new construct around how your storage is accessed, or add computational storage and combine the two. That feature velocity is enabled by the abstraction, because you've pushed all that extra stuff that isn't generating revenue down to the infrastructure, which can now innovate on a separate cadence. So that's why we've created this abstraction. And now I'm going to pass it to Nabil, who's going to talk a little bit about the types of things that he's looking for as a customer looking to do infrastructure offload.

Thank you, Dan. So we'll talk about what IPU/DPU offload is, and some goals and requirements that we're looking to achieve with it. First, let's talk about what we mean by offload from a networking point of view. Traditionally, this has mainly meant the offload of data plane processing functions from the host CPU to the hardware NIC. These NICs have usually been called smart NICs or performance NICs, and they went by that name. The infrastructure processing unit and the data processing unit, the IPU and the DPU, are an evolution from the smart NICs and performance NICs: they include a real compute complex, composed of CPU and memory, in addition to an ASIC or FPGA, usually for the fast packet processing, depending on the implementation. The goal is usually to improve scale, so you have dedicated memory that allows you a larger FIB, a larger set of policies and so forth; to get better performance; and to support additional capabilities. Because you now have the compute complex, you can run control plane functions on the IPU/DPU's CPU and memory complex. You can support additional capabilities like storage networking, which some of the IPUs and DPUs support, and you can move load balancing functionality and so forth onto the IPU/DPU. As examples, if you're aware of what's going on in the market, there are multiple such NICs available today: the Intel IPU, which was announced, I believe, not long ago; the NVIDIA DPU; and the AMD Pensando. These are only examples, not an endorsement.

So what are the objectives in doing this? Obviously, one objective is to improve performance, which is usually measured by the latency and throughput that can be achieved on the host. Some applications also require very low jitter, and you can try to achieve that as well. Another is improved security, which, as Dan touched on, is provided via the isolation of the infrastructure from the host. If the host gets compromised, for example, you're protected, because you're running on the infrastructure processing unit, the IPU/DPU. And another is to improve overall efficiency from a power point of view and to increase compute density in the data center. The way you achieve that is by freeing up what would otherwise be spent in CPU cycles doing packet processing, such as forwarding, policy enforcement, load balancing and so forth.
You save those CPU cycles by offloading that functionality to the IPU/DPU, and you free them up for application processing, which is what the host CPUs are really built for. And you gain efficiency because you're leveraging the IPU/DPU for networking, where a purpose-built ASIC, or a fast data path in general, implements the packet processing in the most efficient way.

So what are the goals and requirements? Obviously, we've been talking about offloading the data plane processing functions to the IPU/DPU. Examples of that would be having the state of the data plane, such as the FIB, policies and so forth, maintained on the IPU/DPU. You could have the FIB as state, and the routing or forwarding functionality as the action taken in the pipeline. The same could be said of policies: you could have the policy state, meaning firewall rules, maintained as state, and the policy enforcement done on the IPU/DPU as an action, and so forth (a small sketch of this state-versus-action split follows below). I don't want to go through the whole list, but just to give you a flavor, there are many things that could potentially be implemented on the IPU/DPU. Another goal is to have an industry-standard abstraction layer for the hardware, so that we allow ease of integration between different solutions, whether they come from open source projects or from proprietary vendor solutions, and the third-party hardware vendors providing IPU/DPU hardware. There is an open source initiative targeting exactly that. On the control plane side, there is a goal to optimize the control plane, for both compute and networking, to fit the IPU/DPU paradigm better. That has not really been considered before, and I'll talk about a potential way to get there at a high level, from an architecture point of view: looking at what is needed for the control path between the compute and network controllers and the host, how to offload or distribute the control plane functions between the host CPU on the server and the IPU/DPU in the same server, and how to coordinate the control plane between the two. The target here, even though we're at KubeCon, a Kubernetes conference, is to support the IPU/DPU for multiple kinds of compute endpoints: bare metal, because many applications run on bare metal; virtual machines, because many applications run in VMs; and pod or container endpoints, running on worker nodes that could themselves be either bare metal or VMs. That's the scope of this.

Now I want to go through this diagram. It could take a lot of time on its own, but it illustrates what we mean by data plane function distribution as well as control plane function distribution between the host and the IPU/DPU. Imagine a server, a host in general, that has two main blocks: the host CPU and its memory complex, which is the upper part in blue, and the IPU/DPU as another NIC card plugged into the same server.
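To make that state-versus-action split concrete, here is a minimal sketch in Go. The types and table layout are hypothetical illustrations, not any vendor's API: the FIB and policy tables are the state Nabil describes living on the IPU/DPU, and the lookup and permit functions stand in for the per-packet actions the hardware pipeline would perform.

```go
package main

import (
	"fmt"
	"net/netip"
)

// FIBEntry and PolicyRule are hypothetical types: the tables are the
// state held on the IPU/DPU, the lookups are the pipeline's actions.
type FIBEntry struct {
	Prefix  netip.Prefix // destination prefix
	NextHop netip.Addr   // where matching packets are forwarded
}

type PolicyRule struct {
	Src, Dst netip.Prefix
	Allow    bool
}

type OffloadTables struct {
	FIB      []FIBEntry
	Policies []PolicyRule
}

// Lookup is the forwarding "action": longest-prefix match over the FIB state.
func (t *OffloadTables) Lookup(dst netip.Addr) (netip.Addr, bool) {
	best := -1
	var hop netip.Addr
	for _, e := range t.FIB {
		if e.Prefix.Contains(dst) && e.Prefix.Bits() > best {
			best, hop = e.Prefix.Bits(), e.NextHop
		}
	}
	return hop, best >= 0
}

// Permit is the policy-enforcement "action" over the rule state.
func (t *OffloadTables) Permit(src, dst netip.Addr) bool {
	for _, r := range t.Policies {
		if r.Src.Contains(src) && r.Dst.Contains(dst) {
			return r.Allow
		}
	}
	return false // default deny
}

func main() {
	t := OffloadTables{
		FIB: []FIBEntry{
			{Prefix: netip.MustParsePrefix("10.0.0.0/8"), NextHop: netip.MustParseAddr("192.168.0.1")},
			{Prefix: netip.MustParsePrefix("10.1.0.0/16"), NextHop: netip.MustParseAddr("192.168.0.2")},
		},
		Policies: []PolicyRule{
			{Src: netip.MustParsePrefix("10.0.0.0/8"), Dst: netip.MustParsePrefix("10.1.0.0/16"), Allow: true},
		},
	}
	dst := netip.MustParseAddr("10.1.2.3")
	hop, ok := t.Lookup(dst)
	fmt.Println(hop, ok, t.Permit(netip.MustParseAddr("10.0.0.5"), dst))
}
```

The point of the split is that a controller only ever updates the tables; the per-packet actions stay fixed in the pipeline, which is what makes them amenable to an ASIC or FPGA implementation.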
Now, that IPU/DPU will usually have its own CPU and memory complex, as we talked about, and will have hardware acceleration for the data plane functions, implemented via an ASIC or FPGA. That's the setting. When we talk about the distribution of functionality: today, if you have an SDN controller or a Kubernetes controller, it interacts with equivalent control functions, if you will, that run on the host CPU. Offloading that to the IPU/DPU means the control path can go from the controller, say the network controller, to an agent sitting on the IPU/DPU. That agent interacts with the centralized control plane, and via the offload function it learns, for example, policies and routes, whether they come through the controller or from a local routing engine that could be running on the IPU/DPU, and it programs the data path, the ASIC or FPGA, through various adapters or drivers. There are many functions that could be implemented via that offload, whether programming state, programming the pipeline itself, or pulling data back from the fast path, such as stats, flow logs and so forth.

The other function here is coordination with the host CPU: how does the offload engine running on the IPU/DPU splice the data path, if you will, into the containers or pods that are running on the host CPU? There is interaction that has to go on between the offload control block shown in this diagram and the onload control block running on the host CPU, to create the necessary interfaces, whether SR-IOV interfaces or the programmability of the virtual ethernet port related to the pod, and so forth (a tiny sketch of this handshake follows at the end of my part). That creates the data path that goes from the pod on the host CPU, through the IPU/DPU, and down to the ASIC, which interfaces with the physical NICs, or that routes traffic between two pods on the same host. Everything goes via a trusted path tied to the infrastructure, represented by the IPU/DPU, rather than being exercised on the host: maintaining state such as policies, and enforcing that state, is done on the IPU/DPU, not on the host CPU. I'm not going to delve into the control path for compute, but you can think of that as also going through a trusted interface: it could come from the centralized Kubernetes controller, what we call the Kubernetes master, go through the IPU/DPU and through the agents, I'll call them the offload and onload agents for now, and trigger the creation of the pods via the runtime engine on the host CPU. So the idea, again, is to create isolation between the host, where your workload executes, and the infrastructure, where policy, routing and so forth are enforced, and to coordinate between the two. That's an example of what we've done at a high level, and that's what we're looking to start examining, along with the work that needs to be done in that space. Thank you.
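Here is a minimal sketch of that offload/onload handshake. Everything in it is hypothetical: the message types, the interface names, and the use of a Go channel where a real system would use gRPC between the host agent and the IPU agent.

```go
package main

import "fmt"

// Hypothetical messages between the "onload" control block on the host
// CPU and the "offload" control block on the IPU/DPU. A Go channel
// stands in for what would really be a gRPC connection.
type podRequest struct {
	PodName string
	PodIP   string
}

type podAttachment struct {
	HostInterface string // e.g. an SR-IOV VF handed into the pod
	Representor   string // peer interface plugged into the IPU pipeline
}

// offloadAgent runs on the IPU/DPU: it allocates the data-path plumbing
// and programs policy for the pod before answering the host.
func offloadAgent(req <-chan podRequest, resp chan<- podAttachment) {
	n := 0
	for r := range req {
		att := podAttachment{
			HostInterface: fmt.Sprintf("vf%d", n),
			Representor:   fmt.Sprintf("pf0vf%d", n),
		}
		n++
		// Policy and routes keyed to the new representor live on the
		// IPU, never on the host.
		fmt.Printf("IPU: program policy+routes for %s (%s) on %s\n",
			r.PodName, r.PodIP, att.Representor)
		resp <- att
	}
}

func main() {
	req := make(chan podRequest)
	resp := make(chan podAttachment)
	go offloadAgent(req, resp)

	// Onload agent on the host CPU: a pod is being created, so ask the
	// IPU for its interface, then hand that interface into the pod.
	req <- podRequest{PodName: "web-0", PodIP: "10.4.1.7"}
	att := <-resp
	fmt.Println("host: moving", att.HostInterface, "into pod netns")
}
```

The design point is that the host side only requests and receives interfaces; it never holds or enforces the policy state itself, which preserves the trust boundary Nabil describes.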
Thank you, Nabil, for scoping the requirements. My name is Valas, I work for Google, and I'm going to talk briefly about how we meet these requirements, or how far away we are from meeting them. If you look today at public clouds, and I'm mostly talking about Google, public clouds today already offload VPC networking to IPUs/DPUs, so we already have offloads, right? If you try to enumerate the features that we offload today, it's usually routing, policy routing or what some people call service chaining, internal and external load balancing, security policies, et cetera. If you then look at the set of features that Kubernetes networking implements, we sometimes call them differently, but they are very similar: pod reachability, ClusterIP, node external IP, network policies, observability. There is a lot of overlap between them. So can we offload these Kubernetes networking features to the same infrastructure to which we already offload the VPC networking? Well, at least at Google, we already offload some of the features. It's not a very big set, but some important features. Two examples I brought here are pod reachability and intranode observability, or visibility. For pod reachability, we rely on VPC network offloads to be able to scale, to connect a large number of pods: we can run up to 15,000 nodes at 200 pods per node, and they're all going to be able to reach each other, even during VM migrations, et cetera. That's all offloaded into the VPC. For intranode visibility in GKE, we have a feature where we can send traffic, even traffic between pods on the same node, through the hypervisor. It's not a Kubernetes feature per se, but in terms of observability, you effectively get the observability features you get in VPC: you get visibility into what happens between your pods even when they're running on the same node.

So that was the current state, but GKE still relies heavily on an onloaded implementation for most of Kubernetes networking. We run kube-proxy or Dataplane V2, and it all runs inside the node, using either iptables or eBPF (a tiny sketch of the kind of service translation this performs follows at the end of my part). Why do we do that? Well: time to market; familiarity; and resource accounting is an interesting topic, where essentially the users who are heavy network policy users are the ones spending their own cycles to implement the network policies. But there are very strong tailwinds to offload more. The reasons are numerous, as we discussed in the previous slides, but a couple of things I'd like to highlight: the maturing offloads for VPC networking give us time to pay attention to Kubernetes offloads. The one that for me personally is the most important is feature velocity: if we can transparently evolve the implementation of Kubernetes networking features independently from the guest OS and guest stack, that allows us to roll out much faster. Then two somewhat related reasons: a single data path implementation no matter which OS you run, whether Linux, Windows, or BSD containers; and an opportunity to offer Kubernetes features to stacks that currently bypass the Linux, Windows, or BSD kernels, such as DPDK, and also to lightweight sandboxes such as gVisor. And offloading, as we alluded to, brings significant efficiency gains, and that's Moshe's part of the slides, if you know how to do it. So with that, I'm going to switch to Moshe Levy from NVIDIA, who is going to present on infrastructure offload using a real cluster of NVIDIA DPUs.
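For readers unfamiliar with what that onloaded implementation actually does per packet, here is a minimal, hypothetical sketch of the ClusterIP translation that kube-proxy or Dataplane V2's eBPF programs perform in the node today, and that an IPU/DPU pipeline could perform instead. The service table, addresses, and hashing scheme are illustrative assumptions, not GKE's actual implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// A backend pod behind a Kubernetes Service.
type endpoint struct {
	ip   string
	port int
}

// Service VIP:port -> backend endpoints, the state kube-proxy derives
// from the Kubernetes API and renders into iptables or eBPF maps.
var services = map[string][]endpoint{
	"10.96.0.10:80": {{"10.4.1.7", 8080}, {"10.4.2.3", 8080}},
}

// translate picks a backend for a flow; hashing the flow tuple keeps
// all packets of one connection on the same backend, the job conntrack
// does in the kernel implementation.
func translate(vip, flowTuple string) (endpoint, bool) {
	backends, ok := services[vip]
	if !ok || len(backends) == 0 {
		return endpoint{}, false
	}
	h := fnv.New32a()
	h.Write([]byte(flowTuple))
	return backends[int(h.Sum32())%len(backends)], true
}

func main() {
	ep, ok := translate("10.96.0.10:80", "10.4.0.9:51512->10.96.0.10:80/tcp")
	fmt.Println(ep, ok) // the packet would be DNATed to this backend
}
```

Nothing in this logic requires the worker node's kernel; it is a table lookup plus a rewrite, which is exactly the shape of work that offload hardware handles well.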
And the sound is not working. Oh, I can hear it, it's just really low.

Hello everyone, my name is Moshe Levy and I'm a tech lead at NVIDIA. In this part of the panel, we will review the NVIDIA DPU solution in Kubernetes. This solution is based on open source projects, and it's already used in production in NVIDIA internal projects. In our solution, we use the OVN-Kubernetes CNI for the Kubernetes networking. OVN-Kubernetes uses OVS and OVN. OVS is an SDN virtual switch, and OVN is a project that provides an abstraction of logical switches and logical routers to create a network pipeline for the Kubernetes cluster. Let's review the OVN-Kubernetes components. We have the OVN-Kubernetes master running on the master node, which watches the Kubernetes resources: pods, services, network policies. On the right, we have the worker node running the OVN-Kubernetes node component, which does the netdev plumbing to the pod and to OVS, and we have the OVN controller, which translates the OVN logical topology into the OVS OpenFlow pipeline.

Taking a closer look at a worker node with a regular NIC, we can see that it uses a veth pair for the networking, and the OVS and OVN components run on the worker node. In the case of a high packet rate, we will see high CPU utilization, and we are limited to kernel performance. When we add a DPU to the cluster, we move to SR-IOV switchdev networking. SR-IOV switchdev allows us to create a netdev on the worker node to give to the pod, and a peer netdev, which we call the VF representor, to plug into OVS. Also, OVS and OVN no longer run on the worker node; they move to the DPU, so all the OVN and OVS control plane is now at the DPU level. And in the case of a high packet rate, we will now see low CPU utilization on the worker, because all the packet processing is in the DPU. From a security and isolation perspective, if the worker node is compromised, it cannot tamper with the networking, because all the networking control plane has moved to the DPU, and the DPU and the master are in the trusted domain.

And now a little bit on how we do DPU hardware acceleration. We are leveraging SR-IOV switchdev technology. Open source OVS already supports SR-IOV switchdev, and it knows how to program the eSwitch when the first packet arrives at OVS (a simplified sketch of that first-packet pattern follows below). OVS itself uses a standard kernel API, tc flower, to program the eSwitch; there is nothing proprietary here. In case the hardware doesn't know how to accelerate a packet, we fall back to the kernel data path. This also allows us to reduce the CPU utilization on the ARM cores, and because we're using SR-IOV and switchdev, we get low latency and line-rate performance.

Now let's look at performance data from an experiment we ran in the NVIDIA lab. Here we are using an XCR as a packet generator, and you have the host details with the Intel CPUs. On the left side, we have OVS running on a ConnectX-6 Lx dual port 25 gig using veth; on the right side, we have OVS accelerated with a BlueField-2 dual port 25 gig running SR-IOV. The workload is testpmd pinned to four CPUs; because we are doing port-to-port in this case, the data path is Geneve, and connection tracking is generating the 500K connections. We can see that on the left side we are using 32 CPUs for OVS and OVN, and on the right side we are using none of the host CPUs, because it has all moved to the DPU. For two-port throughput, on the left side we get eight gig when using veth, and on the right side we get line-rate performance, 50 gig, with SR-IOV.
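Here is a simplified sketch of the first-packet offload pattern Moshe describes. The types and the string actions are illustrative stand-ins, not real OVS or kernel APIs: the first packet of a flow is decided in software, the decision is programmed into the eSwitch (via tc flower in the real system), and later packets of the flow match in hardware without touching a CPU.

```go
package main

import "fmt"

// flowKey and eSwitch are hypothetical models of a hardware flow table.
type flowKey struct{ src, dst string }

type eSwitch struct{ hwTable map[flowKey]string }

func (e *eSwitch) match(k flowKey) (string, bool)   { a, ok := e.hwTable[k]; return a, ok }
func (e *eSwitch) program(k flowKey, action string) { e.hwTable[k] = action }

func handlePacket(sw *eSwitch, k flowKey) string {
	if action, ok := sw.match(k); ok {
		return "hw fast path: " + action // no host or ARM CPU involved
	}
	// Slow path: the software switch decides, then offloads the
	// decision. If the NIC could not accelerate this flow, the real
	// system would instead keep it in the kernel data path.
	action := "forward to port 2"
	sw.program(k, action)
	return "sw slow path: " + action
}

func main() {
	sw := &eSwitch{hwTable: map[flowKey]string{}}
	k := flowKey{"10.0.0.1", "10.0.0.2"}
	fmt.Println(handlePacket(sw, k)) // first packet: slow path, rule installed
	fmt.Println(handlePacket(sw, k)) // same flow: hardware hit
}
```

This is why the benchmark shows the host CPU count dropping to zero: after the first packet of each of the 500K connections, the remaining packets never leave the hardware.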
From a latency perspective, we also get four times better latency when using SR-IOV. Now to summarize: with the DPU we get uncompromised performance. In the case of the NVIDIA DPU, we can offload and accelerate all the Kubernetes flows, pod to pod and pod to service, while reducing CPU utilization to the bare minimum. From a security aspect, we have networking isolation even if the host is tampered with. The solution is good for pods, for VMs, and for bare metal, and the DPU is a commodity platform running plain Linux, so we can run additional services on it: storage, security and others. Thank you very much.

Yeah, it's not working. So next I will introduce Nupur Jain, who is going to talk about open source software projects doing Kubernetes offload.

Hi, my name is Nupur Jain and I'm part of the Intel IPDK team. Previous presenters have already emphasized the importance of infrastructure offload and other implementation solutions, so let me go over how we are doing it. To move from the traditional deployment model, as shown on the left, to the infrastructure offload model for various cloud deployment scenarios, we need dedicated interfaces to the pods. With this, we can offload L2 switching and L3 routing, service load balancing, which includes connection tracking and NAT functionality, and encapsulation and encryption, to our extensible data plane. We added a new P4-based lean data plane to the Calico solution. P4 programmability makes the data plane extensible to future cloud use cases and provides better visibility into flow treatment through counters and stats. Calico supports a broad range of platforms, including Kubernetes, OpenStack and others, for deployment of pods and VMs, and it supports multiple data planes, using eBPF and iptables. Our data plane offers two ways forward for accelerated functionality, for better performance and reduced latencies: a software-based data plane implemented using DPDK, and a hardware-based solution using the IPU. Since the control plane components are similar in both of these models, and offer loose coupling for communication using well-defined interfaces, the same control plane works for these as well as the other Calico data planes. This helps with interoperability across data center nodes with a mix of accelerated and non-accelerated nodes.

Here's a picture that goes a bit deeper into the architecture. The data plane driver splits into two components, an agent and a manager, which communicate using gRPC. The agent can be further split into a node agent and an infra agent. While the manager, running on the infrastructure, manages the resources and the lifecycle of the components and offloads the runtime rules, the node agent receives the CNI calls and adds interfaces to the pods on the host. The infra agent uses the standard Kubernetes REST APIs to watch resources like pods, services and namespaces, and handles the events associated with them (a minimal sketch of such a watch loop follows below). It also handles the networking policies for pod traffic isolation and enhanced platform security. The biggest advantage of infrastructure offload is that it offers a secure environment for configuration of resources, away from the compute where pods are running.
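To illustrate the watch-and-program loop such an infra agent might run, here is a minimal sketch assuming the standard Kubernetes client-go library. The `programDataplane` stub is hypothetical, standing in for the gRPC call that would push a rule to the P4/DPDK pipeline, and a production agent would use shared informers with re-list and back-off rather than a bare watch.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// programDataplane is a hypothetical stub for the gRPC call that would
// push a rule down to the offload pipeline.
func programDataplane(verb, podIP string) {
	fmt.Printf("pipeline rule: %s pod %s\n", verb, podIP)
}

func main() {
	cfg, err := rest.InClusterConfig() // the agent runs inside the cluster
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Watch pod events cluster-wide; the real agent also watches
	// services and namespaces, but only pods are shown here.
	w, err := client.CoreV1().Pods(metav1.NamespaceAll).Watch(
		context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for ev := range w.ResultChan() {
		pod, ok := ev.Object.(*corev1.Pod)
		if !ok || pod.Status.PodIP == "" {
			continue
		}
		switch ev.Type {
		case watch.Added, watch.Modified:
			programDataplane("add", pod.Status.PodIP)
		case watch.Deleted:
			programDataplane("del", pod.Status.PodIP)
		}
	}
}
```

The key property is that this loop can run on the infrastructure side, so the translation from Kubernetes state to data-plane rules never depends on, and is never visible to, the worker's compute environment.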
The offload also provides secure access to other infrastructure provisioning pieces, like storage. Scalability, performance, and reduced latencies, while freeing up the host cores, are the highlights, and because of the kernel bypass, it provides for better feature velocity as well. Here's a running example deployment with our components. As you can see, we have the infra manager and infra agent running as DaemonSets, and together they are provisioning the DPDK pipeline with rules. Just to show a P4 example of this implementation, we have service load balancing implemented using P4 here. This implementation uses connection tracking and NAT functionality. The very first packet is looked up and used to pick one of the endpoints from the service endpoint pool. Once the endpoint has been picked, the flow is added to the connection tracker, so the subsequent packets go to the same endpoint, with NAT applied. To learn more about this implementation and how the other examples have been implemented, please go to ipdk.io and to our GitHub offload repository, where we are working on the recipe, and you'll see more additions to the existing implementation. Thanks so much for your time today.

Thank you, Nupur. And to go back to our presentation here: you can learn more about what she just described, and there's a demo of the code, and you can also download the code and run it yourself, at ipdk.io. It's also part of the OPI project that I mentioned earlier, which also has a bunch of code being developed to provide consistent interfaces for these different types of DPUs, IPUs and other acceleration devices.

So in summary, we have heard from folks working in the public cloud, we've heard from Moshe working on a DPU, and we've heard from Nupur working on an IPU. There's a lot of commonality: the worker CPU, and the job it is doing, should not be changed or modified in any way. We're pushing the infrastructure underneath an abstraction so that the end user doesn't need to worry about it. And there are some things that are required in all of these different solutions. One thing that Nupur mentioned, and we played that part twice, is the need to have a connection from each pod into the infrastructure. This is what allows you to apply policy: once you've made that separation, the infrastructure needs a direct connection in order to know where each pod is. In the case of a DPU or an IPU, that can be a hardware-secured connection that isolates all of the different pods from each other. The second common piece that's required is the ability to program that infrastructure. In some cases, as in Moshe's example, we program it through a new interface, through OVN. In other cases, as in Nupur's example, we can use existing Kubernetes APIs and copy that state from the worker CPU into the infrastructure, to minimize the set of changes needed to take advantage of the offload. So there are different options and different implementations out there. We want commonality across the different ways that you can deploy this technology, and we absolutely want to allow people to choose different vendors and different implementations depending on what they're looking for. And with that, I think we're ready for questions, all the way across.

If I understood well... actually, I have two or three questions.
The network stack would be in the FPGA? You would be programming the hardware? Yeah, so in a couple of the examples that we showed today, the data plane for the networking is run in hardware. It can also be run in software. And then there's essentially a choice as to how much of the control plane you want to move off of the worker node. In Moshe's example, the entire control plane was moved off the worker node and onto the DPU.

And the follow-on question to that is: what is the flexibility, then, to modify the network stack, or to fix an error if there is a bug there? And also for functionality like eBPF? So I would say that these are very programmable devices, and they've been designed to be able to implement the state of the art in Kubernetes networking. And it's really similar to eBPF, in that we're starting with something like iptables. eBPF is an example of using optimized software, but it keeps that software on the worker. You could take this technology and run that functionality in the infrastructure instead; running eBPF there would be a useful implementation of infrastructure offload as well. Do you want to take that one? One, two, three. So just to add: we already offload VPC networking today, so it already works, we do roll it out often, and we're able to fix it. So it is doable, yes.

One last question, we are running out of time. I don't know who was first. Who is next? We only have time for one more question; the rest of you can come up and talk to the speakers. So what's the best use case? When should we use SR-IOV, when should we use a DPU, and when DPDK? How do you compare the three technologies? So, at least speaking for the Intel IPU, we are agnostic as to the type of interface you may want to use, whether it's SR-IOV, virtio, or veth; we're flexible. We support all of those different types of interfaces, and we support them at the same time if you have a primary and a secondary network. So it's really a function of what you have existing and what you want your network to look like from the worker CPU side. I would say these are different optimizations that you have to make. For example, if you're running on the host CPU, and you don't have the bandwidth that needs hardware acceleration, and you don't have the security requirement, then using eBPF, as was said earlier, as a way to implement the data path faster than you would have done with iptables, makes sense. If DPDK is needed, you could still do that. In addition, you could run either of them on the CPU complex that is on the IPU/DPU, because there will be exception packets arriving at the IPU/DPU that you may need to process, what we call in networking the slow path. That slow path could actually be handled by eBPF or DPDK or whatever, on the IPU/DPU itself: not in the ASIC, but on the ARM complex, or whatever the compute complex on that device may be. So these are really complementary technologies that have different roles, and whatever environment you have, you can fit one or the other depending on what you need.

So, thank you again for staying late and getting through our technical difficulties. Thank you again. Have a good day. Thank you.