Hello and welcome to Calico VPP: All You Can Eat Networking. My name is Casey and I'm a software engineer at Tigera, where I work as a core developer on Calico. I'm really excited to be here today with Louise, who has been doing some awesome work in the Calico community, bringing compatibility with VPP networking.

Hi everyone, thanks Casey. My name is Louise and I work as a software engineer for Cisco, more specifically on the VPP team. I'm super excited to talk about the work we've been doing with Calico, but before we dive into that, let's hear from Casey how Calico was designed and how that design made this Calico VPP integration possible.

So I'd like to start by giving a quick overview of Calico: how it works, and some of the lower-level design decisions that we as the Calico team have made to help enable some of this really awesome work that Louise has been doing. To start, Calico is an open source networking and network policy provider. It can provide networking and network policy for Kubernetes pods, but also for Kubernetes nodes, VMs, OpenStack, and legacy workloads. Calico supports the native, built-in Kubernetes network policy APIs as well as a rich set of extension APIs that are native to Calico. And Calico itself is battle tested: it's been deployed in production for years now and is the most common choice for clusters that need to scale and where performance really matters.

For those of you who might not be familiar with Calico, at a really high level it works like this. Users specify high-level descriptions of how they want their network to behave, either via the Kubernetes API, the Calico APIs, or both — things like network policies, BGP configuration, and so on. That configuration is in turn read by the Calico components running on every node in your cluster. Calico then combines it with what it knows about the locally running pods and uses the result to set up each node with the correct network programming.

Drilling down a bit further, here's what's happening on each Calico node. There are really two main components. First, there's the CNI plugin, which is called by the container runtime as part of setting up and tearing down networking for each pod. This plugin gets called on pod add and pod delete, and it's responsible for setting up the network namespace, programming routes, creating virtual ethernet devices — all the things a pod needs in order to be able to communicate with its own local node. (A minimal sketch of this add/delete contract follows at the end of this overview.) Calico node is the second main component: a long-lived container that runs on every node, typically as a DaemonSet, and makes routing and policy decisions. This is really what interprets that configuration and makes sure that the network data plane is programmed correctly. Within Calico node, there are two main subcomponents responsible for this. There's Felix, a component that the Calico team wrote, which is responsible for maintaining network policy state. And there's BIRD, an open source routing stack that is included in Calico and is used when BGP is required to distribute routes through the network. Obviously, each of these components needs to be pretty tightly coupled with the underlying networking technology, because each of them is reading and writing state and interacting with that data plane.
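To ground the CNI plugin flow described above, here is a minimal, stdlib-only Go sketch of the CNI calling convention: the container runtime executes the plugin binary on pod add and delete, passes the operation and pod details through environment variables such as CNI_COMMAND and CNI_NETNS, and sends the network configuration as JSON on stdin. This is not the actual Calico plugin — the types and behavior here are purely illustrative.

```go
// Minimal sketch of the CNI contract: the runtime invokes the plugin on pod
// add/delete with the operation in environment variables and the network
// configuration as JSON on stdin. A real plugin would create the pod
// interface, program routes, and return a full IPAM result.
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// NetConf holds a few of the standard fields every CNI config carries.
type NetConf struct {
	CNIVersion string `json:"cniVersion"`
	Name       string `json:"name"`
	Type       string `json:"type"`
}

func main() {
	conf := NetConf{}
	if raw, err := io.ReadAll(os.Stdin); err == nil {
		_ = json.Unmarshal(raw, &conf) // network config arrives on stdin
	}

	switch os.Getenv("CNI_COMMAND") {
	case "ADD":
		// A real plugin would enter the namespace in CNI_NETNS, create the
		// interface named CNI_IFNAME, and program routes for the pod.
		fmt.Printf(`{"cniVersion": %q, "interfaces": [], "ips": []}`, conf.CNIVersion)
	case "DEL":
		// Tear down whatever ADD created for CNI_CONTAINERID.
	case "VERSION":
		fmt.Printf(`{"cniVersion": %q, "supportedVersions": [%q]}`, conf.CNIVersion, conf.CNIVersion)
	}
}
```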
Now, with a bit of understanding about what Calico is and roughly how it works, I wanted to talk about one of the core principles our engineering team follows when developing Calico. Specifically, that principle is to use the right tool for the job at hand. Technology and implementations are important, but they change over time. As engineers, it's really easy to lose sight of this sometimes — to lose the forest for the trees — and to focus on the technology choices rather than what's actually important, which is leveraging that technology to solve someone's problems. In other words, what we want to do with Calico is to make sure that the design of the software enables using the right technology to solve the diverse problems that our users have.

There are a lot of ways this mindset shows itself in Calico. We've got multiple built-in networking techniques using IP-in-IP, VXLAN, and unencapsulated BGP. We've got compatibility with a wide array of third-party CNI plugins, and so on. But the main design decision I wanted to talk about today is the separation of control plane functions from data plane functions, and specifically how this enables Calico to meet a variety of use cases by leveraging different underlying data plane technologies.

Conceptually, this is a pretty common pattern in networking software: the control plane is complex software that performs routing calculations, and it needs to be kept separate from the high-performance packet processing code. But for Calico, I really see a more important consequence of this pattern than just resource isolation. When we first built Calico, it only supported a single data plane: standard Linux networking, that is, routing using Linux routes and filtering using iptables. Still, at the time we were pretty clear that we wanted to architect this code in such a way that, if we needed to, we could extend it to support additional data planes in the future. This provides future-proofing, gives us the flexibility to choose the best tool for the job, and avoids us getting fixated on a single data plane technology.

To do this, we pulled all of the intelligence into a subcomponent of Felix called the calculation graph, and then we built an internal API within Calico between this calculation graph and a swappable data plane driver component. If you're familiar with it, this is parallel to how the Container Runtime Interface works in Kubernetes: it enables using a bunch of different container runtimes like Docker, CRI-O, and containerd while still providing a consistent feature set and user experience to the end user. The data plane driver in Calico is designed to be as simple as possible. It just translates events emitted by the calculation graph into the right messages for the underlying data plane implementation, and it leaves the hard work that we don't want to duplicate — interpreting configuration, making decisions — up to the calculation graph. (A rough sketch of this split follows below.)

While we started with just a single implementation, we've since been able to take advantage of this decision to extend support to a variety of technologies. This includes an eBPF-based data plane, a Windows HNS data plane, and of course what we're here to talk about today: a data plane built on VPP. And just for completeness, while those slides were really focused on how this works in Felix, the Calico CNI plugin also supports gRPC-based data plane drivers in addition to the default compiled-in implementation. This was a feature added by Louise as part of the VPP integration effort.
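To make the calculation graph / data plane driver split a bit more concrete, here is a rough Go sketch of the shape of such an internal API. The type and method names are illustrative assumptions, not the real Felix interface; the point is only that the driver consumes desired-state updates and knows nothing about how they were computed.

```go
package dataplane

// Update is a desired-state message emitted by the calculation graph,
// e.g. "this route should exist" or "apply this policy to this interface".
// Real Felix uses a set of structured messages; this is a simplification.
type Update interface{}

type RouteUpdate struct {
	CIDR   string // destination prefix, e.g. a pod /32
	IfName string // interface the route should point at
}

type PolicyUpdate struct {
	PolicyID string
	Rules    []string // placeholder for parsed rule objects
}

// Driver is the swappable piece: one implementation programs Linux routes
// and iptables, another programs eBPF, another talks to VPP. The
// calculation graph never needs to know which one is in use.
type Driver interface {
	// OnUpdate receives one desired-state message from the calculation graph.
	OnUpdate(msg Update)
	// Apply flushes any batched changes into the underlying data plane.
	Apply() error
}
```

A VPP driver then implements this same interface by translating each update into VPP API calls, while the Linux driver programs routes and iptables — and the calculation graph stays exactly the same.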
That leads me to really the last and most important ingredient of Calico, which is its collaborative community. A lot of people have made Calico happen, and it's through this community that we've been able to build a relationship with Louise and his team, who's now going to share with you how he's built on this foundation to bring something really cool, in the form of VPP support, to Calico.

Thanks Casey. So now that we know how Calico works and how it supports multiple data planes, let's see how we leveraged this to add VPP as a data plane option for Calico.

But first, a few words about VPP. VPP is an open source software router under the Linux Foundation umbrella. It has many features from layer 2 to layer 4: it supports ACLs, but also transport protocols such as TCP, TLS, and QUIC. It's built to be easily extensible thanks to a plugin architecture, it supports both virtual and physical interfaces, and it has a really fast API. But if there is one thing you should remember about VPP, it's that it is really highly optimized. It uses vector instructions in order to process multiple packets in a single instruction. It prefetches heavily in order to improve data cache efficiency. And the packet processing is split into a graph of small elementary nodes, which ensures that the instruction cache is also used very efficiently. (A toy sketch of this vector-of-packets idea follows at the end of this part.)

You may wonder what user space networking exactly is. It's actually quite simple: it's just a regular process that does packet processing instead of, for instance, HTTP request processing. So it's packets in, packets out. There are many examples of user space networking applications that you've probably used. They include VPN clients like OpenVPN or proprietary VPN clients, DPDK-based applications, and of course VPP.

There are many benefits to user space networking. The most important one to us is performance. When you have a user space network stack, you can tune it to do exactly what you need for your specific use case; you don't have to rely on a general purpose stack that has features you may not need and that would be detrimental to performance. It's also simpler to develop and deploy, because you don't have any dependency on your kernel, so you can make changes without rebooting your machine. And it allows you to manage your network stack just like any other software component. This is possible thanks to specific interface types provided by the Linux kernel that make it possible to retrieve packets in user space, and thanks to drivers that expose physical interfaces in user space as well.

There has been a recent trend in the Linux kernel towards increased modularity. The most obvious example is eBPF, which allows injecting code into the kernel and bypassing parts of its network stack, for instance. But there are also other examples, such as AF_XDP, which makes it possible to implement very fast and generic user space networking functions, and tun and tap interfaces, which — thanks to kernel features such as multi-queue and GSO support — allow packets to be exchanged with the Linux kernel very efficiently. This modularization is very beneficial to user space networking: it allows using the best tool for each job, and it really opens many new options to make things more efficient. And thanks to these recent improvements in the Linux kernel, it's now possible to leverage user space networking stacks to accelerate regular Linux applications.
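To make the vector-processing idea mentioned earlier a little more tangible, here is a toy Go sketch of a batch of packets flowing through a chain of small nodes. This is purely conceptual — VPP's real graph nodes are written in C, operate on buffer indices, and form an arbitrary graph rather than a fixed chain.

```go
package main

import "fmt"

// Packet is a stand-in for a VPP buffer.
type Packet struct {
	Payload []byte
}

// Node is one small processing step in the forwarding graph. Each node
// receives a whole vector of packets at once, so its code and data stay
// hot in the CPU caches while the batch is processed.
type Node func(vector []*Packet) []*Packet

// run pushes a vector of packets through an ordered chain of nodes.
func run(graph []Node, vector []*Packet) []*Packet {
	for _, node := range graph {
		vector = node(vector)
	}
	return vector
}

func main() {
	validate := Node(func(v []*Packet) []*Packet {
		out := v[:0]
		for _, p := range v {
			if len(p.Payload) > 0 { // drop empty frames
				out = append(out, p)
			}
		}
		return out
	})
	count := Node(func(v []*Packet) []*Packet {
		fmt.Printf("forwarding a vector of %d packets\n", len(v))
		return v
	})

	vector := []*Packet{{Payload: []byte{1}}, {Payload: nil}, {Payload: []byte{2}}}
	run([]Node{validate, count}, vector)
}
```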
VPP was initially designed for environments where it had a fixed amount of resources and could consume all of them. Of course, that's not really well suited to container environments. Ideally, you would want VPP's resource consumption to scale up and down with the actual load. So one thing we did to improve VPP's behavior in container environments was to switch from poll mode, where VPP is constantly busy-looping to check whether new packets are available for processing, to interrupt mode, where VPP is notified by the network interfaces when packets are available. This reduces the CPU actually consumed by VPP. Regarding memory, VPP used to require huge pages to run. This makes it more complex to deploy, because you need to change the configuration of your host, so we removed that requirement as well.

We also made some improvements in VPP that allow it to integrate better with the Linux kernel. The most important one is the implementation of GSO and GRO. GSO and GRO allow VPP to exchange 64-kilobyte buffers with the Linux kernel instead of packet-sized buffers. This greatly reduces the load on the Linux TCP stack and gives a very significant speed-up on TCP connections. For instance, when an application needs to send data on a TCP connection, Linux will pass a 64-kilobyte buffer to VPP, and VPP will segment it before sending it on the network. In the other direction, when receiving packets, VPP will try to reassemble them and pass a buffer that is as big as possible to the Linux kernel. (A toy version of this segmentation step follows at the end of this part.)

In addition to these improvements that make VPP play well with the kernel, we also had to make a few improvements that are more specific to Kubernetes. Kubernetes has some specific requirements for the network; in particular, services load balancing requires pretty specific NAT behavior. Calico also supports source NATing outgoing connections, so that your containers can reach external networks even if they have private IPs. So we developed a custom NAT plugin in VPP that is really tailored to this use case. Another point that is specific to Calico is that it offers very rich policies, so again, we implemented a dedicated plugin for these policies in VPP. This allows VPP to implement the data plane API of Felix that Casey mentioned earlier.

Using VPP as your data plane with Calico also brings benefits to your operations. Since VPP is packaged as a regular container, you can update it just like you would update any other application. We made it so that VPP can be upgraded and restarted with really minimal disruption to the pods. If you don't mind traffic being lost for maybe one or two seconds, you can just restart VPP, and all the pods will be able to communicate on the network normally as soon as VPP is back up. Of course, you can also evict your pods from the host before doing that if that's better for your applications. This is helpful for upgrading VPP if, for instance, there are fixes that you need to deploy or new features that you want to try out. Calico VPP also has very limited kernel dependencies: you basically only need tun/tap interfaces, which any kernel that supports containerization provides. This is interesting in particular in environments where you do not control your kernel. For instance, in public clouds you sometimes don't have a choice of which kernel is used, and that means even there you can just deploy the new version of Calico VPP and get the latest and greatest features of Calico.
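Coming back to the GSO point for a moment, the mechanism is easy to picture: the kernel hands over one large buffer, and it is chopped into MSS-sized segments only just before the packets go on the wire. Here is a toy Go version of that segmentation step; real GSO also rewrites the TCP/IP headers of every segment, and GRO does the reverse coalescing on receive.

```go
package main

import "fmt"

// segment splits one large GSO buffer (up to ~64 KB when coming from the
// Linux kernel) into payload chunks of at most mss bytes. Real GSO code
// also builds a fresh TCP/IP header for every segment.
func segment(buf []byte, mss int) [][]byte {
	var segs [][]byte
	for len(buf) > 0 {
		n := mss
		if len(buf) < n {
			n = len(buf)
		}
		segs = append(segs, buf[:n])
		buf = buf[n:]
	}
	return segs
}

func main() {
	payload := make([]byte, 64*1024) // one 64 KB buffer handed over by the kernel
	segs := segment(payload, 1448)   // typical TCP MSS on a 1500-byte MTU link
	fmt.Printf("split into %d wire-sized segments\n", len(segs))
}
```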
So let's look in a bit more detail at how this Calico VPP integration works. On the left-hand side here, we see the regular Calico network topology: basically, every pod is connected to the host by one veth interface, and the host is responsible for all the networking functions. When we deploy Calico VPP, VPP inserts itself between the host and the uplink interface. It takes ownership of this uplink interface, restores connectivity to the host by creating a tun interface in the host, and also creates tun interfaces for all the pods. So now the networking responsibilities are split: the control plane runs on the host — kubelet, for instance, will be running on the host, but also the Calico-specific components such as the BGP daemon and Felix — while VPP itself is responsible for the whole data plane. That means routing the container traffic, of course, but also doing the services load balancing, implementing the policies configured by Felix, and so on. One specificity of the VPP network model is that tun interfaces are layer 3 interfaces, whereas veth interfaces are layer 2 interfaces. What that means is that there is no ARP going on between the pods and VPP, or between the host and VPP; it's all pure layer 3.

Now let's take a closer look at what happens to the application traffic when Calico VPP is running. The applications are completely unmodified: they use the socket APIs to send and receive traffic to and from the kernel. The kernel in the pod network namespace is then configured to send this traffic over the tun interface that is connected to VPP. The drawback of this architecture is that the data from the application goes from user space into kernel space and then back to user space in VPP. This is required in order to keep the applications unmodified, but it's not the most efficient way. On the other hand, an advantage of this architecture is that the kernel still provides the isolation. Another possibility would be to have the applications leverage the VPP transport stack and send their traffic directly to VPP. This would be more efficient, but it would also require modifying the application. It's something that we want to look at, but we will likely support only a restricted set of applications.

Finally, this is what you get when you deploy Calico VPP on a Kubernetes node. In the DaemonSet that runs on every node, you get an additional container for VPP, and in the Calico node container we add a component that we call the Calico VPP agent. This component communicates with all the regular Calico and Kubernetes components and programs VPP to do the routing, the services load balancing, and the policies. It also handles the CNI function, creating and deleting the tun interfaces for the pods. (A rough sketch of this per-pod programming follows just before the benchmark results.) So this is how Calico VPP works.

Now, a word about the project status. This is of course open source, on GitHub in the Project Calico organization, currently under alpha status and considered a Calico incubation project. As of today, we support most Calico features; the features that are not supported right now include host policies and some specific configuration features related to BGP.

We have started running initial performance benchmarks on this data plane, and as you will see, the results are quite promising. Without going into all the details, we ran our benchmarks on a bare-metal testbed with two Skylake servers connected by 40 Gbps NICs. We used iperf to run throughput tests, and nginx and wrk to simulate API servers and clients connecting to each other.
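Before getting to the numbers, here is the rough sketch of the per-pod programming mentioned above. The calico-vpp-agent really talks to VPP over its binary API, which is not reproduced here; the client interface and method names below are hypothetical, purely to illustrate the sequence of operations on pod add (create the layer-3 tun interface, route the pod's /32 with no ARP involved, then configure NAT and policies).

```go
package agent

// VppClient is a hypothetical stand-in for the real VPP API client;
// the method names are illustrative, not the actual API.
type VppClient interface {
	CreateTun(netnsPath, ifName string) (swIfIndex uint32, err error)
	AddRoute(destCIDR string, viaSwIfIndex uint32) error
	EnableNAT(swIfIndex uint32) error
	ApplyPolicies(swIfIndex uint32, policyIDs []string) error
}

// AddPod illustrates the per-pod programming described above: a layer-3
// tun interface into the pod, a /32 route towards it, then NAT and policy
// configuration for services and Calico policies.
func AddPod(vpp VppClient, netnsPath, ifName, podIP string, policies []string) error {
	idx, err := vpp.CreateTun(netnsPath, ifName)
	if err != nil {
		return err
	}
	if err := vpp.AddRoute(podIP+"/32", idx); err != nil {
		return err
	}
	if err := vpp.EnableNAT(idx); err != nil {
		return err
	}
	return vpp.ApplyPolicies(idx, policies)
}
```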
The first test we ran was an HTTP request-rate test. We have an nginx HTTP server running on one node and a wrk client located outside the cluster. The wrk client sends 4 kB HTTP requests as fast as it can to the server, and we measure the number of requests per second that we are able to achieve, as well as the CPU consumption during the test. This benchmark simulates an external client connecting to a service in the cluster. Here are the results we got. We see that VPP, Linux, and eBPF are really close in terms of performance, all around 350,000 requests per second — VPP is slightly higher at 370,000 — but what's really noticeable is that the CPU consumption on the server is lower with VPP. Both Linux and eBPF run at around 80% CPU usage, while VPP is at 67%. This is quite encouraging, because it means that in order to perform roughly the same amount of work, we are saving quite a few CPU cycles, and those cycles are then available for your application logic.

In addition to this HTTP requests-per-second test, we also ran some TCP throughput tests. In this case we used a two-node cluster with encapsulation between the nodes. In this test we have one cluster IP service pointing to an iperf server pod and one iperf client pod, with the two pods pinned to different nodes, and we test the TCP throughput that we can obtain with a varying number of connections between the client and the server. With one flow, we can already see that eBPF and VPP are much faster than Linux — twice as fast, at almost 20 Gbps, while Linux is at 9. With two flows, eBPF takes the edge at 36 Gbps, VPP is at 30, and Linux scales to 17. With four flows, VPP and Linux catch up with eBPF, and with eight flows, basically all the data planes saturate the link.

VPP has a really fast IPsec implementation, so one thing we wanted to measure is how VPP's encryption compares to the encryption provided by Linux and eBPF. For this test we reran the same tests as before, but this time both Linux and eBPF were configured with WireGuard encryption and VPP was configured with IPsec. Here are the results we got. This test was really favorable to VPP: Linux and eBPF respectively reached 32,000 and 42,000 requests per second, while VPP is at 250,000. VPP does consume more CPU in this test, but when we compare the amount of work actually being done to the amount of CPU consumed, VPP actually consumes much less CPU per request. As for the iperf tests, the encrypted iperf performance is also really good with VPP. Basically, the Linux and eBPF data planes seem to be bottlenecked by the WireGuard implementation at around 2.5 Gbps, while VPP is able to reach 12 Gbps on one connection and 36 Gbps on eight connections, which is basically link speed.

One other aspect we wanted to benchmark was how VPP behaves as the scale of the cluster increases. To do so, we designed a test where we configure many services in a cluster and measure the time it actually takes to establish a TCP connection from one node to another. This measures the behavior of the NAT code in the different data planes. We did that with a custom test client written in Go that sends HTTP requests at a constant rate and measures both the connect latency and the request latency. This client is available on GitHub if you want to try it.
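The core trick such a client needs is to time the TCP connect separately from the HTTP request itself. In Go, that can be done with net/http/httptrace, roughly as in the sketch below — this is a simplified illustration, not the actual benchmark client, and the target URL is just a placeholder cluster IP.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptrace"
	"time"
)

// timedGet issues one GET and reports how long the TCP connect took
// versus the full request. The connect time is what exercises the
// NAT/load-balancing code in the data plane.
func timedGet(url string) (connect, total time.Duration, err error) {
	var connStart, connDone time.Time

	trace := &httptrace.ClientTrace{
		ConnectStart: func(network, addr string) { connStart = time.Now() },
		ConnectDone:  func(network, addr string, err error) { connDone = time.Now() },
	}

	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return 0, 0, err
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	start := time.Now()
	// Use a fresh Transport with keep-alives disabled so every call really
	// opens a new connection; otherwise the connect time would be zero.
	resp, err := (&http.Transport{DisableKeepAlives: true}).RoundTrip(req)
	if err != nil {
		return 0, 0, err
	}
	resp.Body.Close()

	return connDone.Sub(connStart), time.Since(start), nil
}

func main() {
	connect, total, err := timedGet("http://10.96.0.10:80/") // placeholder cluster IP
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("connect: %v, total: %v\n", connect, total)
}
```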
A separate client was required because wrk does not measure the connect latency — it only measures the request latency — and the NAT implementation mostly impacts the connect time. Here is the median connect latency that we obtained for the different data planes. We see that as the number of services increases, the eBPF and VPP data planes behave really well: the connect latency doesn't change much and remains close to 250 microseconds. On the other hand, the Linux data plane — which in this case is kube-proxy with the iptables backend — doesn't scale as well, and with 100k services the latency is really high, at more than 6 milliseconds.

That's it for the benchmarks that we have, but there are many other things that we need to measure. For instance, we would also like to measure how VPP behaves when the number of backends of a service increases. In particular, we want to see the behavior of the system when there is a lot of pod churn, meaning that backends are continuously added to and removed from the different services. We think that this is more representative of a real-life large cluster, and we are very curious to see how VPP will compare to the other data planes in that case.

In terms of features, there are also some things we would like to add. The first one is WireGuard support. We recently had a contribution of a WireGuard implementation in VPP, and integrating it into Calico VPP will give us better compatibility with regular Calico nodes. We also want to leverage the VPP telemetry infrastructure in order to expose more metrics about what's going on in the containers. And as I mentioned earlier, we also want to explore different ways to bring connectivity to the containers. We expect these to have much better performance than the current tun interfaces, but as I said, they require application modifications. However, one application that's highly likely to benefit from this is Envoy. Envoy is becoming more and more common, and if we can accelerate it with VPP, we think we could have a really interesting story there. And finally, of course, we would love to graduate from our incubation status to GA in Calico.

That's it for this presentation. I would really like to thank both the FD.io/VPP and the Calico communities for all the support they've brought to this project. If you're interested in trying out Calico VPP, you will find a link to our docs in the PDF version of the slides. We provide configurations that will allow you to deploy it on any cluster, but it can take a bit of tuning to get the best performance out of your system. So if that's something you're interested in, definitely reach out to us; we will very gladly help you with that.

Thanks, Louise. I'm really excited to see this project progressing as it makes its way towards GA. The last thing I wanted to do is emphasize how much I like the way this project fits with the Calico design philosophy I was talking about earlier. I really see this as another tool that we can put in Calico's toolbox, and I think it's going to help enable a new segment of users to leverage what Calico and Kubernetes bring to the table. Additionally, as a maintainer of Calico, I'm really excited to see the community picking up such a big project and driving it to completion, and really happy to see you taking advantage of some of the ways we've architected the code to drive innovation in the way that you are.