Hi, and welcome to this talk on securing Kubernetes in low-trust edge networks. We're going to talk about using Calico, VPP and WireGuard together to get great performance and security for the Kubernetes clusters in your edge networks.

My name is Chris Tomkins. I'm a developer advocate at Tigera; I champion user needs and support the Project Calico user and contributor community. I've worked in networking since around 2000. I realized early on that the device CLI is not a scalable solution, and became interested in large-scale automation and infrastructure as code. When I'm not working, I love reading, films, music and tinkering with technology. I sometimes fly radio-controlled gliders, but more often I crash them.

Our other speaker today is Nathan Skrzypczak. He's a software engineer at Cisco and a Calico and VPP contributor. Like me, he's into biking, hiking and being outdoors, even sea kayaking sometimes. And he has a French accent despite his name, which is interesting; we'll have to dig into that.

We have a short talk today, so we'll need to be brief in order to allow some time for questions. Keep in mind that you can always learn a great deal about Calico at projectcalico.org, about VPP at fd.io, and about WireGuard at wireguard.com. Instead of taking too much time to talk about each technology individually, we'll try to focus on what can be achieved with all three.

With that said: the Project Calico community develops and maintains Calico. It's an open-source networking and network security solution for containers, virtual machines and host-based workloads. It supports a broad range of platforms including Kubernetes, OpenShift, Mirantis Kubernetes Engine, OpenStack and bare metal servers. Whichever data plane you choose to use, Calico offers blazing fast performance and true cloud-native scalability, and it offers a consistent experience on a single node or a multi-thousand-node cluster, public cloud or on-prem. Project Calico currently has more than 6,000 Slack channel members, more than 150 contributors, and more than a million compute nodes are powered by Calico every day.

Now, edge environments, as we all know and as we've heard about at this conference, have some unique challenges. One that we're very interested in today is physical security, because protecting the servers themselves is tricky enough. Consider that you might have fire equipment testing in the server room, and that might be outside the control of your normal security infrastructure. But protecting the wires between them can be even trickier. Consider that if you have a copper cable run between rooms in a building that you don't control, inserting a simple network hub on that wire, which would allow traffic interception, is a three-minute job: cutting the cable, crimping it and inserting the device. If you had a three-minute blip on your network, especially with Kubernetes failing over your workloads, it could be that your support team don't even investigate it and the intrusion is never detected.
So we need to protect the traffic on the wire as well as on the node. Calico offers granular access controls built in. For protecting the workloads on the nodes, and for host protection of the Calico Kubernetes nodes themselves, we can use Calico's built-in rich network and security policy model, and we can use the full Kubernetes network policy support that Calico implements (a minimal policy example follows below). It matches the original reference implementation of Kubernetes network policy but also extends it, and it has optimized performance too: it's built to go faster with lower CPU consumption and help you get the best possible performance from your investment, and that can be especially useful in edge clusters. Adding on the VPP data plane gives you exceptional WireGuard performance, which helps to encrypt and protect data on the wire without a large performance penalty, as well as quite a few other advanced features that you'll hear about from Nathan in a moment. But I've jumped ahead: first we need to talk about what a data plane actually is.

Here we can see three edge compute nodes, vertically on the diagram. Each one, you can see, has a control plane and a data plane component. The control plane is the component of the node which is responsible for figuring out what's going on in the network and establishing consensus, for example via routing protocols. It's typically implemented on a general-purpose CPU; so even if you have a network node that has ASIC functionality, the control plane is usually implemented on the general-purpose CPU. The control plane manages complex device and network configuration and state. For example, you might be familiar with protocols such as BGP, OSPF and IS-IS: those are implemented in the control plane because they are necessarily sophisticated code.

The data plane is different. It's responsible for processing the transit traffic, moving around all those cat videos. You might have cat videos, you might have whatever content is relevant to your business, but that content moves, ideally, only through the data plane, and the control plane doesn't touch it. The reason for that is that the data plane is designed to move user traffic around as quickly as possible. It should be the simplest possible implementation of the required packet forwarding features; it implements the fast path for the traffic.

Calico offers multiple data planes, and it separates the control plane and data plane. By doing that, it achieves a lot of things. One of those is that, by keeping your data plane code minimal, you can audit it more easily, you can secure it more easily, you can bug-fix it more easily. It also allows you to keep a targeted data plane feature set: you're not running code that is more complex than it needs to be. It allows you to reuse the control plane code on the node, so that all that complex control plane code that you wrote can remain relevant and doesn't need to be adjusted. It's good for auditing and so on as well. It keeps you future-proofed, so you can adapt to changing future technologies, and it keeps your platform agile: with Calico you can change data plane any time you like.
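To make that policy model a little more concrete, here is a minimal sketch of the kind of Kubernetes NetworkPolicy that Calico enforces (Calico's own policy resources extend this with richer matching and ordering). The namespace, labels and port below are hypothetical placeholders, not values from the talk.

```yaml
# Minimal sketch: only allow frontend pods to reach the video-cache pods on
# TCP 8080; all other ingress to the selected pods is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: edge-app-ingress
  namespace: edge-apps            # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: video-cache            # hypothetical workload label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend      # hypothetical client label
      ports:
        - protocol: TCP
          port: 8080
```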
The data plane is the component of a networking device that transports user data. With that in mind, Calico offers three data planes in GA today (a small sketch of how a data plane is typically selected follows below). They are: Linux iptables, which is a heavily battle-tested data plane; it's the original data plane we supported, and it offers good performance, amazing compatibility and wide support, so for example it doesn't have high kernel requirements on the nodes. We also offer a Windows Host Networking Service data plane, which allows you to deploy Windows containers and secure them on any cloud computing provider or on premises. And we have a Linux eBPF data plane, which scales to higher throughput and uses less CPU per gigabit; it also reduces first-packet latency for services, preserves the external client source IP address all the way to the pod, and supports direct server return for better efficiency.

But as well as those three, we have the VPP data plane. We'll talk at the end of the talk about where exactly that is in terms of production readiness, but first of all we'll talk about what features it actually offers, where it sits, and how it helps to secure your edge workloads.

Now, before we move on to talk about VPP, the other unique requirement of edge environments is that they're often heavily resource-constrained. We all know that in the cloud you can always just buy a bigger instance, and if that doesn't work, you can buy an even bigger instance until your workload does work; you just have to hope no one complains about the cost. Obviously that's not possible in edge networks. It may be that you have a particular set of servers that you've shipped out there, or other constraints, even heat and power, that prevent you from being able to spin up bigger machines. So you need to be very aware of your performance and cost, and the VPP data plane helps us achieve that with Calico by being highly optimized for performance, which is another way of saying low cost, because essentially high performance on low-spec hardware is low cost. It's multi-architecture, with x86 and Arm support, and it doesn't require huge pages support from the kernel, which can be useful if your environment doesn't provide it.

So we've talked about why Calico and VPP together are a great solution for securing the data in your edge cluster, both in flight and on the host and in the container. I'll pass over to Nathan, who's going to tell us in a lot more detail about VPP itself.

Thanks, Chris. So, first a few words about VPP. It has been presented in many talks, so I won't spend too much time on it. In short, VPP is a user-space network data plane which is highly optimized both for packet processing and at the API level. It relies on vectorization to provide a wide range of optimized L2 to L7 features, from NAT and tunnels up to TCP and QUIC, and it is also easily extensible via plugins, which is something we are leveraging a lot for the Calico integration. If you'd like to learn more, don't hesitate to go to fd.io; there are plenty of resources available out there. So how does it integrate as a Calico data plane?
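As an aside, switching data planes is usually just a declarative change. The sketch below assumes the Tigera operator manages the Calico installation and shows the general idea of selecting the eBPF data plane; the VPP data plane ships with its own installation manifests, so treat this purely as an illustration, and check the field names and supported values against the docs for your Calico version.

```yaml
# Illustrative only: choose a Linux data plane via the operator's Installation
# resource. "Iptables" is the default; "BPF" selects the eBPF data plane.
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    linuxDataplane: BPF
```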
When you deploy Calico VPP on a Kubernetes cluster, you will get one VPP instance deployed as a DaemonSet on each node. It does the routing, of course, but it also implements Kubernetes-specific data plane features in VPP plugins, such as policies, service load balancing, NAT for outgoing traffic, IPIP or other tunnels, and so on. All this logic is done in dedicated plugins that are optimized for this use case.

We wanted to make it as easy as possible for users to configure, just a switch to flip: you only need to pass the interface name that VPP will use as its uplink, and a driver for consuming it (a rough sketch of that configuration follows below). We also configure things in a way that is friendly for edge environments, where resource constraints may be challenging. So, for instance, we use interrupt mode instead of poll mode so that we don't waste CPU cycles, we leverage GSO and GRO to reduce the CPU load on the kernel, and in many cases we support running without huge pages as well.

Now, because VPP is a user-space stack, there are several things that will be different between VPP and the other data planes. As you can see here, we insert VPP between the host and the network. On startup, VPP will grab the host network interface specified in the configuration and consume it with an appropriate driver. It then restores the host connectivity by creating a tun interface in the host's root network namespace, and it replicates the original uplink configuration on that interface, the addresses and the routes, so that things behave similarly from the host's point of view. Pods are connected just like the host, with a tun interface in each of the pods' namespaces. The Calico control plane runs normally on the host, and it configures the data plane functions directly in VPP. Since we use tun interfaces and not veth pairs, we also don't need to worry about layer 2 in the pods, which better matches the Kubernetes network model.

But now you might ask: why do all this, and what does it bring as a Calico data plane? To list a few things we're aiming to address: we are making it easier to expose highly available services, by using Maglev as the load-balancing algorithm for service IPs. We're also helping address specific user needs, like adding new networking features, by leveraging the fact that VPP plugins are easy to customize, or at least way easier to maintain and deploy than kernel modules. And we're also helping optimize network-intensive applications like VPNs and proxies. And finally, the one that is, in my opinion, the most important for edge environments and deployments: we are enabling fast in-cluster encryption.
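To give a feel for that "switch to flip", here is a rough sketch of the kind of ConfigMap the Calico/VPP manifest is driven by: you name the uplink interface VPP should take over and, optionally, the driver to use for it. The key names and values here are only illustrative and written from memory; refer to the Calico/VPP getting-started documentation for your version rather than copying this as-is.

```yaml
# Illustrative sketch of the Calico/VPP configuration idea, not an
# authoritative manifest.
apiVersion: v1
kind: ConfigMap
metadata:
  name: calico-vpp-config
  namespace: calico-vpp-dataplane
data:
  vpp_dataplane_interface: eth1   # host NIC that VPP grabs as its uplink
  vpp_uplink_driver: ""           # empty string lets the agent pick a driver
```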
As Chris mentioned, encrypting traffic is usually required at some level for compliance and security reasons: for example, because you want to be SOC 2 or PCI DSS compliant, or just because your edge DC is on an untrusted network and you need connectivity to another remote DC, and you have to go across the internet for that. Having the infrastructure provide the encryption in that context has really nice properties, because it guarantees encryption regardless of how the applications evolve, and maintenance also gets easier: if patches have to be applied, or security parameters like key size have to be bumped, you only have to do it in one place.

The main issue coming from this is performance, as the default Linux implementation usually makes it quite impractical to use in production. With Calico VPP we expose optimized implementations which allow this to work at line rate and with manageable CPU usage, thus enabling it to be used in production. We expose multiple ciphers and modes to address the different requirements applications might have: WireGuard for compatibility with Linux and eBPF nodes, IPsec for maximal CPU efficiency, and we also have an asynchronous IPsec mode to distribute crypto operations across multiple cores (a minimal example of turning WireGuard on follows below).

So let's see how fast we can really go with this, and how it compares to other implementations. Our test setup looks like this: two Kubernetes nodes running on bare-metal Skylake servers and connected with a 40G link. We have VPP running on each node, leveraging the available offloads, optimized interface drivers, and the different software-accelerated crypto backends, and traffic is generated by iperf from a single flow. When using WireGuard, which is the one available in all three implementations, the kernel Linux and eBPF data planes reach around 2.8 gigabits per second; VPP goes quite a bit faster at 7.5 gigabits per second. IPsec gives even better figures, but only with VPP: 10 gigabits per second in synchronous mode, and asynchronous IPsec gives 12.2 gigabits per second, which is really nice. We also measured the global CPU consumption on both nodes during the tests, and VPP doesn't consume significantly more CPU at that throughput. The only outlier is the CPU for IPsec in asynchronous mode, which we still run in poll mode due to a pending issue, but we are working on solving it.

So this really allows enabling encryption across nodes while still keeping an acceptable throughput, even if you're running in a resource-constrained environment like the edge. But the question remains: can we go faster? And seeing we still have a few minutes left, you can guess I'll at least try to convince you that we can.
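For reference, switching on node-to-node WireGuard in Calico is a single flag on the default FelixConfiguration resource; a minimal sketch is below. How IPsec is enabled for the VPP data plane specifically is a separate, VPP-side setting described in the Calico/VPP documentation.

```yaml
# Minimal sketch: enable WireGuard encryption between nodes by setting one
# flag on the default FelixConfiguration.
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  wireguardEnabled: true
```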
In the previous schema, we still have an extra hop through the kernel. This is not predominantly about copies, whose cost we often tend to overestimate, and here we are already leveraging optimized virtio drivers in that sense; it's rather about the cost of syscalls and of actually going through the Linux network stack. So why do we have this hop? The way applications usually consume packets is via socket APIs. It's quite standard, but you have to go through the kernel, a code path which wasn't designed for the performance levels of modern apps; that's actually why we came up with GSO as a network stack optimization. But here, as we have VPP running, it would be nice to be able to somewhat seamlessly bypass the network stack, do L4+ directly in VPP, and maybe also spare a few copies on the way.

We can do this by providing two consumption models. If the application handles raw packets, it can leverage memory interfaces, memif, with either gomemif, libmemif, DPDK, or maybe even by running another VPP instance inside the pod directly. If the application terminates L4+ protocols, it can leverage VPP's host stack with the VCL, the VPP Comms Library. All this is exposed with simple pod annotations (a hypothetical sketch of those follows below), and if you want to know more about how to use this, there is actually an upcoming talk at KubeCon by Chris and Aloys going into deeper detail on that matter. Using this enables a full user-space path with zero copy from the app to VPP, while still being able to run regular services like DNS through the socket API, because you might not want to run DNS over the VCL.

So let's say we want to redo the previous test with iperf, but without the socket APIs, so we'll have to go through the VCL, and the setup now looks like this. Actually, we switched to iperf3 due to implementation constraints, but the results are really similar. We kept one flow, we kept the same test bed, and we disabled encryption, replacing it with IPIP as the node-to-node transport, as that is quite a standard option for Calico clusters. And the results we get are like this: with Calico on plain Linux, we reached 13.5 gigabits per second on a single flow; if we switch to Calico VPP, it bumps up to 14.5 gigabits per second; and if we switch to the VCL, performance actually climbs to 20.6 gigabits per second, which is really nice. CPU usage is rather stable; it actually comes down with the VCL, even though VPP is in poll mode here, because we are moving the TCP work from Linux to VPP.

But now you may say: TCP is great, but it's not ciphered, and we were talking about encryption. And you're right, but the VCL does provide a TLS implementation. So if we use a TLS-enabled client and server, we can run a similar test but over TLS, and if we do this we reach up to 9.8 gigabits per second of TLS node to node over IPIP in a Calico VPP-enabled cluster. So even if you're running an unencrypted cluster, you can still use the encryption speedup provided by VPP. As a note, this VPP host stack integration is still a work in progress, which is the reason why we still use poll mode here; we still need to stabilize the implementation and support interrupt mode, but it should be ready really soon.
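As mentioned, these consumption models are requested per pod through annotations. The sketch below is only meant to show the shape of that mechanism: the annotation keys, values and image are hypothetical placeholders, so check the Calico/VPP documentation for the exact names supported by your version.

```yaml
# Hypothetical sketch: a pod asking for VPP host-stack (VCL) access and a
# memif interface; the annotation keys shown here are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: packet-app
  annotations:
    cni.projectcalico.org/vppVcl: "enable"                # request VCL access
    cni.projectcalico.org/vppExtraMemifPorts: "udp:7777"  # request a memif for this traffic
spec:
  containers:
    - name: app
      image: example.com/packet-app:latest                # placeholder image
```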
Finally, if your application is packet-oriented, you might want to use a memif exposed by VPP. In this case, our test setup looks like this: we are using encryption again, IPsec in this case, with TRex as a traffic generator and testpmd as a reflector, and we measure the number of packets coming back to TRex after doing the U-turn. So this test is a bit different from the previous ones: we still use a single five-tuple, but the traffic here is full duplex, whereas iperf was mostly sending from client to server. If we send 1500-byte UDP packets with this, we receive about 5.9 gigabits per second of traffic reflected by testpmd. That means we are able to receive, decrypt, process, re-encrypt and forward about 6 gigabits per second of traffic on a single node with a single VPP worker, so around 12 gigabits per second of aggregated UDP throughput over IPsec. With all this we can start targeting applications working at line rate over encrypted links, even in edge clusters where efficient and secure processing is most needed.

Just a small word on the status of this work: the Calico VPP integration with IPsec and WireGuard support is already available as a tech preview; VCL and memif support are still under development, but should be available really soon. So that's it on my side, and I'll let Chris give a few words of conclusion. Thanks.

Thanks, Nathan. So, VPP is a great match for edge use cases. In summary, this is a new VPP-based user-space data plane option for Calico. It complements Calico's workload protection with incredible WireGuard performance to protect data in flight in edge environments. It lets you stay ahead of the curve by offering advanced support for experimental features, and it has low resource requirements, ideal for resource-constrained edge environments.

That's nearly it for this presentation. We have a number of exciting new features on the horizon, including Maglev load balancing, packet-oriented interfaces, and hopefully soon general availability in Calico. Currently the project is expected to move from tech preview to beta status in version 3.21 or 3.22. If you'd like to stay up to date on this project, don't hesitate to join the VPP channel in the Calico Users Slack; we publish our releases there. If you'd like to try it out, head over to the Calico documentation, which has setup instructions. If you have any questions at any point, don't hesitate to ping us on the Slack channel as well, or you can ask them right away. Thanks very much for listening.