Hello, everyone. My name is Anurag Saxena. I'm currently working as a principal engineer in the Core OS networking team at Microsoft. My co-presenter is Poorna Gaddehosur, who is a principal group engineering manager in the Core OS networking organization at Microsoft. This talk demonstrates how Linux-based eBPF programs can be run using eBPF for Windows.

The agenda for this talk: we will go over the goals for this presentation, then look at an overview of Cilium L4LB on Linux, followed by an overview of Cilium L4LB using eBPF for Windows. We will then go through the demo topologies and see a live demo of the Cilium load balancer running on Windows.

As many of you may know, at Microsoft we are currently working on the eBPF for Windows project, which enables running eBPF programs on top of Windows. A fundamental goal of this effort is to meet developers where they are, and because of this it is important to ensure and demonstrate portability of eBPF-based solutions which are in production today from Linux to Windows, with the minimum possible changes. Today, to demonstrate this, we are using the standalone Cilium L4 load balancer solution. The Cilium L4LB solution uses an XDP hook to implement load balancing, and it supports both source NAT (SNAT) and DSR modes.

To give a quick overview of the Linux-based Cilium L4LB solution: it uses an XDP hook with the XDP_TX action to implement the LB functionality. XDP_TX is one of the actions supported by the XDP hook; it hairpins the packet back out of the NIC on which it arrived. The Cilium XDP program also uses helper functions such as xdp_adjust_head and xdp_adjust_tail, map lookup and update helper functions, and tail calls. Apart from the eBPF program, which implements the core LB functionality, the Cilium solution also consists of a user-mode CLI and an agent which compiles, loads, and attaches the eBPF program on Linux. This agent and CLI are also responsible for taking service details as input and configuring the required frontend and backend details in the eBPF maps. For load balancing the traffic across backends, Cilium uses the Maglev hashing algorithm. Maglev is a consistent hashing algorithm which ensures that if one of the LB nodes goes down, the other LB nodes will choose the same backend for a given five-tuple of traffic.

For demonstrating Cilium L4LB on Windows, we have written a user-mode demo agent inspired by the Cilium agent and CLI. This demo agent performs the same tasks: compiling and installing the Cilium XDP eBPF program, taking an LB service instance as input, and configuring the required eBPF maps for load balancing to work. The Cilium eBPF program used on Windows is almost identical to the program used on Linux: 95 to 96 percent of the code is common to both, with a few changes like changing map definitions to match the ones on Windows, removing the call to the FIB lookup helper function, and commenting out some statistics and diagnostics functionality. Also, for the demo we have disabled some optional functionality like the source range check and ICMP errors.
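To make the shape of such a program concrete, here is a minimal sketch of an XDP load balancer in the style described above. This is not the actual Cilium source: the map name (lb_services), the key and value layouts, and the program name are our own illustrations, and the header rewrites are elided.

    /* Minimal XDP load-balancer sketch. Map and struct names are
       illustrative, not Cilium's; L4 port parsing and rewrites elided. */
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    struct lb_key     { __u32 vip; __u16 dport; __u16 pad; };
    struct lb_backend { __u32 ip;  __u16 port;  __u16 pad; };

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 1024);
        __type(key, struct lb_key);
        __type(value, struct lb_backend);
    } lb_services SEC(".maps");

    SEC("xdp")
    int xdp_lb(struct xdp_md *ctx)
    {
        void *data     = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;

        struct ethhdr *eth = data;
        if ((void *)(eth + 1) > data_end)
            return XDP_PASS;
        if (eth->h_proto != bpf_htons(ETH_P_IP))
            return XDP_PASS;

        struct iphdr *iph = (void *)(eth + 1);
        if ((void *)(iph + 1) > data_end)
            return XDP_PASS;

        /* Look up the destination address against the configured VIPs. */
        struct lb_key key = { .vip = iph->daddr };
        struct lb_backend *be = bpf_map_lookup_elem(&lb_services, &key);
        if (!be)
            return XDP_PASS;   /* not a VIP we own */

        /* ...rewrite toward be->ip (SNAT) or IP-in-IP encap (DSR)... */

        /* Hairpin the packet back out of the NIC it arrived on. */
        return XDP_TX;
    }

    char _license[] SEC("license") = "GPL";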
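Maglev's consistency property comes from how the lookup table is built: the table depends only on the backend set and a fixed prime table size, not on any per-node state, so independently built tables on different LB nodes agree. A toy sketch of the table-population step, with the per-backend hash functions passed in as parameters (our simplification; the real algorithm derives an offset and skip per backend name):

    /* Toy Maglev table construction, not Cilium's implementation. */
    #include <stdint.h>
    #include <string.h>

    #define TABLE_SIZE 65537u   /* prime, as Maglev requires */

    /* Fill `table` with backend indices 0..n-1. */
    void maglev_build(uint32_t n,
                      uint32_t (*hash1)(uint32_t), uint32_t (*hash2)(uint32_t),
                      int32_t table[TABLE_SIZE])
    {
        uint32_t next[n];   /* how far each backend has walked its permutation */
        memset(next, 0, sizeof(next));
        for (uint32_t i = 0; i < TABLE_SIZE; i++)
            table[i] = -1;

        for (uint32_t filled = 0; filled < TABLE_SIZE; ) {
            for (uint32_t b = 0; b < n && filled < TABLE_SIZE; b++) {
                uint32_t offset = hash1(b) % TABLE_SIZE;
                uint32_t skip   = hash2(b) % (TABLE_SIZE - 1) + 1;
                uint32_t c = (offset + next[b] * skip) % TABLE_SIZE;
                /* Walk b's permutation until a free slot is found. */
                while (table[c] >= 0) {
                    next[b]++;
                    c = (offset + next[b] * skip) % TABLE_SIZE;
                }
                table[c] = (int32_t)b;
                next[b]++;
                filled++;
            }
        }
    }

    /* Data path: backend = table[hash(five_tuple) % TABLE_SIZE]. */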
This slide shows the architecture for Cilium L4LB using eBPF for Windows. To start with, we have the eBPF for Windows framework already installed on the LB node, and the eBPF execution context and the network eBPF extension (netebpfext), which implements the XDP hook, are already running on the system. Now, when the Cilium agent is initialized, a mode is provided to it as input, SNAT or DSR. This input is needed because the XDP program has to be compiled for a specific mode. The agent also gets the interface name as an input. Once the agent has these inputs, it fetches the interface properties, compiles the eBPF program using the interface properties and the mode it was given, and then verifies, loads, and attaches the program to the XDP hook. As part of loading the program, the required maps are also created in the kernel.

Now that the XDP program is loaded and attached, the agent is ready to take input for any LB service instance. When a service is configured, the agent gets a frontend and backend configuration. The frontend configuration consists of the frontend IP and port, and the backend configuration consists of a list of backend IPs and ports. Once the agent gets this information, it computes a Maglev hash for the backends in the service and updates it in the Maglev map. It also updates the backend maps, service maps, and reverse NAT maps. With that, the control path is set up for load balancing to work for the configured service.

Moving to the data path: when an incoming packet arrives on the NIC, the XDP extension invokes the Cilium XDP program. The XDP program inspects the incoming packet, consults the various maps programmed by the user-mode agent to choose a backend node, and then, depending on whether SNAT or DSR is configured, either source-NATs the packet or does an IP-in-IP encap. The packet is then returned to the extension with the XDP_TX action, which means the packet needs to be hairpinned back out of the NIC it arrived on.

This slide shows the topology for the demo setup. For the demo, we are using five VMs. One of the VMs is the load balancer VM; it is a Windows Server machine and has the Cilium XDP program running on it. The other four machines are backend nodes: two of them are Windows VMs and the other two are Ubuntu Linux VMs. All five VMs are connected via an internal switch and are on the same network. The IPs we are using here are 2.1.1.1 for the LB node and 2.1.1.10 to 2.1.1.40 for the backend nodes. The VIP used in this demo is 4.1.1.1, which is not in the same network as the other IPs. Since this is an internal switch, the host is also connected to the same switch and will act as the client sending requests to the LB node. All four backend servers are running a web server listening on port 80.

For the SNAT scenario, this is how the packet flow looks. The browser sends a request using its source IP 2.1.1.100 and some source port X, destined to 4.1.1.1, destination port 80. Once the LB node gets this packet, it chooses one of the backends to send the request to; in the SNAT scenario, it source-NATs the packet and forwards the request to the chosen backend. So the packet on the wire has source IP 2.1.1.1, which is the IP of the LB node. Once the backend has processed the request, it sends the response back to the LB node. The LB node then does a reverse SNAT on the packet and sends the response back to the browser.

This slide shows the topology for DSR, which is almost the same as the SNAT one, with a few differences. Since in DSR the packet is IP-in-IP encapsulated on the LB node, the backend node will receive an IP-in-IP packet, and it will need to be decapped. So we need an eBPF program on the backend to decap those packets. For this demo, on the Windows backend servers we are using an XDP eBPF program to decap those IP-in-IP packets, and on the Linux backend servers we are using a TC-based eBPF program to decap them.
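For reference, here is a sketch of what such an XDP decap program can look like. This is our own minimal version, not the exact program used in the demo, and it assumes the outer IPv4 header carries no options:

    /* Minimal XDP IP-in-IP decap sketch (illustrative, not the demo code). */
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>
    #include <linux/in.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    SEC("xdp")
    int xdp_decap(struct xdp_md *ctx)
    {
        void *data     = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;

        struct ethhdr *eth = data;
        if ((void *)(eth + 1) > data_end)
            return XDP_PASS;
        if (eth->h_proto != bpf_htons(ETH_P_IP))
            return XDP_PASS;

        struct iphdr *outer = (void *)(eth + 1);
        if ((void *)(outer + 1) > data_end)
            return XDP_PASS;
        if (outer->protocol != IPPROTO_IPIP || outer->ihl != 5)
            return XDP_PASS;

        /* Save the Ethernet header, then shrink the packet from the front
           by one IPv4 header so the inner header becomes the outer one. */
        struct ethhdr eth_copy = *eth;
        if (bpf_xdp_adjust_head(ctx, (int)sizeof(struct iphdr)))
            return XDP_DROP;

        data     = (void *)(long)ctx->data;
        data_end = (void *)(long)ctx->data_end;
        eth = data;
        if ((void *)(eth + 1) > data_end)
            return XDP_DROP;
        *eth = eth_copy;

        return XDP_PASS;   /* hand the decapped packet to the local stack */
    }

    char _license[] SEC("license") = "GPL";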
This is how the packet flow looks in the DSR scenario. As before, the browser sends a request packet to the LB node using its source IP 2.1.1.100 and some source port X, destined to 4.1.1.1, destination port 80. Once the LB node receives the packet, it chooses one of the backend nodes, does an IP-in-IP encap, and forwards the request to the chosen backend. So the packet on the wire has an outer IP header with source IP 2.1.1.1, which is the IP of the LB node, and destination 2.1.1.40, which is the chosen backend. The inner IP header is the original IP header, as sent from the browser. Once the backend node receives this packet, it does an IP-in-IP decap and then processes the request. Once it has processed the request, it sends the response directly back to the browser, without the packet going through the LB node. So this is how DSR works.

Moving to the actual demo: what we see here on the right side is the LB node, where we will have the Cilium L4 load balancer solution running. It already has the eBPF for Windows framework installed and running. What we see on the left side is a web browser running on the host, which will act as the client sending requests to the LB node.

As a first step, I'm going to run the Cilium-based agent. This is the agent I mentioned earlier, inspired by the Cilium CLI and user-mode daemon. We provide a mode, which is SNAT right now, and the name of the interface. Once the agent receives this information, it queries the interface properties, compiles the eBPF program, and runs the program through the verifier. Once the program has been verified, it loads and attaches the program to the XDP hook. Once that is done, it presents us with a CLI, which we can now use to configure an LB service instance.

The next command I'm going to run is this service update command. If you look at the details here, I'm configuring the frontend as 4.1.1.1, port 80, and the backends as a list of four backend nodes, 2.1.1.10 through 2.1.1.40. Once I run this command, the Cilium-based agent configures all the required maps, and we can now test the load balancing.
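Under the hood, a service update like this boils down to a handful of map updates from user mode (the real agent also populates the Maglev and reverse NAT maps; this sketch collapses everything to a simplified service map and backend map). A rough illustration, assuming a libbpf-style API such as the libbpf-compatible surface eBPF for Windows exposes, with map layouts of our own:

    /* Illustrative "service update" flow; map layouts are ours, not Cilium's. */
    #include <bpf/bpf.h>
    #include <arpa/inet.h>
    #include <stdint.h>

    struct svc_key { uint32_t vip; uint16_t port; uint16_t pad; };
    struct be_val  { uint32_t ip;  uint16_t port; uint16_t pad; };

    int configure_service(int svc_map_fd, int be_map_fd,
                          const char *vip, uint16_t vport,
                          const char *const *backends, uint16_t bport,
                          uint32_t n)
    {
        struct svc_key key = { .port = htons(vport) };
        inet_pton(AF_INET, vip, &key.vip);

        for (uint32_t i = 0; i < n; i++) {
            struct be_val be = { .port = htons(bport) };
            inet_pton(AF_INET, backends[i], &be.ip);
            /* Backend slot i for this service. */
            if (bpf_map_update_elem(be_map_fd, &i, &be, BPF_ANY))
                return -1;
        }
        uint32_t backend_count = n;
        return bpf_map_update_elem(svc_map_fd, &key, &backend_count, BPF_ANY);
    }

    /* Example, matching the demo service:
       const char *be[] = { "2.1.1.10", "2.1.1.20", "2.1.1.30", "2.1.1.40" };
       configure_service(svc_fd, be_fd, "4.1.1.1", 80, be, 80, 4); */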
If we move to the browser on the left side and try to access this website, we can see whether the load balancing is working. When I click this, what we see is that we are connected to one of the backend nodes, with IP 2.1.1.30, which is Linux Server 1. Now if I run a command here, it shows me the currently active connections. One thing to note about this website is that it refreshes every three seconds: it displays the date, the time, and the backend server's IP and name, refreshing every three seconds. Coming back to the dump state command, what we see here is the source IP 2.1.1.100, the destination 4.1.1.1 (which is the VIP), the source port currently being used by the browser, and the backend IP. So this shows which backend this connection is landing on, which matches what the page shows. We also see TX and RX bytes. Since this web page refreshes every three seconds, if I run this command again, we can see that the TX and RX bytes have increased compared to the previous run.

Now if I do a force refresh of the web browser, it forces the browser to use a different source port, and we will possibly hit a different backend server. So I did a force refresh, and we are now connected to Linux Server 2, whose backend IP is 2.1.1.40. If I run the service dump state command again, it now shows me two connections. One of them is closing, which is the previous one to 2.1.1.30, which was previously active; now it is in a closing state. The new active connection is to 2.1.1.40. If I run the command again, we can see that the TX and RX bytes are again increasing, since the page refreshes every three seconds.

Now I'll force refresh this web page a couple of times to see whether I can hit all four backend nodes. Right now I am connected to Linux Server 2. I force refresh: this is Linux Server 1. Now Windows Server 1, and Windows Server 2. So we have hit all four backends. If I dump the state, we see five connections here. One of them is active, which is the latest one to 2.1.1.20; the rest are all shown in the closing state.

If we quickly move to Netmon and do a packet capture, a short one should be sufficient. If you look at the first packet, it shows source IP 2.1.1.100 and destination 4.1.1.1; this is the original request packet coming from the web browser to the LB node. In the second packet, the source IP has become 2.1.1.1 and the destination is 2.1.1.20; this is the source-NATted packet from the LB node to the backend node, with source IP 2.1.1.1, the IP of the LB node, and destination 2.1.1.20, the IP of the backend node. The third packet is the response packet coming back from the backend node to the LB node. And the fourth packet is the reverse-NATted packet, whose source IP has become 4.1.1.1, the VIP, and whose destination has become 2.1.1.100, the IP of the client.
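One detail worth calling out in these SNAT and reverse-SNAT rewrites: the skb-based bpf_l3_csum_replace helper is not available at the XDP hook, so when an XDP program swaps an address it typically patches the IPv4 (and TCP/UDP pseudo-header) checksum incrementally. A standalone sketch of that RFC 1624-style fix-up, with helper names of our own:

    /* Incremental checksum fix-up for an address rewrite (RFC 1624). */
    #include <stdint.h>

    static inline uint16_t csum_fold(uint32_t sum)
    {
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
    }

    /* Replace 32-bit field `from` with `to` in a header whose
       ones-complement checksum is stored at *csum. */
    static inline void csum_replace4(uint16_t *csum, uint32_t from, uint32_t to)
    {
        uint32_t sum = (uint32_t)(~*csum & 0xffff);
        sum += (~from >> 16) & 0xffff;   /* subtract old value */
        sum += ~from & 0xffff;
        sum += (to >> 16) & 0xffff;      /* add new value */
        sum += to & 0xffff;
        *csum = (uint16_t)~csum_fold(sum);
    }

    /* SNAT rewrite usage:
         uint32_t old = iph->saddr;
         iph->saddr = lb_ip;
         csum_replace4(&iph->check, old, lb_ip);
       The TCP/UDP checksum must be patched the same way, since it covers
       the pseudo-header containing the addresses. */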
So what we saw just now was the SNAT mode. Now, if we want, we can change the mode to DSR. This command updates the mode to DSR. What it does is recompile the eBPF program for DSR mode, run the program through the verifier again, reload and reattach the program to the XDP hook, and then reconfigure the service we had configured earlier.

Now if we go back and refresh this web page, what we see is that we are connected to Linux Server 1. For the window while we were switching the mode from SNAT to DSR, since this web page kept refreshing, we see two closing connections here, but they should go away in a little while. The latest active one is to 2.1.1.30, which is what the page is showing. If I force refresh this web page one more time, we are now connected to Windows Server 1. And if I dump the state again, the previously active connection has now moved to the closing state, and the new active one is to 2.1.1.10, which we can see here.

One thing to note is that in the case of DSR, the TX and RX bytes are shown as zero. This is because in DSR the return packets are never seen on the LB node, so the Cilium XDP solution does not track TX and RX bytes when running in DSR mode.

If we now force refresh this web page a couple of times to see whether we can hit all four backends: right now we are connected to Windows Server 1; now Linux Server 1, Windows Server 2, and Linux Server 2. So we hit all four backends. If we look here, there are many closing connections, because I force-refreshed multiple times, but there is only one active connection, which is the last one to 2.1.1.40.

If we quickly move to Netmon and do a packet capture one more time: the first packet is the packet coming from 2.1.1.100 destined to 4.1.1.1, the original request packet from the browser to the LB node. The second packet is actually an IP-in-IP packet going from the LB node to the backend node: the outer IP header has source IP 2.1.1.1, which is the IP of the LB node, and destination 2.1.1.40, which is the IP of the backend node, while the inner IP header carries the original source and destination. The third packet comes three seconds later and again has source IP 2.1.1.100, so it is actually the second request packet arriving after three seconds. We do not see any response packets coming back to the LB node. So this is DSR working in action.

This is the end of the demo and the presentation. If you have any questions, please let us know. And as we are working on the eBPF for Windows project, we are also currently hiring, so if anyone is interested, please feel free to reach out to us. Thank you.