Hello, welcome to Virtual DevConf 2021. My name is Davide and I work for Red Hat Italy as a software engineer in the Networking Services team; I'm usually based in the Milano office, but for reasons you can easily imagine I'm doing this recording from home. Anyway, I expect to be in the office at least during the conference slot and for some minutes after, so that I can take advantage of a better connection and a quieter environment to answer any questions you might have. I'm presenting also on behalf of Paolo, who works as a software engineer at Red Hat in the same team. He contributed a lot to this project and he is now following us from the hills of Montecarlo in Tuscany.

Today we will talk about Multipath TCP. As you probably know, this protocol has been under development in the upstream kernel for about a year, and the feature set is gradually increasing: we are adding more functionality and, hopefully, fixing the things that are broken here and there. With the MPTCP code propagating to the official kernel releases, the protocol has become ready for initial use on the latest Fedora versions. So today we are going to see some basic operations that we recorded on the command line of a Fedora server. Last but not least, we will cover some performance testing and see how the MPTCP figures compare with the TCP ones.

Let's now have a first look at the current features of the protocol in upstream Linux and at what can be done in current Fedora. The very first implementation of MPTCP was merged in the networking sub-tree during the last DevConf in Brno and was officially tagged in kernel version 5.6, so in Fedora 32 it was already possible to see the first MPTCP handshake. More features, for example the support for active-backup operation or subflow diagnostics, can be found in Fedora 33. We plan to add some major features, and I would like to mention the support for concurrent subflow transmission and the support for a user-space path manager, targeted at the upcoming 5.11 release and at Fedora 34. All this code is totally open source and is the result of the joint effort of a community, so it's fundamental to say a big thank you to all contributors, and also to you if you are willing to contribute to this project in any way.

For the next months we foresee the development of new features, and I summarized some of them here; you can read the full list of issues on the project's GitHub page. We plan to improve the usability of the stack, for example by adding Python bindings for the MPTCP protocol number. Moreover, we plan to improve the user experience for those programs that need to call getsockopt or setsockopt: an MPTCP socket opens multiple TCP sockets in the kernel, so we need some machinery to route the sockopt to the correct subflow. Last but not least, the next Linux kernel will be able to send netlink notifications to user space, for example when the kernel receives an ADD_ADDR packet; this will allow user-space path managers that can provide a richer control plane for the protocol.

This timeline summarizes what happened in the last year. Many features are planned for the future, so my suggestion is: if you are interested in what's next, or if you want to get closer to the development, just have a look at the project's GitHub pages or subscribe to the developers' mailing list. The bottom line is that most of the main MPTCP v1 features are currently available in the upstream kernel, and the project will soon shift its focus towards improved user experience, stability and performance.
If you are starting your experience as an MPTCP alpha user right now, you can keep using the common networking tools. The kernel UAPI provides the same interface as the one used by TCP sockets, plus a new protocol value that can be used when requesting a stream-oriented socket; this value instructs the kernel to negotiate MPTCP with the peer, and if the negotiation is not successful the kernel silently falls back to plain TCP. You can use the ip command from the iproute2 package to configure subflows; we'll see later how to do that in detail. And of course you might want to inspect the state of your sockets and look at the protocol elements inside the TCP header: for that, tools like nstat or ss will report counters, and Wireshark or tcpdump will be helpful to dissect the MPTCP suboptions in live traffic captures. This is what you get on current Linux.

Before jumping to the tests, let's take one minute to refresh a concept: how user-space programs can use MPTCP on current Linux. It will be useful to understand what happens in the examples we are going to see. The easiest option is for programs to support MPTCP natively, opening a socket and requesting the MPTCP protocol number from the kernel. As an alternative, we can configure our system to intercept the socket system call and remap the protocol number corresponding to TCP, so that the kernel receives a request for MPTCP. In other words, an application requests IPPROTO_TCP and somebody changes it to IPPROTO_MPTCP. In today's examples we are going to use a SystemTap script that installs a kprobe; in other examples we will do library call hijacking to modify the requested protocol in user space. The final result is the same: a program like good old ncat can run Multipath TCP without the need of being patched and rebuilt. Please refer to the footnotes to retrieve the tools we used.

Now it's time to see the MPTCP protocol in action on Fedora and do some smoke tests. For the next examples I used a host running Fedora Rawhide. This diagram briefly shows the topology we are using: the client runs in the main namespace and the server runs in a dedicated namespace. The two namespaces are connected through a virtual Ethernet pair, and I added two different IPv4 addresses to each virtual Ethernet interface; in this way the client can reach the server using four different layer-3 paths (a sketch of how a similar topology can be built is shown after this first test). We will see later how to inform the protocol stack that an additional path is available to reach a given peer. I'm recording the tests using asciinema, but you can also try these scenarios on your own; you can find the script source code in my GitHub account.

Now that the topology is ready, it's time to run the first test. It's the most basic one, because all the MPTCP traffic goes through a single TCP subflow. We are going to use ncat both as server and client, and use SystemTap to install a kprobe that remaps the TCP protocol number into MPTCP. Finally, we run tcpdump to capture and dissect the traffic. Let's now see how it goes. Ready? Let's see the basic topology in action, first using TCP. Here we have the server namespace, with IP addresses assigned like in the diagram. We start the tcpdump traffic sniffer in the background and then launch ncat as a server, also in the background. Then we launch ncat as a client that connects to the server and says hello world. Ok, now let's look at what we captured: it's a classic TCP connection with three-way handshake, data and FIN.
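For reference, here is a minimal sketch of how a similar topology and the first single-subflow test could be set up; the namespace name, interface names, addresses and port are illustrative assumptions, not a literal copy of the recorded demo.

```sh
# Illustrative reconstruction of the demo topology (names and addresses
# are assumptions, not the ones used in the recording).

# Server lives in a dedicated network namespace, the client in the main one.
ip netns add server
ip link add veth0 type veth peer name veth1
ip link set veth1 netns server

# Two IPv4 addresses per veth interface: four layer-3 paths client -> server.
ip addr add 192.168.1.1/24 dev veth0
ip addr add 192.168.2.1/24 dev veth0
ip link set veth0 up
ip -n server addr add 192.168.1.2/24 dev veth1
ip -n server addr add 192.168.2.2/24 dev veth1
ip -n server link set veth1 up
ip -n server link set lo up

# Basic single-subflow test (the TCP -> MPTCP remapping via SystemTap or
# LD_PRELOAD is assumed to be active separately, as described above).
ip netns exec server ncat -l 4321 &
echo "hello world" | ncat 192.168.1.2 4321
```

With the remapping in place, a capture on veth0 should show an ordinary TCP three-way handshake that additionally carries the MP_CAPABLE option.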
Now, in the other pane, let's start the SystemTap script that changes the protocol number to convert TCP into MPTCP, and repeat the whole test. Again we check the server namespace and IP addresses, start tcpdump in the background, then do the same with the ncat server and the ncat client. Ok, now look at what we have in the trace: it's TCP again, but with MPTCP options. You can notice the MP_CAPABLE option in the three-way handshake and the DSS option in the other packets.

Ok, now that we know how to run ncat as an MPTCP application, let's see how a single socket can set up multiple TCP subflows. In this second example we are going to use the ip command to configure the client's namespace with a subflow limit greater than one. Then we will use ip to manually add a second endpoint in the client. When ncat opens the MPTCP socket, the kernel connects to the server just like it happened in the previous scenario; but then, after the first three-way handshake is done, the client knows that there is a second endpoint, and so it establishes a new TCP subflow that can be used for data.

Our Fedora host is ready, so we can just start with the test. The goal of this example is to try MPTCP with multiple subflows in the same topology. We will be using ncat again, so we need to start the SystemTap script that converts TCP into MPTCP. After that, we start tcpdump in the background, like we did in the previous example. Then we configure the server namespace to allow more than one TCP subflow for MPTCP sockets, and we check the value of those limits in the server. Now it's time to do the same in the client namespace: in addition to the subflow limit, we add an MPTCP endpoint, so that a new subflow is started from another client IP right after the three-way handshake has completed. And again, we check the MPTCP configuration in the client. Finally, like in the previous test, we start our ncat server in the background, and the ncat client says hello to the server three times.

Ok, let's look at the packet trace. The top line is packet 5 of the capture, which shows that the three-way handshake has completed: indeed, it's the server ACKing the first hello world. Right after, packets 6, 7 and 8 are the three-way handshake of the additional subflow; please note the presence of the MP_JOIN option. In the remaining packets the client is using the first subflow to send the remaining data and tear down the connection.

In the previous example the client autonomously started a second subflow because somebody statically configured a second endpoint. This is the iproute2 default behavior, and you get the same behavior by specifying the subflow keyword when adding an MPTCP endpoint. There is another possibility, at least for servers, and it is to advertise the endpoint address to the peer: this is done through the ADD_ADDR suboption, and users can do it with iproute2 too, by specifying the signal keyword. In the previous test the client's kernel was free to send the hello world packets over either of the two subflows arbitrarily, or over both simultaneously; the MPTCP-level sequence numbering ensures correct ordering before delivering data to the ncat application on the server. Users can also specify the backup keyword to create a backup endpoint: this will result in an MP_JOIN subflow that is used to send traffic only when the other non-backup subflows are not available, because they have been disconnected or just because they are unresponsive. A sketch of the corresponding ip mptcp commands is shown below.
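As a reference, the endpoint and limit configuration described above maps to ip mptcp commands along these lines; the addresses, device names and limit values are illustrative assumptions rather than a transcript of the recording.

```sh
# Allow extra subflows per MPTCP connection (both ends need a limit > 0);
# add_addr_accepted controls how many advertised addresses are accepted.
ip mptcp limits set subflow 2
ip -n server mptcp limits set subflow 2 add_addr_accepted 2

# "subflow" endpoint on the client: the client itself opens an extra
# MP_JOIN subflow from this address after the initial handshake.
ip mptcp endpoint add 192.168.2.1 dev veth0 subflow

# "signal" endpoint on the server: the address is advertised to the peer
# with the ADD_ADDR suboption instead of being used to connect locally.
ip -n server mptcp endpoint add 192.168.2.2 dev veth1 signal

# "backup" marks an endpoint whose MP_JOIN subflow carries data only when
# the non-backup subflows are disconnected or unresponsive.
# ip mptcp endpoint add 192.168.2.1 dev veth0 subflow backup

# Inspect the resulting configuration.
ip mptcp limits show
ip mptcp endpoint show
```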
In this third example we will add a signal endpoint in the server and check how the client establishes the joined subflow. After the first three-way handshake the server advertises a new address to the client, and then the client uses this address to set up a new subflow with MP_JOIN. We will use nstat to check the protocol-wide counter of received ADD_ADDR packets in the client namespace, and see that packet with tcpdump right before the MP_JOIN handshake starts. Let's see all that in action.

In this example we configure a signal endpoint on the server. Also in this case the SystemTap script in the pane below is running, to let the ncat client and server use MPTCP, and tcpdump is running in the background to capture traffic and display it later. In the server namespace we set the subflow limits as usual, then we add the MPTCP endpoint using the server IP address and the signal keyword, and again we show the current configuration. In the client namespace we only need to configure the subflow limit and the number of ADD_ADDR advertisements we are going to accept. Let's now start the server as usual, and the client, which says hello world three times. Let's use nstat to see the stats, and now let's terminate tcpdump and look at the capture. After the first three-way handshake, in packet 5 the server sends an ADD_ADDR option to the client. That address is then used by the client to set up another subflow, sending the MP_JOIN SYN and completing the MP_JOIN three-way handshake in packets 6, 7 and 8.

In this last example we are going to see how applications can benefit from the use of MPTCP. We will try a native application on the server and library call hijacking on the client, and we will inspect the socket state with ss. Then, since more than one subflow is established, we are going to show how the protocol can ensure reliable data delivery even after a network impairment. Let's try that live on our Fedora host. In this last demo we will use a telnet server that has been patched to support MPTCP natively, so it needs neither SystemTap nor library call hijacking. Furthermore, we will configure the endpoint in the server namespace like we did in the last example, but this time the server's address will be advertised as signal and backup. Let's check the protocol configuration in the server namespace, and then we can start the server application. We will be using library call hijacking for the client, so let's git clone the tool from Paolo's repository. Then we just need to set up the limits in the client, and then we can start hijacking the original telnet client shipped with Fedora. And now you should see that popular ASCII movie starting.

In the bottom pane we can do some inspection. Let's first use ss to gather information on the MPTCP socket opened by the telnet client; then we can use ss to display the TCP subflows. You can see that two subflows are currently open in parallel. Using tcpdump, it's possible to observe that only the initial subflow is used for data, and that's not a surprise, since we configured the other one as a backup. With the movie still playing, we can try to see how the protocol reacts to a bad network condition on the active subflow: for example, we can remove the IP address from the client interface. You can see how the movie keeps playing even though the initial address has disappeared from the client namespace. Doing a live capture with tcpdump, we can observe that the socket is now using the second subflow for data: the backup has become active. A sketch of these inspection steps is shown below.
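For reference, the inspection and fault-injection steps described above roughly correspond to commands like the following; the exact address, device name and counter names are assumptions used for illustration.

```sh
# MPTCP-level sockets (the connection opened by the hijacked telnet client).
ss -nM

# The individual TCP subflows appear as ordinary TCP sockets; -i adds
# per-socket details, including MPTCP-related information.
ss -nti

# Protocol-wide MPTCP MIB counters, e.g. the number of ADD_ADDR received.
nstat -a | grep -i mptcp

# Simulate the network impairment from the demo by removing the address
# the initial subflow was bound to (address and device are illustrative).
ip addr del 192.168.1.1/24 dev veth0

# A live capture should now show data flowing on the backup subflow.
tcpdump -ni veth0
```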
Looking again at the subflow information using ss, we still see both subflows established, but only the second one has its counters updated. There is a small attention point for those willing to replicate the same tests on their Fedora host. First, the ADD_ADDR option specified in version 1 of the protocol is not dissected correctly by the official Fedora tcpdump binary, so we had to use a more recent version, compiled and published in a Copr repository. Secondly, in order to properly test the active-backup operation, we need a very recent kernel, very close to the latest 5.11 RC: there were some bugs in the active-backup behavior that we discovered while adding the support for simultaneously active subflows, and those bugs have been fixed very, very recently.

How does MPTCP compare with plain TCP performance-wise? The answer is, as usual: it depends. It depends on what your expectations are. For the moment we are going to focus on throughput only, gracefully ignoring other aspects like connection rate and scalability. When considering MPTCP bulk transfer performance, there are two main scenarios. In the first one, the throughput is bound by link capacity; in such a case MPTCP will outperform TCP if multiple independent paths are available (for example, the Wi-Fi and 4G links on a smartphone) and multiple non-backup subflows are used simultaneously and land on different paths. When the throughput is bound by the available CPU cycles, for example during a bulk transfer on a very high-speed link, TCP prevails, as MPTCP has to perform some tasks twice, for example locking and sequence checks: those tasks are done both at the subflow level and at the MPTCP socket level. Due to the above, a fair comparison is done by testing MPTCP against TCP using a single subflow.

Let's see in detail how MPTCP compares to TCP in the second scenario, when throughput is limited by CPU power. This graph is what we obtain for netperf stream tests, using the library call hijacking approach to force netperf to use MPTCP. In the north-south scenario, the server runs in a local VM and the client runs on the hypervisor; in the east-west scenario, the client and the server run in two different VMs on the same hypervisor. As you can observe, there is a gap of about 30%, slightly greater than in the second case. The main focus of MPTCP development has been feature coverage so far; since the protocol implementation is almost complete, expect some relevant progress here in the near future.

And that's all folks, at least for the moment. Enjoy Virtual DevConf 2021, thanks a lot for listening, and cheers!