All right, we can kick things off now. So welcome, everyone. Next up, we have Tyler Sheehan talking about round-trip latency. Take it away, Tyler.

OK, one second, let me just share my window real quick. OK, there we go. So hello, everyone, and welcome to "Round-Trip Latency: DPDK versus non-DPDK." My name is Tyler Sheehan. First, a little background. I'm a master's student in the 4+1 program at University of Massachusetts Lowell, majoring in computer engineering and graduating this fall. I spent two summers interning at Red Hat, the first in the summer of 2020 as a Ceph storage intern, and the second in the summer of 2021 as an Office of the CTO intern. This research was done during spring 2021 at University of Massachusetts Lowell and was funded by Red Hat.

Round-trip latency is the time it takes for a packet to go from point A to point B and back to point A. DPDK is a set of special libraries and drivers that make it possible for packets to bypass the kernel and go straight from the application to the network controller. As you can see from this picture, on the left we have the typical application architecture, where a packet goes from the application in user space to the network driver in kernel space, and finally to the network controller in the network hardware. On the right we have an application using DPDK, where the packet goes from the application to the DPDK libraries, still in user space, and then to the network controller in the network hardware. This shows how DPDK allows a packet to bypass kernel space, as the DPDK libraries take the place of the network driver without ever leaving user space.

This project started as a Red Hat sponsored research project at University of Massachusetts Lowell to examine the benefits of DPDK in Red Hat storage, with the understanding that current Ceph already had DPDK support. There is DPDK compatibility in Ceph, but due to architectural constraints, Ceph cannot benefit from DPDK. This caused me to shift focus to Ceph Crimson, because its development notes claimed DPDK compatibility. A final change in focus was made due to an approaching deadline and issues getting Ceph Crimson running with DPDK. This resulted in collecting data on DPDK round-trip latency, which is what Ceph hoped to gain from DPDK.

The next section covers the programs that were used to collect data, with quick overviews of their use and why they were selected. The first program is called UPERF, a well-known and easy-to-use open-source network benchmarking tool originally developed by the performance applications engineering group at Sun Microsystems. It was chosen because it is easy to use and because my mentor, Ben England, had prior experience with it. The second program is called PingPong, a relatively unknown open-source program developed by Zing Yang Lee on GitHub for the sole purpose of collecting round-trip latency using DPDK. It was selected because it is the only openly available program that collects round-trip latency using DPDK. The version used for this research was modified; a link to it is located at the end of the slides.

In the next section, I will explain the methodology used for testing. For hardware, there were two machines located on the same rack with a single Juniper network switch between them. Each system was a Dell PowerEdge server with two Intel Xeon Gold 6230 CPUs, and all testing was done on the exact same Intel 25-gigabit Ethernet port.
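To make the DPDK side of that picture a little more concrete, here is a minimal C sketch in the spirit of PingPong's echo (reflector) side. It is not PingPong's actual code: it assumes DPDK is installed, hugepages are configured, and a single NIC (port 0) has already been bound to the vfio-pci driver. The sending side of such a test would timestamp each packet before transmit and again when its echo returns (for example with rte_rdtsc()) to compute the round trip; the point here is simply that receive and transmit happen entirely in user space, with no kernel network driver or system call in the path.

```c
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <rte_eal.h>
#include <rte_debug.h>
#include <rte_lcore.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define NUM_MBUFS       8191
#define MBUF_CACHE_SIZE 250
#define RING_SIZE       1024
#define BURST_SIZE      32

int main(int argc, char **argv)
{
    /* Bring up the DPDK Environment Abstraction Layer (hugepages, cores, PCI). */
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL init failed\n");

    uint16_t port = 0; /* assumes a single NIC already bound to vfio-pci */

    /* Packet buffers live in a user-space mempool; the kernel never touches them. */
    struct rte_mempool *pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS,
            MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
    if (pool == NULL)
        rte_exit(EXIT_FAILURE, "mbuf pool creation failed\n");

    /* One RX queue and one TX queue with default settings. */
    struct rte_eth_conf port_conf;
    memset(&port_conf, 0, sizeof(port_conf));
    if (rte_eth_dev_configure(port, 1, 1, &port_conf) != 0 ||
        rte_eth_rx_queue_setup(port, 0, RING_SIZE,
                rte_eth_dev_socket_id(port), NULL, pool) != 0 ||
        rte_eth_tx_queue_setup(port, 0, RING_SIZE,
                rte_eth_dev_socket_id(port), NULL) != 0 ||
        rte_eth_dev_start(port) != 0)
        rte_exit(EXIT_FAILURE, "port setup failed\n");

    /* Busy-poll loop: pull a burst of frames straight off the NIC, swap the
     * source and destination MAC addresses, and send the same buffers straight
     * back. No network driver or system call sits in this path. */
    for (;;) {
        struct rte_mbuf *bufs[BURST_SIZE];
        const uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs, BURST_SIZE);
        if (nb_rx == 0)
            continue;

        for (uint16_t i = 0; i < nb_rx; i++) {
            uint8_t *eth = rte_pktmbuf_mtod(bufs[i], uint8_t *);
            uint8_t tmp[6];
            memcpy(tmp, eth, 6);        /* save destination MAC (ours) */
            memcpy(eth, eth + 6, 6);    /* new destination = original source */
            memcpy(eth + 6, tmp, 6);    /* new source = our MAC */
        }

        const uint16_t nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_rx);

        /* Free any buffers the TX queue did not accept. */
        for (uint16_t i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);
    }
    return 0;
}
```

Because the loop busy-polls the NIC instead of sleeping on interrupts, a DPDK application like this keeps its cores essentially fully busy, which is consistent with the near-100% core utilization mentioned in the restrictions below.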
When running the PingPong tests, the Ethernet port used the vfio-pci driver; when running the UPERF tests, the i40e driver was used.

Next are the independent variables. Packet size went from 64 to 256 to 1,024 bytes. Packets sent went from 100 to 10,000 to 1 million to 10 million. The result of each test was the average of three runs with the exact same parameters; for the sake of neatness, only the averages are reported in this presentation. Finally, we have the measured parameters: runtime, max round-trip latency, average round-trip latency, and min round-trip latency.

Before we get into the research results, I'd like to take a few minutes to go over the restrictions faced during this research. UPERF proved that in UDP mode it takes a very long time to send packets if the size is greater than 1.5 kilobytes, so all packet sizes were kept below 1.5 kilobytes. It's assumed this is due to a fragmentation issue with UDP messages larger than the 1,500-byte maximum transmission unit. A similar issue was seen with PingPong when attempting to use anything smaller than 64 bytes, which caused PingPong to stop working. Because PingPong's current design does not include the ability to send and receive multiple packets at one time, there is a disconnect with UPERF, which does have this ability by making use of multiple threads. To make the two programs as comparable as possible during testing, UPERF was limited to a single thread for the entirety of the tests. PingPong required a minimum of two cores, and while running PingPong those two cores were utilized at nearly 100%, so for the purposes of testing only two cores were used by PingPong. The current understanding of UPERF is that a write is half the round trip and a read is the other half. For this reason, the average round-trip time of a UPERF write was added to that of a read to get the complete average time. This doesn't work for the max and min round-trip times, though, since there was no way of knowing what the read was when the write had its max or min, and no way of knowing what the write was when the read had its max or min.
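To make that bookkeeping concrete, here is a small, hypothetical C sketch (not UPERF's or PingPong's actual code, and the sample values are made up) of how per-packet round-trip samples reduce to the measured parameters, and why only the averages of the UPERF write and read halves can be summed into a full round trip.

```c
#include <float.h>
#include <stdio.h>

/* Running min/max/sum over per-packet latency samples, in microseconds. */
struct rtt_stats { double min_us, max_us, sum_us; long n; };

static void rtt_record(struct rtt_stats *s, double sample_us)
{
    if (sample_us < s->min_us) s->min_us = sample_us;
    if (sample_us > s->max_us) s->max_us = sample_us;
    s->sum_us += sample_us;
    s->n++;
}

static double rtt_avg(const struct rtt_stats *s)
{
    return s->n ? s->sum_us / s->n : 0.0;
}

int main(void)
{
    struct rtt_stats write = { DBL_MAX, 0.0, 0.0, 0 };
    struct rtt_stats read  = { DBL_MAX, 0.0, 0.0, 0 };

    /* One sample per packet for each half (values invented for the example). */
    rtt_record(&write, 11.5);
    rtt_record(&read,  12.0);

    /* Averages of the two halves can be added to estimate the full round trip.
     * The maxima and minima cannot, because the read latency observed at the
     * moment of the worst (or best) write is unknown, and vice versa. */
    printf("combined average RTT: %.2f us\n", rtt_avg(&write) + rtt_avg(&read));
    return 0;
}
```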
Now that the restrictions have been explained, it's time to show the results. Please keep in mind that, for each measurement, the first three graphs are ordered by packet size and the following four graphs are ordered by number of packets sent.

This slide shows the runtime results of the 12 tests broken up by the size of packet sent. The solid blue line represents the PingPong results and the dashed red line represents the UPERF results. Both axes are logarithmic. The first graph shows the results of all tests with 64-byte packets; the x-axis is the number of packets sent, starting at 100, then 10,000, then 1 million, and finally 10 million, and the y-axis is the time in seconds. The other two graphs are the same as the first, only with the packet size set to 256 and 1,024 bytes respectively. As you can see, all three graphs behave very similarly: UPERF takes longer than PingPong at 100 and 10,000 packets, but once the packets sent reach 1 million and 10 million, the results become nearly identical.

This slide shows the runtime results of the 12 tests broken up by the number of packets sent. The solid blue line represents the PingPong results and the dashed red line represents the UPERF results. Both axes are logarithmic. The first graph shows the results of all tests with 100 packets sent; the x-axis is the packet size, starting at 64, then 256, and finally 1,024 bytes, and the y-axis is the time in seconds. The other three graphs are the same as the first, only with the number of packets sent being 10,000, 1 million, and 10 million respectively. A trend seen here is that UPERF remains constant at 100 and 10,000 packets regardless of packet size, and is greater than PingPong in every case except when the packets sent is 10 million and the size is 256 bytes, as seen in the rightmost graph.

These are the same as the first three graphs, only showing the max round-trip latency instead of runtime, with the dashed purple line representing the UPERF write results and the dashed green line representing the UPERF read results. As before, the x-axis is the number of packets sent, but the y-axis is now time in microseconds. These graphs are a bit less cut and dried than the ones on the previous slides, because UPERF is not one entity here but is split into two. In all three graphs, at 100 and 10,000 packets PingPong performs between UPERF read and UPERF write, with PingPong then exceeding both UPERF write and UPERF read at 1 million and 10 million. PingPong and UPERF read are closest in the middle graph, where the packet size is 256 bytes and 1 million packets are sent.

These are the same as the first group of four graphs, only showing the max round-trip latency instead of runtime. As before, the x-axis is the packet size, but the y-axis is now in microseconds. There is no clear trend when the data is presented by number of packets sent, as can be seen in these graphs.

These are the same as the first three graphs, only showing the average round-trip latency instead of runtime. As before, the x-axis is the number of packets sent, but the y-axis is now in microseconds. The average round-trip latency is clearly a product of multiple variables, and this is seen in the graphs: there are no clear trends other than UPERF decreasing from 100 packets to 1 million and then increasing at 10 million.

These are the same as the first group of four graphs, only showing the average round-trip latency instead of runtime. As before, the x-axis is the packet size, but the y-axis is now in microseconds. Once again, the average round-trip latency is clearly a product of multiple variables, and there is no clear trend in these graphs.

These are the same as the first three graphs, only showing the min round-trip latency instead of runtime, with the dashed purple line representing the UPERF write results and the dashed green line representing the UPERF read results. As before, the x-axis is the number of packets sent, but the y-axis is now time in microseconds. In these graphs you can see that PingPong and UPERF write are constant with slight variations, and UPERF read decreases from 1 million packets to 10 million packets.

These are the same as the first group of four graphs, only showing the min round-trip latency instead of runtime. As before, the x-axis is the packet size, but the y-axis is now in microseconds. The only rough trend in these graphs is PingPong increasing from 256 bytes to 1,024 bytes.

From the results in the previous section, I will now present my conclusions. I do not believe that the data in this presentation is enough to draw any definitive conclusions about DPDK versus non-DPDK round-trip latency, as represented by PingPong and UPERF respectively. PingPong has a better average round-trip time than UPERF, but the difference in each run is a matter of microseconds, and there were quirky results mixed in.
To prove anything definitively, numerous tests would need to be rerun with consistent results from PingPong and UPERF, which I was unable to accomplish in the time allotted.

I will now take a few minutes to suggest some ideas for future research that build off this research and its results. DPDK is currently being used in several applications, but there are few tools to collect accurate round-trip latency data; I personally ran into this difficulty when attempting to find a program that measured round-trip latency with DPDK. For this reason, I believe future projects should focus on improving PingPong, which, while in its infancy, has the potential to show off what DPDK is capable of in terms of round-trip latency. There are several modifications that need to be made to PingPong before this can be a reality. PingPong needs to become capable of sending and receiving multiple packets at a time. This may be accomplished by some form of DPDK-specific multithreading or possibly some other built-in DPDK function, but it should be one that fits with the DPDK architecture. This would allow PingPong to compete with UPERF, which already has the ability to send and receive multiple packets by making use of multithreading, on a whole new range of tests. Once PingPong can send and receive multiple packets, it would also be interesting to see how its CPU utilization compares with UPERF's. PingPong could also benefit from the ability to specify the packet size and the client/server IP addresses on the command line, making it easier to run and automate different tests (a rough sketch of such command-line handling appears at the end, below). In addition to the PingPong modifications, adding a feature to UPERF where it reports the round-trip latency of the read or write whenever the other hits its max or min would make it easier to compare the max and min of UPERF against PingPong.

Finally, I would like to acknowledge all those who helped make this research a success. First, I would like to thank Red Hat for sponsoring this research and University of Massachusetts Lowell for hosting it and providing me with the opportunity to work on it. In addition, I would like to thank Ben England from Red Hat and Professor Vinod Vokkarane from University of Massachusetts Lowell for their invaluable support throughout the entire research process. Finally, I would like to thank Seng Yang Li for creating and sharing PingPong on GitHub, and the entire UPERF community for their efforts in creating a well-made and easy-to-use tool. Thank you for your time. Are there any questions?

All right, not seeing any questions yet, so I guess we can just hold off for a few minutes. Okay. Yeah, plus-plus, Joe. This was interesting as someone who knows nothing about what you're talking about. Thank you.
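Following up on the command-line suggestion above, here is a small, hypothetical C sketch of how a modified PingPong might accept the packet size and client/server IP addresses as options. None of this is in the current PingPong code; the option names, defaults, and addresses are invented for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical option handling for a modified PingPong, e.g.:
 *     ./pingpong -s 256 -c 192.168.1.10 -S 192.168.1.11
 * In a real DPDK tool these options would be parsed from the arguments
 * that remain after rte_eal_init() has consumed the EAL arguments. */
int main(int argc, char **argv)
{
    int packet_size = 64;                  /* smallest size PingPong handled */
    const char *client_ip = "192.168.1.10";
    const char *server_ip = "192.168.1.11";

    int opt;
    while ((opt = getopt(argc, argv, "s:c:S:")) != -1) {
        switch (opt) {
        case 's': packet_size = atoi(optarg); break;
        case 'c': client_ip = optarg;         break;
        case 'S': server_ip = optarg;         break;
        default:
            fprintf(stderr,
                    "usage: %s [-s bytes] [-c client_ip] [-S server_ip]\n",
                    argv[0]);
            return EXIT_FAILURE;
        }
    }

    printf("packet size %d bytes, client %s, server %s\n",
           packet_size, client_ip, server_ip);
    /* ... the rest of the tool would initialize DPDK and run the test ... */
    return 0;
}
```

Accepting these values as options rather than compile-time constants would make it straightforward to script the same 12-test matrix used in this research.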