Hi everyone, my name is Anton. I work at Datadog. I'm going to be talking about rate limiting with Cilium and eBPF. I'll start with a very high-level overview of how Cilium does bandwidth management, and then I'll talk about some interesting lessons we learned while putting it in place at Datadog.

So why do we need bandwidth management? When multiple pods run on the same host, they share the same network bandwidth, which can cause network contention. This is where Cilium comes in with its Bandwidth Manager feature, which lets us perform egress bandwidth rate limiting to prevent this kind of contention.

Let's see how this works. We start with a pod that we annotate with the egress bandwidth annotation, kubernetes.io/egress-bandwidth. In this example, we limit it to 10 megabits per second. If the Bandwidth Manager feature is enabled in Cilium, it detects this annotation and automatically attaches a BPF program to the pod's network interface to perform the rate limiting.

The crux of the Bandwidth Manager implementation lies in this EDT scheduling function, which uses an Earliest Departure Time (EDT) scheduling algorithm to rate limit the network bandwidth. How does it work? It looks at every single outgoing packet and at the maximum bandwidth rate, which in this example would be 10 megabits per second. It then calculates how much each packet needs to be delayed to achieve the desired bandwidth. So it basically slows down each packet, and it does this by setting a timestamp on the packet, which indicates the earliest time the packet is allowed to depart. Once the packet has gone through this BPF program, it goes on to the fair queue (FQ) qdisc that Cilium sets up on the machine, and it is actually the FQ qdisc that reads the timestamp set by the BPF program and delays packets as needed. After that, the packet goes onto the network interface card queue. In real life, on a multicore machine, this looks more like multiple FQ qdiscs managed by one global MQ qdisc, but the idea stays the same. And just to note: Cilium will override the qdisc setup on your machine as soon as you enable the Bandwidth Manager, so that's something to pay attention to.

Now, the lessons we learned while putting this in place. We ran some benchmarks. This graph looks a bit scary, but it's just three things: download, upload, and latency. This is a baseline benchmark without the Bandwidth Manager. Here we have roughly four gigabits per second on average per flow on ingress, four gigabits per second on average per flow on egress, and around one millisecond of latency. Then we enabled the Bandwidth Manager and observed some interesting things. First, the egress bandwidth is limited, which is expected, since we enabled the Bandwidth Manager and are rate limiting the pod. However, the download bandwidth suffered a lot: it went from four gigabits per second at baseline to around 500 megabits per second. And the latency shot up to 80 milliseconds on average, so 80 times more. When we investigated this, we realized that we had not enabled the BPF host routing feature that was added in Cilium 1.9.
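To make the EDT mechanism described a moment ago a bit more concrete, here is a minimal sketch of what such a pacing program can look like at the tc egress hook, written in BPF-style C. This is an illustration based on the description above, not Cilium's actual implementation; the program name, the map layout, and the hard-coded `THROTTLE_BPS` constant are assumptions for the example.

```c
// Minimal sketch of EDT-based egress pacing at the tc hook (illustrative only).
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

#define NSEC_PER_SEC 1000000000ULL
#define THROTTLE_BPS (10ULL * 1000 * 1000 / 8)   /* 10 Mbit/s expressed in bytes/s */

/* Last scheduled departure time, shared across all flows of the pod. */
struct edt_state {
    __u64 t_last;
};

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, struct edt_state);
} edt_map SEC(".maps");

SEC("tc")
int edt_pace_egress(struct __sk_buff *skb)
{
    __u32 key = 0;
    struct edt_state *st = bpf_map_lookup_elem(&edt_map, &key);
    if (!st)
        return TC_ACT_OK;

    __u64 now = bpf_ktime_get_ns();

    /* How long this packet "costs" at the target rate. */
    __u64 delay = (__u64)skb->wire_len * NSEC_PER_SEC / THROTTLE_BPS;

    /* Next allowed departure: strictly after the previous packet's slot. */
    __u64 t_next = st->t_last + delay;
    if (t_next <= now) {
        st->t_last = now;
        return TC_ACT_OK;          /* under the rate limit, send immediately */
    }

    /* Over the limit: stamp the earliest departure time; the FQ qdisc
     * will hold the packet until that time. A real implementation would
     * also update t_last atomically and drop packets beyond a horizon. */
    st->t_last = t_next;
    skb->tstamp = t_next;
    return TC_ACT_OK;
}

char LICENSE[] SEC("license") = "GPL";
```

In Cilium the per-pod limit comes from the pod annotation and is looked up from a BPF map rather than hard-coded, but the pacing idea is the same: compute each packet's earliest departure time and let the FQ qdisc enforce it.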
Coming back to BPF host routing: generally, this is much faster than the legacy routing path Cilium otherwise uses, because it bypasses iptables and the upper network stack in the host namespace, and it also ensures that TCP backpressure works properly. Unfortunately, I don't have time to go into much detail on how this works, but anyway, we enabled this feature and the results were much better. The egress is still fine, and the ingress got back to four gigabits per second, so basically the host maximum. The latency got better too, from 80 milliseconds down to 15, but it's still higher than baseline.

We looked into it a bit more and realized that this is likely due to the way the EDT scheduling algorithm works. What happens with this algorithm is that all packets from all flows coming from a pod get assigned timestamps sequentially and globally in the BPF program I showed before. So even though the packets may end up in different queues, because of this global timestamp assignment they effectively form one virtual FIFO queue, and that is where the latency increase comes from.

To sum up, the three lessons we learned are: first, your default qdisc will be changed to FQ by Cilium as soon as you enable the Bandwidth Manager, even if you don't use the feature. Second, you need to make sure that BPF host routing is enabled if you want to use this feature efficiently. And finally, there will still be a latency increase due to the way the EDT algorithm works. This talk is also a call for ideas, so if you have any ideas on how to improve this, let me know. And if you want to learn a bit more about this particular problem, I highly recommend reading the blog post in reference number four, which explains it in much more detail. That's it for me, thank you very much. Thank you.
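To give a rough feel for why this global stamping shows up as extra latency, here is a small back-of-the-envelope simulation in plain C of the virtual FIFO effect described above. The 10 Mbit/s rate, the 1500-byte packet size, and the two-flow setup are assumptions for the example, not numbers from the benchmarks: because the departure timestamp is advanced globally, each flow's packets wait behind the packets of every other flow from the same pod, exactly as if they all shared one FIFO queue.

```c
/* Back-of-the-envelope illustration of the virtual FIFO effect: two flows
 * from the same pod share one EDT pacer, so every packet is scheduled
 * behind all previously stamped packets, whichever flow they belong to. */
#include <stdio.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

int main(void)
{
    const uint64_t rate_bps   = 10ULL * 1000 * 1000 / 8; /* 10 Mbit/s in bytes/s */
    const uint64_t pkt_bytes  = 1500;
    const uint64_t per_pkt_ns = pkt_bytes * NSEC_PER_SEC / rate_bps; /* ~1.2 ms */

    uint64_t t_last    = 0;        /* shared departure clock, like the BPF program */
    uint64_t depart[2] = {0, 0};   /* last departure time seen per flow */

    /* 100 packets per flow, arriving interleaved at time 0. */
    for (int i = 0; i < 200; i++) {
        int flow = i % 2;          /* alternate between flow 0 and flow 1 */
        t_last += per_pkt_ns;      /* global, sequential timestamp assignment */
        depart[flow] = t_last;
    }

    for (int flow = 0; flow < 2; flow++)
        printf("flow %d: last packet departs after %.1f ms\n",
               flow, depart[flow] / 1e6);
    return 0;
}
```

Each flow only sent 100 packets, yet its last packet departs roughly 240 ms after the first, because it queues behind the combined traffic of both flows at the shared 10 Mbit/s rate.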