Hello. I'm going to talk about how you can make the Internet faster. Specifically: how can you write fast applications, how can you tune the Linux or BSD servers that host your services, and how can you improve the network at home or at your workplace? First we're going to talk about building a better application. But no, first a bit about myself. I've been an apprentice at a Linux distributor since 2020. I like server hosting and mostly OS-level things, and I sometimes read RFCs, IETF drafts, kernel code, et cetera. I'm sort of a C novice, I would say; I just do this and that, I'm not sure how to describe what I do. And I like to present other people's work: this is not the first time I've talked at an event about what other people are doing in the world. I haven't written anything I'm showing here myself; I'm just going to show you what you can do. And this presentation is under Creative Commons, like most talks here. So, writing better applications. Say you write a small script that has to interact with a network device that exposes a REST API. You might do it in Python, because Python is a popular scripting language. The requests module gives you POST and GET requests, which you hopefully know from HTTP; there are multiple ways to tell a server what you want. If you just follow the documentation of the requests library, it tells you: call requests.post, and you've made a request. Nice. Then maybe you have to interact with some sort of appliance, and you have a list of items to tell it about. In this case we're adding console ports to a network device, so we have to iterate through a list of devices we need to configure.
And so that means we're doing a lot of PUT requests, a lot of HTTP requests, and the problem is that we do each of them separately. For every request, we first resolve the DNS name of our console server, then open a TCP connection, then do a TLS handshake because it's encrypted, and only then can we actually send the PUT and get a response back. And then the connection is closed, because if you follow the example and just call requests.post, it opens a connection and closes it again every time. Why is that a problem? Round trip time. You've probably typed the ping command into a terminal and seen that a machine is, say, 30 milliseconds away: I send a message, and the reply arrives back where I am 30 milliseconds later. Done this naively, we end up going back and forth about eight times per request. Eight round trips at 30 milliseconds is 240 milliseconds per request, so with 50 entries the whole loop takes about 12 seconds. And if that server is on a different continent, it might be 60 seconds. That might annoy us, so we improve the script: we go through the whole connection setup once and then reuse the connection. The code change is that we open a session with the requests library, add the headers we need (this bit turns off TLS verification, because it's a self-signed certificate), and then we don't post with the module-level function, but on our session.
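As a sketch of the difference: the loop below pays the connection setup once and then reuses that single connection for all 50 PUTs. To keep it runnable anywhere, I'm using only the standard library, with a tiny local server standing in for the appliance (with the requests library, `session = requests.Session()` and `session.put(...)` express the same idea); the URL paths and payloads here are made up.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Count how many TCP connections the "appliance" accepts.
connections = []

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections alive by default

    def do_PUT(self):
        # Drain the request body, then answer with a well-formed response.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

class CountingServer(ThreadingHTTPServer):
    def get_request(self):
        sock, addr = super().get_request()
        connections.append(addr)  # one entry per accepted TCP connection
        return sock, addr

server = CountingServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# One persistent connection for all 50 PUTs: DNS/TCP/TLS setup cost is paid once.
conn = http.client.HTTPConnection("127.0.0.1", port)
for i in range(50):
    conn.request("PUT", f"/api/console-ports/{i}", body=b"{}",
                 headers={"Content-Type": "application/json"})
    resp = conn.getresponse()
    resp.read()  # must drain the body before reusing the connection

conn.close()
server.shutdown()
print(len(connections))  # all 50 requests went over a single TCP connection
```

The naive version, one `requests.post(...)` per item, would have produced 50 entries in that connection counter instead of one.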
What this does is move the expensive part out of the for loop: the whole setup phase is no longer repeated. We reuse the connection and keep it open, so each request costs only one round trip. That means with 50 entries, 50 times 30 milliseconds is 1.5 seconds, if I'm not mistaken, excluding the initial setup. Definitely faster. Can we do the same on the web? Yes. If you're a website developer, you might have followed a tutorial that says: use a library, something like jQuery, and just put this tag into the head of your page. What you're doing there is telling the browser: there's information about how this site has to be rendered, and you have to fetch it from this other address. When a page like that loads, the browser first connects to your site and downloads the HTML. It parses the HTML as it arrives, notices the CSS lives somewhere else, and has to connect to that other web server, and as we saw, that can be eight round trips at worst: a DNS resolution, a TCP connection, a TLS negotiation, because it's a different domain (even if it happens to be the same physical server, it's still a different domain). Only then can it parse the style sheet, and only then can the browser start rendering, because without the style information it doesn't know how to display your page, so rendering blocks. So don't do this: don't put your essential CSS or JavaScript files on another server, because it usually slows you down. Unless it's cached, that is; if you're not a first-time visitor, it's already in cache. But you might think: they say this is a CDN, it should load the JavaScript or the CSS faster.
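For illustration, the difference in the page head looks roughly like this (the URLs are placeholders):

```html
<!-- Render-blocking third party: the browser must resolve, connect, and
     negotiate TLS with a second domain before it can style the page. -->
<link rel="stylesheet" href="https://cdn.example.com/lib/style.min.css">

<!-- Same-origin: fetched over the connection your browser already has open
     to your site, so no extra DNS/TCP/TLS round trips. -->
<link rel="stylesheet" href="/static/style.min.css">
```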
Well, it isn't really better, because again you have to do all that work just to get the resource. If you genuinely need a CDN, by all means use one, but then put your entire website behind the CDN on a single domain. Then I guess it's faster, if you care about international visitors and such. And cross-site resource caching is dead: it was turned off in browsers years ago because of security concerns. So the whole argument that some other website might use the same library and it's therefore already cached is just not true anymore; browsers don't do that. Externally embedded images are fine, of course. A video, a YouTube embed, an image from another site, you can include those. The difference is that they don't block rendering: your style sheets, the parts that say "this is how my website looks", are already processed, and at worst you see a blank box for a while. That's fine. Keeping the critical resources on your own domain will make your website nicer and faster to use. That's all I have to say about applications. Next we move on to tuning your servers, because some of us like Linux and we host stuff. But first, let's look at how basic Internet technology works again. TCP. TCP is that one protocol we use most often. It's connection oriented: it only works once we establish a connection. We say "hey, I want to connect to you", the other side says "acknowledged", both sides open ports, and we can communicate. It detects errors and asks for retransmissions if packets are lost: it says "I haven't received this segment, I'm expecting this segment next, where is it?" (Segments, that's the term: TCP sends segments.) And it guarantees a lot of things. It guarantees that data arrives in order, which means that if something was lost along the way and you try to read the data you expect, the read blocks until the missing segment arrives.
So it's strictly in order, which makes sense if you're downloading a single file, but it's also problematic sometimes. And you have to acknowledge data: each time you receive something, you send an acknowledgement back as the receiver. The sender sends a certain amount of data, called the window or window size, then waits for acknowledgements and continues with the next window, adjusting the window size to your connection speed. One very important thing TCP does is network congestion avoidance: it does not want to overload your network. If you just blasted data through an interface as fast as you could, we'd have a problem. Sure, I have a gigabit port, but if the path goes over a 20 megabit uplink and I blast a gigabit through it, most of the data would be dropped by the switch, which says: "I'm full, I can't store any more of your packets, my buffer is full and only drains slowly." And that just slows networks down, because we get a lot of packet loss. So what TCP does is start sending slowly, speed up, and when it notices congestion, drop the sending rate and try again. It looks like this. This is downloading an almost one-gigabyte file over a gigabit connection. You can see the start phase here (we'll look at it closer), and then packets get dropped and boom, the send rate drops, increases, increases, drops, increases, increases, drops. We get this sawtooth pattern. This is using an algorithm called CUBIC, which I'll talk about later, and which does its congestion detection entirely based on dropped packets. It expects you to fill up the buffer: you send faster and faster, and at some point your router or switch says "no, I'm full", and a packet gets dropped.
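That probe-until-loss cycle is easy to sketch. The toy model below is not CUBIC's actual growth curve, just the classic slow-start plus additive-increase/multiplicative-decrease shape that produces the sawtooth in the graph; the capacity number is arbitrary.

```python
# Toy model of loss-based congestion control (slow start + AIMD):
# the window doubles each round trip until a loss, then halves, then
# creeps up additively -- the sawtooth from the slides.
def simulate(capacity_pkts, rounds):
    cwnd = 1.0                 # congestion window, in packets
    ssthresh = float("inf")    # slow-start threshold
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd > capacity_pkts:       # buffer overflows -> packet loss detected
            ssthresh = cwnd / 2
            cwnd = ssthresh            # multiplicative decrease
        elif cwnd < ssthresh:
            cwnd *= 2                  # slow start: double every round trip
        else:
            cwnd += 1                  # congestion avoidance: +1 per round trip
    return history

h = simulate(capacity_pkts=64, rounds=20)
print([int(c) for c in h])
```

Note how many of the early rounds are spent just ramping up: a transfer that finishes within those rounds never reaches link speed, which is exactly the small-file problem described next.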
From the acknowledgements, the sender sees "I'm missing this segment in the sequence", again and again; the receiver may keep acknowledging later segments that do arrive, but it keeps saying "I'm still missing this one". The sender realizes "I'm sending too fast", drops its rate, and then ramps up again. But the more important part is the beginning, the probing phase, where we test the waters to find out how fast we can go. Why does this matter? We're not always transferring gigabytes; most of the time we don't download big files at all. Instead we transfer smaller files, say a 500 kilobyte file. This is the same graph as earlier, but with a 500 kilobyte file, and as you can see the whole thing is very short, not even 100 milliseconds of transfer (the axis is in seconds). What happens is that we spend the entire transfer probing, still trying to figure out how much we can send. By the time the window has grown, the file is done, and we never reached the limit of our gigabit connection. So we transfer far below gigabit speed, a lot slower, because TCP was still testing whether it could go that fast. The average for this half-megabyte file was well under the line rate. We have the bandwidth on the connection, but we don't get the transfer speed, and that's because of the probing. Here's a TCP trace. Each IP packet carries a TCP segment with a sequence number, so we can count the data as it arrives: I received this segment, then the next, then the next.
And our full-speed download looks like this: a bit wobbly, but the average is pretty much a gigabit. This is how it's supposed to look, and the wobbly pattern is normal when you download at full speed. Even in perfect conditions, two computers right next to each other, you get dropped packets, because that's how loss-based TCP works. With our slower transfer, we never got that. So what can we do? This whole congestion control business, how fast to send and how to back off, is described by an algorithm, the congestion control algorithm. I mentioned CUBIC; these graphs use CUBIC, which has been the default in Linux since around 2007, I think; before that it was (New)Reno, from a similar timeframe. A funny detail: BitTorrent runs over UDP, so it can implement its own congestion control, and it uses one called LEDBAT, which aims to be very conservative. It starts really slowly and backs off really fast, in order not to compete with the important "real" TCP traffic. That's a nice detail. But there's also a newer algorithm called BBR, which Google developed in the 2010s. I think it landed in Linux 4.9 or thereabouts, so around seven years ago, but it's not the default. It makes the probing a lot more aggressive, because it's optimized for the web: people now have a lot more bandwidth, but CUBIC is so conservative that it still makes websites feel slow. The other thing that changed is that as networks got faster, a lot more buffering was introduced: your router, with its slow uplink, got a much bigger buffer. That means packets are dropped much later, which can raise transfer rates to some degree, but it also adds delay, and congestion events are recognized much later. So BBR also looks at the latency of the acknowledgements.
If acknowledgements suddenly arrive later, BBR detects that as queueing delay too. And it's more aggressive: if a CUBIC sender and a BBR sender compete for the same gigabit link, there are papers showing that BBR takes a much bigger share, and the CUBIC traffic ends up a lot slower when both are doing big transfers. BBR is used especially by the big cloud services that make up most of our web traffic, and it makes them load a lot faster. Cloudflare wrote about this on their blog: in their example, over a simulated 5 megabit connection, a WordPress site loaded in about 1.5 seconds with BBR versus 8 seconds with CUBIC, especially in combination with HTTP/2, which really benefits from the more aggressive probing and faster start. That's an article from 2018, and you might have seen its numbers elsewhere. What they tell you to do, via the sysctl interface of your Linux system (this is Linux specific; BBR is not implemented in the mainline BSDs), is to select the BBR congestion control algorithm. They also tell you to set a value, tcp_notsent_lowat, that limits how much not-yet-sent data the kernel accepts into a socket's send buffer, here 16 kilobytes. What I mean is: when your application says "I want to send a kilobyte now", that data first sits in the socket buffer, and only at some point does the kernel take it, hand it to the network driver, and the driver hand it to your NIC, your network interface, to be sent out. Capping that unsent backlog reduces delay for websites; the 16 kilobytes is basically their magic number. The GrapheneOS developers have a somewhat different idea about this.
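If I read the Cloudflare post correctly, their advice boils down to three sysctls, shown here as a drop-in file; treat the exact lowat value as their tuning choice for web workloads, not gospel:

```shell
# /etc/sysctl.d/90-tcp-tuning.conf
net.core.default_qdisc = fq                 # the qdisc BBR is usually paired with
net.ipv4.tcp_congestion_control = bbr       # needs the tcp_bbr module (Linux >= 4.9)
net.ipv4.tcp_notsent_lowat = 16384          # cap unsent socket-buffer data at 16 KiB

# Apply without a reboot:
#   sysctl --system
# Verify:
#   sysctl net.ipv4.tcp_congestion_control
```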
The GrapheneOS developers, for example, recommend a higher threshold. And what Cloudflare also mentions is that they want you to change the queueing discipline; we'll talk about queueing disciplines later as well. These are the two main articles to read if you want to learn which values to pick and why they do a decent job. The Cloudflare blog post has a bit of magic to it, they don't explain everything, but they give you great insight into the problems they're solving. Another thing that helps, if you run a web server, is using a newer HTTP version. If you don't have a reverse proxy in front, you can just change your config: with a five-year-old or newer release of Apache or nginx (in Apache you have to load a module as well), you can use HTTP/2, the successor to HTTP/1.1 from the late nineties, and you can serve both at the same time. HTTP/2 was, I think, a Google initiative; these days Google and others like to push these things and implement them in Chrome, and because it's in Chrome, it basically becomes a standard. HTTP/2 makes things faster by reusing the connection: it has priorities and can multiplex multiple transfers over a single connection at the same time, whereas HTTP/1.1 required multiple parallel connections (it also had pipelining, which never really worked out). HTTP/3 is a thing as well: it doesn't run over TCP but over QUIC, which, a bit like BitTorrent's approach, implements its own transport with congestion control on top of UDP, and is again pushed by Google through their browser. But it's not in any stable release of Apache or nginx, I think; it's only in experimental branches. There are multiple implementations, from Cloudflare and others, but nothing you can just use from a stable release yet.
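Enabling HTTP/2 is usually a one-line config change. For example, in nginx (server name and certificate paths are placeholders):

```nginx
server {
    listen 443 ssl http2;          # the same listener keeps serving HTTP/1.1 clients
    server_name www.example.org;

    ssl_certificate     /etc/ssl/example.crt;
    ssl_certificate_key /etc/ssl/example.key;
}
```

In Apache, the equivalent is loading mod_http2 and adding `Protocols h2 http/1.1` to the configuration.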
Yeah, that's what you can do if you host something. Next: how can you improve your own network at home? Cake, anyone? If you've used OpenWrt, you might have heard of this whole smart queue management thing, this cake thing. The idea is: if the delay detection is failing us, and the buffers on the network are too big, can we be smart about it ourselves and signal congestion early, to reduce latency? First, some background. All your applications, all your sockets, queue their packets, and at some point everything has to be handed to your network card. They're usually all competing for one queue (better, higher-end network interfaces have multiple queues, of course), and someone has to decide who goes first. In Linux that's done by the queueing discipline, which sits in front of the device driver. The driver might also have its own mechanism for this; Intel does, for sure, with quality-of-service features integrated into their driver and firmware. But the queueing discipline is where you can do something. Once a packet passes through it and then through the device driver and the NIC, it's finally sent out to your network and on to your gateway. So one thing you can always control is your own device: you can change the queueing discipline there.
The default in Linux is first in, first out, but systemd sets something called fq_codel, fair queueing controlled delay. Roughly how it works: it watches how long packets sit in the queue, and once that delay stays too high, it starts dropping packets early, so congestion is signaled sooner and the sender backs off sooner. That kind of helps, but where you really want this is in your gateway. You usually have a DSL modem or cable modem or something, and its uplink is usually much slower than your local network: you have gigabit Ethernet or Wi-Fi 5 at home, and compared to that, your gateway's uplink is really slow. What your gateway should do is stop buffering so much, because there's this whole problem called bufferbloat that has to be solved. Bufferbloat is when the buffers in the hops along your path add far too much latency, which shouldn't be there. You type ping, you ping a host, it looks fine. But if you run a speed test under load, this latency can rise. Here, over the event Wi-Fi, the loaded latency wasn't that high, because the Wi-Fi can't saturate the whole uplink we have. But at home, on my FRITZ!Box, I get 300 milliseconds, and that's quite common. So you have 300 milliseconds of latency whenever you fully saturate your connection with a download, and that means if you then try to load a website or do a DNS request, all of that now also takes 300 milliseconds of round trip time.
That's slower than the latency to the US West Coast, and it makes applications slow: websites, and especially video calls, you will feel it, you will notice it, particularly with multiple users. Your router, your gateway, should do much better at this, and that's where cake, and other traffic control algorithms, try to help. Bufferbloat, again, is when your hops buffer too much, when your latency increases as you fully utilize your connection. If the buffers are too small, you don't get enough throughput, because too many packets are dropped and you can't fill your link; that's bad too. But usually the buffers are too big: they don't add more throughput, they just increase latency, and that's simply unnecessary. One idea is to just not fully utilize the link, to keep latency consistent: optimize not for maximum throughput but for consistent latency. That's what cake does. Again, latency rises when the slow link is under high utilization, say 90 or 95 percent or so, and the link should signal congestion, but it doesn't do it well enough or early enough. So what cake does, in effect, is drop packets earlier if they've been sitting in the queue too long for a given flow, I think, or add the controlled-delay signal to indicate congestion. And through its magic, because to me it kind of is magic, you get consistent latency: it can actually stay at 15 or 20 milliseconds even while a Steam download is running and you're in a video call. That's the future I want: someone at my place runs a Steam download and I don't notice it, and my websites load just as fast. Because for home users, I think it's not uncommon that when someone downloads a big file, the experience for everybody else sucks.
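Concretely, on Linux you attach cake with the tc command and tell it the shaped rate, a bit below what your uplink actually delivers; the interface name and bandwidth below are placeholders for your own values:

```shell
# Replace the root qdisc on the WAN-facing interface with cake,
# shaped to ~90% of a nominal 100 Mbit/s uplink:
tc qdisc replace dev eth0 root cake bandwidth 90mbit

# Check what is installed:
tc qdisc show dev eth0
```

Shaping below the real link rate is deliberate: it keeps the queue in the device you control, where cake can manage it, instead of in the modem's oversized buffer.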
In the last five years or so, recent Wi-Fi access points have paid close attention to this too, and they've improved compared to what you had ten years ago. But that's only for the Wi-Fi side: if your whole Wi-Fi airtime is fully utilized, they already do smart prioritization and keep things going and usable. Cake does the same for your uplink: it keeps your web and app experience at the same speed even while heavy downloads run in the background. There are caveats with cake, though. It's in the Linux kernel since 4.19, I think, and also in the OpenWrt router firmware. But the problem is that it requires you to know your link speed; it's very bad at guessing how fast your Internet connection is, and you usually sacrifice 5 to 10 percent of your throughput, so your maximum download speed will be a bit lower. It also breaks NAT offloading, a feature these cheap Wi-Fi routers need to reach the speeds they promise. Most of us still have to use IPv4, so sadly you need NAT, and the offloading mechanism in these cheap consumer devices isn't compatible with cake, so everything goes through the CPU, and sadly cheap access points have weak CPUs; they're not the fastest. If you're an OpenWrt user, there are posts on the OpenWrt forum that benchmark how fast various access points are with cake. Hold on, that's basically the end of my presentation, but I wanted to point out something I forgot. These tunables say "ipv4", but they apply to both IPv4 and IPv6. There is no separate IPv6 TCP congestion control: almost all TCP sysctls are exported under net.ipv4, and they affect both, even though the name only says IPv4. Just saying. Yeah, that was my talk. If you have any questions, please ask. What? I did not expect that. Okay.
Okay, thank you. Does anybody have questions? Yes, please. The question was: the FRITZ!Box asks for the maximum link speed, is that because it uses cake? I would guess they don't use cake, because I think their processors are too weak for it. My FRITZ!Box model didn't ask me; maybe it just pulls the speed from the modem. But they do have their own QoS mechanism. The whole cake thing is not the same as QoS. QoS means you classify packets based on ports or IP header flags (there's a DSCP field in the IP header) and say: this is high-priority traffic. You put the traffic into multiple queues and then decide: there's something in this very important queue, I'll process it immediately and ignore the other queues. That's basically what QoS does: you split the traffic into classes. That's not what cake's automatic behavior is; cake can do that too, but you might not need to. I know the FRITZ!Boxes have quality of service, so they classify traffic, but that's a different goal. The real problem with cake, I think, is that it's not easy to implement in hardware; that's the biggest reason we don't see cake more, plus it needs to know the uplink speed, which makes it unusable for mobile and Wi-Fi clients. I wish more people would use cake. But again, you need a beefy processor, or you need a really good network interface that implements it for you. Okay, anything else? Anyone? Yes: you can absolutely set it yourself, especially if you're using Linux (again, not on FreeBSD, or any BSD as far as I know). You change the default queueing discipline to cake. But then the problem is that you haven't given it any parameters.
What cake really wants is for you to tell it the link speed. The GrapheneOS article linked in the presentation shows how: you say "my speed is this much", and you use the traffic control command, tc, to replace the queueing discipline. On my system, if you run `tc qdisc show` on a Linux machine (I need to be root for that, of course; can't type in the password), it shows the queueing discipline for each interface. The loopback interface, of course, doesn't need a queueing discipline, because it's so fast that it doesn't need any queueing. But check out the GrapheneOS article; they even explain that if you're using systemd-networkd, you can set it in the network unit file. There are ways to use it. The default on most Linux distros is already fq_codel, though, which is already better than first in, first out, so that might be good enough. But cake can do better. Okay, anything else? No? Yes? Oh, a microphone, nice. "You said that cake needs a beefy processor, which you won't find in a 30 euro access point, so I'm asking myself: if I'm a Linux gamer and I want lower latency, would I have to sacrifice FPS if I use cake? Is there anything significant I would notice?" No, I don't think it's significant on x86. Where you will notice it is on a server with a 100-gigabit NIC; you'll probably notice it there. But these access points usually struggle: they often can't even reach gigabit speed with cake. If you want to use cake on a gigabit downlink, you probably need a low-end x86 Atom processor, or a beefy ARM core, almost the equivalent of a smartphone's, basically. Okay. Thank you.
And also, if you run cake on your local machine, it only balances the traffic your machine generates, not what other people generate on your network. Okay, last chance for questions. I don't see any. Okay, thank you for listening, and thank you for watching as well. The talk is online. Thanks.