Can you hear me? OK. So hello, my name is Michael Luchčan, and together with Pavel Vichan and Lukash Sloka I will present our research project, the LabQ research project. LabQ is a joint initiative of Red Hat and the Faculty of Information Technology, and big thanks belong to both organizations.

As for the agenda: I will tell you something about the current state and the motivation, and I will also try to briefly explain how Zsync works. Then Pavel will show you a quick demo of our prototype plug-in, and Lukash will tell you something about the data gathered during our research. At the end of the presentation you will see the roadmap of our project.

So, DNF metadata files contain data about packages, files and groups. They are essential for dependency resolution in DNF. Their size is approximately tens of megabytes, and they are compressed with gzip; this applies to Fedora. The problem is that they are constantly changing with package updates and so on. DNF uses the metadata to resolve package dependencies, to install groups, and so forth; for mostly anything, you need fresh metadata in DNF.

The motivation for our project and our goals are the following. The current state is that DNF re-downloads the whole metadata files again with every subtle change in them, which is, politely speaking, not optimal. Our project goal is therefore the minimization of the metadata download size; we hope to improve the user experience of DNF and also to reduce the load on the Fedora mirrors.

Now a quick introduction to Zsync. Please don't take it too seriously; things are, of course, more complicated in reality. Zsync is a generic tool for the synchronization of files.
It works on the binary level, and the comparison of files is based on checksums of blocks. Of course, everything has its advantages and disadvantages.

The most notable advantages of Zsync are the following. It does not require any special server-side service, and it is very easy to fall back to regular downloading if something bad happens during the Zsync download: you just download the usual way. It is also compatible with older package managers that don't know anything about Zsync.

And, of course, the disadvantages. It requires special .zsync files to be present on the server side; those files contain pre-computed checksums of blocks of the file you want to download. There is also one drawback concerning compressed files: Zsync does not work well with them, and they have to be crafted in a special way, typically with the --rsyncable option of gzip. And sometimes it is quite buggy, as every software on the planet Earth.

Now I will try to explain how Zsync works on a simple example. I promise I won't be drawing anything in the next ten years, because these are very naive pictures, and I'm not proud of them. So imagine a situation: you have a server with an updated version of some file, and you as a client have an old version of the same file. In the case of Zsync, there is also a .zsync file present on the server; as I said, it contains pre-computed checksums of blocks.

First of all, Zsync downloads the .zsync file; this is the first download in the process. Then, on the client side, it compares the block checksums and identifies the blocks that have changed. The second download is issued via an HTTP Range request. HTTP Range is a feature of HTTP 1.1, so the server has to support the HTTP 1.1 protocol. Via a Range request you can download only certain blocks of the file; you don't have to download the whole file.
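The comparison step just described can be sketched in a few lines of Python. This is a toy model of the idea, not the real Zsync algorithm: real Zsync stores weak rolling checksums plus strong checksums in the .zsync file so matches can be found at arbitrary offsets, and uses block sizes of a few KiB; the block size and hash here are purely illustrative.

```python
import hashlib

BLOCK_SIZE = 4  # toy value for the example; real zsync uses blocks of a few KiB


def block_checksums(data: bytes) -> list:
    """Checksum of every fixed-size block (this is what a .zsync file stores)."""
    return [hashlib.md5(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]


def blocks_to_fetch(server_sums: list, client_data: bytes) -> list:
    """Indices of server blocks the client cannot reuse and must download."""
    client_sums = set(block_checksums(client_data))
    return [i for i, s in enumerate(server_sums) if s not in client_sums]


server_file = b"AAAABBBBCCCCDDDD"   # updated file on the server
client_file = b"AAAABBBBXXXXDDDD"   # old file on the client; third block differs

server_sums = block_checksums(server_file)      # pre-computed on the server side
missing = blocks_to_fetch(server_sums, client_file)
print(missing)   # -> [2]
```

Each missing block index then maps to one byte range in the HTTP Range request (block 2 with a 4-byte block size corresponds to `Range: bytes=8-11`), and the client stitches the reused and downloaded blocks back together.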
The last step of Zsync is then, of course, the reconstruction of the file on the client. So this was a very brief introduction to Zsync; Pavel will now tell you about our implementation, and he will also show you a short demo of the implemented Zsync prototype plug-in in DNF.

So, my name is Pavel, and I will tell you about our implementation. It consists of two parts, a client side and a server side. On the client side we implemented a DNF plug-in. It synchronizes the metadata before the actual DNF command runs, calling the zsync binary to reduce the download size. On the server side we use an HTTP server, and we implemented a bash script which periodically checks for new updates. If updates are available, it backs up the old metadata files on the server and downloads the new metadata files. The .gz files have to be repacked with the --rsyncable option. Because repacking changes the checksums of the .gz files, we also have to update the checksums in repomd.xml, and the last step is creating the .zsync files.

So now a quick demo. I have old metadata files, and I show the checksums of these files. For example, I want to find out the version of the youtube-dl package; from the cache, I use the dnf repoquery command, and it shows the current version. Then I update the metadata via dnf makecache. It takes several seconds, because our plug-in is activated and downloads the metadata files. Now I can run the dnf repoquery command again and see the new version of the package. Finally, I show the new checksums of the files, confirming that it finished successfully. That is all; Lukash will now tell you about the results we measured. Thank you.

Hello, everyone. My name is Lukash, and I will be talking about the results that we gathered. But before I get to them, I would like to talk a bit more about the actual drawbacks of this approach. The obvious drawback is on the server side: we have to generate a .zsync file for each metadata file, and we also have to repack the archives with --rsyncable.
In this graph, the first bar, labeled Alt, shows the data as it is on the servers now, where we have just the metadata archives. Of course there are more files, but filelists and primary are the largest of them, and the others are considerably smaller, so for the clarity of the presentation we work only with these two.

We can see that the current files are slightly smaller than in our proposed solution. This is due to the --rsyncable flag, which makes the compression slightly worse, but it is nothing too serious: the size increase is only 3 to 5%. The .zsync files add a bit more, on average an 8 to 10% increase in size. So all in all it is approximately 15%, and let's remember that we are talking about tens of megabytes, so the servers can handle it pretty easily. The drawback is not that large.

Well, how about the gains? In this graph we have plotted the synchronization size, which is basically the data downloaded, against the synchronization frequency. By synchronization frequency we mean the time span since the last download of the metadata, that is, how long ago the system was last updated. For example, we can see in the graph that if we were to update our metadata, let's say, once per week, we would only have to download about 3 megabytes, as opposed to downloading the whole 15 to 20 megabytes of the entire metadata from the server. The download size is greatly reduced, which is a very positive result.

The graph also shows a pretty steady increase of the metadata download size: per additional day it is approximately 200 up to 250 kilobytes, which is not that much. At this rate the download size would reach the current full download size only after approximately 3 to 4 months, which is nowhere near the usual use case, because people usually update their systems a few times per week. So we are in this area of the graph, and we can see that it is really efficient. OK.
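As a back-of-the-envelope check on those figures, using only the numbers quoted above (the real break-even point depends on the repository and on how the deltas accumulate):

```python
full_download_kb = (15_000, 20_000)   # whole metadata: 15-20 MB, as quoted
delta_per_day_kb = (200, 250)         # daily growth of the delta: 200-250 kB, as quoted

# Days until the accumulated delta equals one full download, taking the
# pessimistic and optimistic ends of both quoted ranges:
breakeven_days = (full_download_kb[0] // delta_per_day_kb[1],
                  full_download_kb[1] // delta_per_day_kb[0])
print(breakeven_days)   # -> (60, 100)
```

That is roughly two to three months of not updating before the delta approach stops paying off, the same order of magnitude as the figure read from the graph, and either way far beyond how often people typically update.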
So that would be all for the results, and they are really encouraging. As for the timeline of the project: we have been working on it for several months, mostly gathering data. The graphs that were shown are for Fedora 25, pretty much since its release in late November. Currently we are in the phase of discussing this with the Fedora community, which is pretty much what we are doing here, and we are looking forward to integrating it into DNF. So if you have any questions, please feel free to ask.

Yes? [Question: there were some packaging issues with Zsync, such as it bundling a part of rsync in itself, so it was not an easy package. Have you looked into that?] Yeah, of course, but now we are allowed to bundle that part of rsync in Fedora packages, as far as I know, so it should be no problem. Yeah, thank you. [Inaudible follow-up remark from the audience.]

Yeah? [Inaudible question.] The question was about a proposal to use plain text files instead of gzipped files. We decided for this approach because we wanted to be backward compatible with the previous package managers; that's the answer. We don't want to break anything.

[Question: have you considered some Debian-style solution?] No, we haven't. We can talk about it after this presentation.

No more questions? [Question: have you actually done any timing comparisons?] Sorry, the question was whether we measured the time of the actual downloading. We haven't; it heavily depends on your bandwidth, on the speed of your connection.

[Question about the overhead of the extra HTTP requests.] Yeah, the question was whether we are not wasting bandwidth on the HTTP requests. As I said, we want to be as compatible as we can, so we don't want to change anything on the Fedora infrastructure side.

Yeah? We can use this mic for questions, I think. [Question: can you use your approach with the YUM security feature, that means signing of repositories?]
Of course, because these data are produced by the Fedora infrastructure, so they can sign the files, and Zsync can handle that. We will consider this during the integration, of course.

[Remark from the audience:] I have just a small remark regarding your statistics. It looks very promising, but it's not answering the right question. The right question is how much the overall download size over a Fedora lifecycle goes down, depending on how often you update. The metadata of the updates repository, which is what we are talking about, grows roughly linearly over time, so if you keep downloading it in full, the cumulative download amount basically grows quadratically with time. It would be interesting to point out that with your method you can get it down to a linear amount with a small overhead; I think you can make your case even much, much better. [Answer:] Thank you for your remark.

I can also answer the earlier point about security, because I think this is about signed metadata. When you download via the Zsync method, you get the same metadata as on the server, so you can verify the checksums, or verify the signatures; there is no problem with this. The only thing is that you have to verify the checksums after the Zsync run finishes. Even the .zsync files don't have to be signed, because they are only intermediates and don't affect the result; if you verify the signatures at the end of the Zsync run, then you are OK.

Thank you for the clarification, and probably the last question. [Question: you mentioned in your presentation that Zsync is sometimes buggy. What exactly are the problems, and is it a problem for you that Zsync upstream looks dead?] Mostly that is the case, yes. As for the issues: sometimes Zsync gets stuck in a loop; the download reaches 100 percent, it prints the "100% downloaded" message, and it keeps cycling nonstop.

Thank you. Have a nice day.