So, "root is less: container networks get in shape with pasta". So, this will be about container networks, rootless ones, as you might have guessed, and pasta. So, what is pasta? As you might have guessed, I'm Italian, but I'm not talking about that. Let me show you something. Here I am, I'm on a server, and I do this. Let's take the server down. Am I really root? No, I'm not root. Right, so, great. What's the trick then? Wait, it didn't go through? Did I just reinvent unshare, and I'm just so enthusiastic to show it to you? So, okay, let's try to understand where I am. A good thing, sometimes, is to look at network interfaces. Okay, so I have loopback, and I have another network interface, but it's down. So, how am I here? I mean, it makes no sense, right? Okay, let's get out of this madness, go back to my host, and have a look at what I have here. Okay, that makes more sense, right? I have an Ethernet interface that is up. It looks similar. But okay, the MAC address is different, and it's not the same interface, because one is down and the other one is up. So, okay, I read the man page for you. I actually wrote it, too. Let me check addresses, and this strange route... no, no routing. Oh, yeah, okay, great. So, this is up, or at least it means that it will be up. The state is unknown because I didn't send any packet yet. I have some addresses: IPv4, a bunch of IPv6, link-local. Great, okay, this is starting to make more sense. And let's see if I can reach the Internet. Sorry, we talked about pasta, so, yeah, great. So, IPv4 is up, IPv6 is up, DNS resolution is working. Okay, so did I just re-implement Podman? No, because this is full of stuff here and a container would be clean, right? I just stuffed it. Okay, so let me quickly explain the trick to you. And note that I'm really not root. I'm really, really not root: I can't delete anything from here.
I'm sorry, I could probably do it, but let's not... Right, so, no, I didn't re-implement Podman. And in fact, Podman can also use this thing. Let's have a look at this. Okay, pretty similar: the interface still has this strange name. Well, that was the interface on the server. I have the same addresses. Let me check that this works. Yeah, it works. Great, so this seems to be some user-mode networking. And before I finish revealing the trick to you: I just mentioned user-mode networking, and it might sound like I'm copying packets to user space and back, and that it is terribly slow, right? I mean, somebody has probably experienced that. So, hmm. Okay, let me go back to my strange tool here. Actually, sorry, let me run a server first. I daemonise it, so that I can just keep one big terminal here. I hope you can all see it, or is it hidden? No, it's okay. Great. And now, hmm, how do I reach that? Well, it looks very local, so I will try with localhost. And yeah, I'm being quite arrogant: come on, I know that it's fast, so I can give it 32 megabytes of TCP window, zero copy, and disable Nagle's algorithm. Right, look, even, let me do two flows. Great, 60 gigabits per second. So, right. Okay, now I can reveal the trick to you. So: I'm not root, I created a network interface, and this thing is pretty fast. Let's look at some diagrams. What's the trick? How does networking work when you are nobody? Or, well, in my case, you are sbrivio, but it doesn't make a real difference. So you don't have root, I don't have CAP_NET_ADMIN (I didn't show you, but trust me, I didn't cheat on that), so I cannot create interfaces. But Linux allows you, if you unshare your user namespace and your network namespace at the same time, since Linux 3.8, which is a few years ago, five, six years, I think, to actually create a network interface there. So we can actually create a network interface, because in there we are UID 0.
So we are, you might call it root for convenience, but that's not root. That's just UID 0 in our namespace. And UID 0 can do a lot of things, like creating network interfaces. Oh, and there is the tun/tap driver. It's a kind of odd implementation, but quite useful. This thing creates a network interface on one side, and on the other side you have a file descriptor. It's not a socket: sockets can be represented by file descriptors, but it's not the same thing. And on this file descriptor, you read frames, and you can write frames. Ethernet frames. So the whole thing that would, you know, go on a cable, goes in here instead. Right. And then we know that regular users can open TCP and UDP sockets, for sure. I mean, when you start a browser, you don't do sudo firefox, right? You just start it. So, are you thinking what I'm thinking? If we do this: we have the network interface, we just created this tap device, which gives me the Ethernet interface inside the namespace, and down there, I have the Internet. And then I know that, as a regular user, I can open TCP and UDP sockets. I just need to fill in that something in between. I'm not saying that it's so simple, but it starts looking doable. And that something needs to take the Ethernet frames, strip the Ethernet header and the IP header, and put the payload into layer-four sockets. And then, in that direction, I think we are done. And when we get something from the Internet, we need to ask the kernel (this something is a user space application, so we need to ask the kernel): where is this packet coming from? And then tell our network namespace. And we have a number of ways to do that. So, just a reminder for everybody: layer one is the physical layer. Layer two is something you can put bytes on, right? The data link layer.
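The "something in between" described above boils down to encapsulation handling: take a frame off the tap file descriptor, strip the Ethernet and IP headers, and hand the layer-four payload to an ordinary socket. A minimal Python sketch of the header-stripping step (illustrative only; pasta itself is written in C):

```python
import struct

ETH_HLEN = 14          # destination MAC (6) + source MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = 0x0800

def l4_payload(frame: bytes):
    """Strip the Ethernet and IPv4 headers from a frame read off a tap
    device, returning (protocol, payload) for delivery via an L4 socket."""
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("not an IPv4 frame")
    ip = frame[ETH_HLEN:]
    ihl = (ip[0] & 0x0F) * 4          # IP header length, in bytes
    proto = ip[9]                     # 6 = TCP, 17 = UDP
    total_len = struct.unpack_from("!H", ip, 2)[0]
    return proto, ip[ihl:total_len]

# A minimal hand-built frame: Ethernet header + 20-byte IPv4 header + payload
payload = b"hello"
ip_hdr = struct.pack("!BBHHHBBH4s4s",
                     0x45, 0, 20 + len(payload),  # version/IHL, TOS, total length
                     0, 0, 64, 17, 0,             # id, frag, TTL, proto=UDP, csum
                     bytes(4), bytes(4))          # src and dst addresses (zeroed)
frame = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4) + ip_hdr + payload
print(l4_payload(frame))   # → (17, b'hello')
```

Going the other way, the tool prepends the same headers before writing frames back to the tap device.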
So, and then you have layer three, which is IP. It can be other things, but in our case it's IP. Layer four is the transport: TCP, UDP, ICMP, DCCP, whatever you want. And then the other layers are more related to, yeah, YouTube. So, great. And why am I doing this? What's the point? Not just because we can. Yeah, also because we can, otherwise we wouldn't be doing it, but the important thing here is that I don't have root, I don't have CAP_NET_ADMIN. So, if I'm running a container like I just did with Podman, and somebody hacks it, because they didn't apply security patches or because I'm dumb and I just mapped ports without authentication or something, well, I have no embarrassing consequences, or limited embarrassing consequences. It could be much, much worse. And even if nobody does that, we have the safety that all a user can do is open, connect, and bind TCP and UDP sockets, and ICMP ping, the so-called ping sockets. So that means I have the safety that nobody will spoof packets, because if you can send arbitrary frames, if you can spoof ARP, you are in control of a network, essentially: you are telling everybody, this is me, and this is him, and this is her, and, you know, that has serious consequences, if you can craft arbitrary frames. And it can be quite fast, and also flexible, now. So, I would start saying that it looks like a good thing. What would the alternative have been? I would have gone to my server, created a container, and then done ip addr... something, created a bridge, done something with netfilter, maybe, just to, you know, drop a few of the really suspicious things, like ARP frames that come with totally random MAC addresses and stuff like that. But that would have required root. And there was a talk about that, actually, yesterday. So, I also want to say that this is a limitation for some applications.
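The unprivileged "ping sockets" mentioned above (datagram sockets with IPPROTO_ICMP) let a regular user send echo requests while the kernel writes the IP header and manages the echo identifier. Building a valid echo request is plain arithmetic, sketched here; actually opening the socket only works where net.ipv4.ping_group_range includes your group, so that part is hedged:

```python
import socket, struct

def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    s = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    s = (s >> 16) + (s & 0xFFFF)
    s += s >> 16
    return ~s & 0xFFFF

def echo_request(ident: int, seq: int, payload: bytes = b"pasta") -> bytes:
    """ICMP echo request: type 8, code 0, checksum over the whole message."""
    hdr = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = inet_checksum(hdr + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload

msg = echo_request(0x1234, 1)
assert inet_checksum(msg) == 0   # a message with a correct checksum sums to zero

# On hosts where net.ipv4.ping_group_range permits it, this works unprivileged:
try:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_ICMP)
    s.sendto(msg, ("127.0.0.1", 0))
    s.close()
except PermissionError:
    pass   # ping sockets not enabled for this group
```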
Maybe you really want to run an ARP proxy inside a container; if you want to do that, hey, you need root. But then, there is a good reason for it. And, okay, isn't this an ugly workaround? I mean, I'm appending headers and removing headers in user space. So, are we going to fix it in the kernel? I don't think so. I don't think there is anything to fix. There are reasons why this is not allowed, and I mentioned some of these reasons. So, right, an unprivileged user shouldn't be in control of the network. There are many ways to divide unprivileged and privileged users; this is pretty much the Linux and BSD way, and other operating systems do completely different reasonings. I heard of operating systems running the UI at ring zero, but Linux, luckily, doesn't. And, okay, let's say, like in the talk yesterday: we do it for them. But are they really unprivileged, then? I would start having some doubts. So, I think that if a network namespace, like the one we created, doesn't have privileges, then we should follow that philosophy: it doesn't have root for a reason, and we should stick to that. Because then we really don't even need to debate whether it has privileges or not. And actually, appending headers and removing headers is a way of enforcing isolation. We are in control of those, and the kernel, in turn, forces the same on us as we send packets out from the bottom of the diagram. Right, so when we are there, on the layer-four sockets, the kernel doesn't allow us to put an IP address and the checksums there. No, it says: I do it, I take care of it, I don't trust you. So, that's how you implement isolation. I cheated a bit in the demo earlier: we were actually in the local case. I didn't do so much in that something, because, yeah, I used a local connection in the demo, it's a bit easier.
And in that case, I already have layer-four sockets, because I am doing an iperf3 connection from the container to the host. Well, "host" is a bit of a misnomer, right? It's still the same host, just a different partition of the host, so to say. And there, I already have layer-four sockets, and I can just splice. splice() is a system call in Linux that allows you to move data from a socket to a pipe and from a pipe to a socket, and you don't need to do anything special. Of course, we are just carrying payload there. That means we have no addressing, and that means I can just use the loopback interface for that. If we actually want to go to the Internet, we need to do the trick I was mentioning earlier: we really need to append headers and remove headers. Otherwise, if somebody is familiar with the Podman situation as of a while ago, there was a way to be really fast with this trick, with some amazing tricks, but on the other hand you would lose the source IP address from outside. All the traffic would look like it was coming from the host, and that's not so convenient if you have applications inside that need to, for example, authenticate, filter, or route based on the IP address. So, okay, now we pretty much got it: this is pasta, okay? Not the thing you eat anymore, something in between. The acronym, I don't even remember it. I mean, if you want, just go to the website, it's written there. So, we got what it does. I just wanted to present a few peculiarities that make it, in my opinion, reasonably safe. We don't do dynamic memory allocation. Funny, because we are dealing with packets, so it would be natural to read from a socket, allocate 1,500 bytes or a bit more, and free them once we sent the data. But if we are careful, the kernel has already buffered the data, so we can use those buffers. We just need to avoid dropping things from kernel queues before it's time to do so.
So, when I get a packet from a socket, I need to remember that my container needs to read it. And maybe it's slow, maybe nobody's reading it, maybe they are too late, or maybe it's outside the congestion window because I sent too fast. So, I need to keep it there. I have no other place: I'm not allocating memory. So, I can use MSG_PEEK. MSG_PEEK is a flag for recv() and recv-like system calls that allows you to, yeah, just look at the data. Don't dequeue it, let's pretend I didn't read it. And this avoids some classes of memory-related potential security issues: a double free, a heap overflow, and stuff like that. It's not completely safe. I can still have stack overflows, though it's a bit harder, perhaps. And it's a bit easier for my mind, actually, to keep track of the stuff I'm doing, which is probably an important factor for security. The TCP adaptation: so, there is some TCP adaptation. This thing needs to keep track of the connections. However, we already have two stacks around; in the case of a container, it's actually the same kernel with two instances of the TCP stack. So we don't need to do really much in terms of congestion window, keeping track of metrics, expanding the window, shrinking the window, how much memory do we have... no, we can ask the kernel: what is your congestion window? Yeah, okay, I'll use that. It doesn't do NAT, if you don't want it. That's perhaps why it was so confusing, but it can also be convenient: we have the same addresses inside the container and outside. Full IPv6 support, but I hope we don't have to mention that in 2023. Yeah, actually, I could have just gone into this pasta config that I showed you earlier and asked for an address via DHCPv6 or DHCP. Did I say passt? Right, yes, I did. So, let me cover a bit of this project's history. It's not original at all. It's a big scam: slirp has been doing this for 18 years, actually. Sorry, do I have 10 minutes? 15?
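The MSG_PEEK trick described above, looking at queued data without dequeuing it so that the kernel's own socket buffer doubles as the retransmission store, takes only a few lines to demonstrate:

```python
import socket

# MSG_PEEK lets you look at queued data without removing it from the
# kernel's socket buffer: nothing is dropped from the kernel queue until
# a plain recv() finally dequeues it, so no userspace buffer is needed.
rx, tx = socket.socketpair()
tx.send(b"abcdef")

peeked = rx.recv(4, socket.MSG_PEEK)   # look, but leave the data queued
again  = rx.recv(4, socket.MSG_PEEK)   # still there: same bytes again
taken  = rx.recv(6)                    # a plain recv() finally dequeues it

assert peeked == again == b"abcd"
assert taken == b"abcdef"
```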
Ah, until the questions, okay, not until the end. Yeah, great, thanks. So, slirp has been doing this for 18 years. I'm really not presenting anything new, probably. We started this for virtual machines: KubeVirt developers came to my team and said, we don't want to use slirp because it has a bad name, but it's really convenient. Can you do something like that? We want to run our containers with virtual machines inside, that's KubeVirt, without root. And if possible, we would also like to avoid that. So, we started it for virtual machines, and then at some point we realized that containers had exactly the same problem. So much so that, for QEMU, you have this slirp, libslirp. So, slirp comes from the 90s, right? The story is kind of complicated, I will not cover it for the sake of time, but it was a way, when universities started offering dial-up shell accounts to students or professors, to have actual Internet access by tunneling everything you wanted into your dial-up connection, which was supposed to just connect you to your university server. And then, from there, you had routing, so, you know, if you tunneled everything into that, you could reach whatever you wanted, not just the resources of the university, which nobody cared about. So, right, that's an old trick, and somebody had a really brilliant idea, in my opinion, to use it for QEMU, and then another brilliant idea, for Podman. And there is something similar for Docker, actually. So, there is something already very similar: slirp4netns is like pasta, and libslirp is like passt. The explanation of both acronyms is available on the website. But yeah, it had a bad name. Well, okay, nowadays you leak 50 bytes in one day and you get a CVE, yeah, fair. However, performance-wise, it also wasn't really meant for, you know, the bazillion bits per second that we need to have nowadays.
It was meant for dial-up, you know, a bit more than Telnet, post-BBS era, I would say, or still BBS era. So, it doesn't support TCP window scaling, which means 64 KiB is all you can send, and then, ah, okay, 64 KiB more. That's slow. IPv6 support wasn't really there either. IPv6 was actually introduced a bit before, but, you know, there was no reason to use it. We still had plenty of IPv4 addresses in the world. And, right, so we realized all that, pasta was born, and then, with a lot of help from Podman's development team, we shipped native integration, which I just showed you, in Podman 4.4, that was in January this year. Since two days ago, this is also supported in Buildah; if you're not familiar with it, that's a facility for creating container images, used by Podman. It's supported by libvirt, and KubeVirt support is in review. We are now very few, but very committed, developers, with lots of occasional contributors, and Podman users came up with every possible use case. Let me cover recent developments, just in case you followed this project recently. So, somebody said: this is not possible. Okay, great, what can we do better? So, this one only applies to VMs, so it's not really pasta. We had a UNIX domain socket to QEMU, which means copying to the socket and copying from the socket: QEMU needs to do that, and passt needs to do that, okay? We can just bypass QEMU altogether with vhost-user, and it's actually faster, don't quote me on this, because we don't have very nice benchmarks yet, but it's actually faster than whatever you could do with tap and vhost-net. So, yeah, we now copy all the addresses and the routes, not just one; you saw that I had so many addresses there. That's because some users reported problems with cloud environments. And WireGuard almost works out of the box, finally. That's a bit complicated, but it looks like a popular use case for Podman users. So, new use cases.
One funny thing somebody came up with is: what if I want to just throw away containers, deploy them quickly, each with its own address, without telling the host anything? Interesting. With IPv6, it's actually doable: often you have a /64, so yeah, it's actually possible. In that case, pasta would just bind an address that the host doesn't have, that nobody has ever seen, except for the prefix. So, you can just have a completely separate container with its own address, and assign it with the NDP responder that's built into pasta: we advertise the prefix and nothing else. We assign a pseudo-random MAC address, we take care of it, we keep track of it, and we could actually deploy a lot of containers without really knowing much about the network, or really not thinking much about it. And some people might already long for this, but it takes a few tricks to set up, and it's not rootless. Another one, less funny, less visionary maybe, but this is starting to be important. You have IPv4-only applications, maybe you don't have the source code, so they will stay IPv4, and you have an IPv6-only setup. Maybe somewhere in central Europe this starts being a problem: you start paying quite a lot for an IPv4 address. So, there are RFCs for this kind of translation, and pasta could be a good candidate. You can find more details in the bug reports there. Another one we really need to take care of now is: how do we do port forwarding there? You didn't even see it, because I didn't do it explicitly, it was automatic. And you can do it with Podman configuration options and everything, but Docker actually really needs it, via RootlessKit. And also for Podman custom networks, it would be really nice if Podman could simply decide, without stopping or restarting the container, or pasta itself actually, to map a new port. And with that, we are heading towards a generic, stable flow table. We don't want to implement Open vSwitch, but we are dangerously close to it. Start preparing your questions now.
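The pseudo-random MAC assignment mentioned above needs only two bit tweaks to be safe: set the locally-administered bit and clear the multicast bit, so the address can never collide with a vendor-assigned one. An illustrative sketch (not pasta's actual generator):

```python
import secrets

def random_mac() -> str:
    """Pseudo-random MAC address: force the locally-administered bit on
    and the multicast bit off in the first octet, randomize the rest."""
    b = bytearray(secrets.token_bytes(6))
    b[0] = (b[0] | 0x02) & 0xFE   # locally administered, unicast
    return ":".join(f"{x:02x}" for x in b)

mac = random_mac()
first = int(mac.split(":")[0], 16)
assert first & 0x02          # locally administered
assert not (first & 0x01)    # unicast, not multicast
print(mac)                   # e.g. "8a:1f:0c:43:9d:e2"
```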
So, if you want to try this out, please do, and please, especially, report bugs. We have mailing lists. We are a bit old school, but don't be afraid to mess up; we understand that many people are not used to patch-based or email-based workflows. There is an IRC channel on Libera.Chat, #passt. There are weekly meetings that are open to everybody, even if you just want to listen to the dramas we have. It's actually quite funny. And all this information you can find at passt.top. Credits, yeah. So, David has been reworking a lot of stuff recently. Laurent is taking care of the bazillion bits per second there. Paul, from Podman's development team, got it into Buildah this weekend, and he always helps a lot. Lami is a packager for Void Linux, and he presented a lot of new use cases; the libvirt integration is landing; and there's work on making the kernel nicer to us. And we have really a lot of contributors and packagers, so packages are available for Arch, Debian, Ubuntu, Fedora, RHEL, a few more, probably. Questions. So, the question from Paul is whether, when I was referring to the cheating case where I just have local traffic, I'm referring to RootlessKit. Yes, yes. Okay, so: is RootlessKit the part that allows us to skip the setup and just create a bridge to the host? No, actually. RootlessKit is a pile of things. In this case, what RootlessKit does is the trick of copying packets back and forth, not with splice(); they use recvmsg() and sendmsg(), it's not very different, but that's the part it does. And it's actually the default port forwarder for Podman, because it's faster, but you don't preserve the source IP address. So that's the part it does. Yeah, so, first question of two is whether this can be used with the Podman network commands. As far as I know, yes, since Thursday, but you probably know better than me. And if you don't, let's check it out.
But I think Paul just did this. I just tried it with podman run, as you've seen, and I think now, finally, the code has been moved to the proper place, so we can actually use it for podman network. That's my understanding, yeah. So, the second question is: when will pasta become the default in Podman? You suggested 4.6, right? I hope we are on track. Yeah, quite likely, yeah. Right, so the question is: I showed that I would connect to localhost and reach the host, and maybe I want to really connect to localhost as localhost, on the host. Like, yes. And yeah, actually, one of the reasons why we are reworking the NAT model is to allow that. Right now, we just allow disabling this functionality: map nothing, or map everything, or map the address of the default gateway to the host, but it is not very flexible. So we are actually adding more options. And what the default will be remains to be seen. We need to check with many people what they would really expect, because if you come from VMs, you have different expectations, yeah. Right, yeah. I can check on Matrix. Okay, thank you.

Hello? So, good morning, everyone. I'm Stefano Garzarella, and today, with my colleague German Maglione, we will talk about ublk: virtual block devices in user space. So, these are the main topics that we will cover today. We will start with understanding what virtual block devices are and what they are useful for. Then we will do a brief introduction of ublk and io_uring, which is the main feature that ublk uses. Then we will take a closer look at the ublk driver, the user space libraries, and some use cases and examples. And speaking about virtual disks, we will also cover the qcow2 image format, and we will see how to reuse the QEMU storage features without virtual machines, using the QEMU storage daemon. And finally, we will also talk a bit about the evolution of QSD, the QEMU storage daemon, towards Rust.
So, block devices are usually the devices that we use to store persistent data. They can be, for example, an NVMe device or a magnetic disk. We usually have fixed-size blocks that we call sectors. But in this case, we are talking about virtual block devices. With virtual, we mean devices simulated by software. So, why do we need this? The first use case, of course, is obvious: virtual machines. But we can also use these devices without VMs, because the disk image formats that we have for virtual machines, like qcow2, offer really nice features, like growable images, snapshots, backing files, that we can reuse in use cases where isolation is needed, for example with containers. And another reason to use virtual block devices is to mount virtual disks into the host, in order to do some kind of manipulation. And finally, virtual block devices allow us to use network and distributed storage, like NBD, iSCSI, and Ceph RBD. So, Linux already provides several implementations of virtual block devices in the kernel: for example, the loop device, NBD, RBD, and some others. But, for example for Ceph RBD, the entire protocol is implemented in the kernel. So, a bug in the protocol implementation could be a big issue and propagate to the entire kernel, causing a panic. For this reason, it could be interesting to move at least the protocol implementation to user space. Of course, we still need a small piece of kernel code, the module, to interface the Linux block layer with the daemon that is emulating the device, but the entire implementation of the protocol can be moved to user space. So, just to summarize why we can do this in user space: the usual things, safety, isolation, maintainability, portability. Of course, there is a trade-off to pay, and that is performance. And this is the main reason for ublk.
So, the idea is to try to close the gap with the in-kernel virtual block devices, because ublk is based on io_uring, and to take advantage of the high performance of io_uring to move the device emulation to user space. ublk was introduced by Ming Lei pretty recently, in Linux 6.0, and essentially it is a kernel module, ublk_drv, that exposes a standard, regular Linux block device to user space and forwards all the requests to a daemon, using io_uring queues and shared memory. We will see this later. So, the ublk driver provides several interfaces. The first one is the ublk control character device. This device is used to do the setup, that is, to allocate the device and configure it. And when we have created the device, then we have two new interfaces. The first one is /dev/ublkb0, in this case, in the picture, which is a regular Linux block device. And then we have /dev/ublkc0, in this case, which is the character device exposed to the daemon to handle the requests. So, essentially, what ublk does is: every request that an application or a file system issues on this regular Linux block device will be forwarded to the daemon through the ublkc interface, using io_uring queues and shared memory. The shared memory is used essentially to forward all the request information, and the io_uring queues are used to exchange notifications between the kernel and the application. We will see more details in a bit. Now we can do a brief introduction of io_uring. Maybe most of you already know it, but yeah, it will be useful as a refresher. We already talked about this in a previous talk, I put the link on the slide, so today I will go really, really quickly. But if you want more information, there is a link here. So, what is io_uring? io_uring is a Linux interface for doing asynchronous I/O.
It was introduced by Jens Axboe in Linux 5.1, and initially it was focused on block requests, but then it evolved to support more and more system calls. So, now it has become a generic framework for doing system calls in an asynchronous way. The interface is pretty, pretty simple: there are two ring queues, shared in memory between user space and the kernel, and three system calls. We have two queues essentially to avoid contention. So, there is the submission queue, where the producer is the user space application and the consumer is the kernel. And then there is the completion queue, which is the other way around: the kernel is the producer, and the application is the consumer. Talking about the system calls, the first one to be called is io_uring_setup(): it allocates the memory and sets up the context. Then we have io_uring_register(). It is also used mostly in the configuration phase. It essentially allows you to register resources that are often used during the data path, so that the kernel doesn't need, for example, to remap user buffers every time, and you can also use it to register file descriptors, or even eventfds, and other things. And finally, the last one is io_uring_enter(). This is the main system call used during the data path. It is used by the application to notify the kernel when there are new operations to do in the submission queue, and also to wait for the completion of those operations through the completion queue. For all of these things, anyway, there is a library available, liburing, and it provides a convenient API that hides all of these system calls. So, now let's take a closer look at how the io_uring queues work and how an application can submit operations. The first thing the application needs to do, when it has a new operation to do, for example a write operation, is to produce a new SQE, that means submission queue entry, and fill in all the fields, like the opcode.
So, there is a write opcode, and then all the parameters that we usually use, for example, in the write system call: the file descriptor, the address of the buffer, the offset, and other things. The application can queue multiple SQEs before calling the system call. So, when the application puts SQEs into the submission queue, it updates the tail of the submission queue, and, when it's ready, it invokes io_uring_enter() to notify the kernel. At this point, the kernel starts to consume the submission queue entries, updates the head of the submission queue, and essentially stores internally the information about all the operations it needs to do. Now the kernel starts to process the operations. This can be done in any order, so the only way to link an SQE with its CQE, that means completion queue entry, is the user data. The user data is a 64-bit value, completely opaque to the kernel, and the application can put anything in it. So, when the kernel has completed an operation, it produces a CQE, and it essentially copies the user data from the SQE to the CQE. In this way, the application can link a CQE back to its SQE. And so, when an operation is completed, the kernel produces a CQE, copies the user data, sets the result and flags, and updates the tail of the completion queue. Also in this case, the kernel can produce more than one completion in one go. So, when it's finished, it returns to user space, and the application can consume the CQEs, updating, in this case, the head of the completion queue. So, this was essentially how io_uring works. Now, there are a lot of other features and a lot of other details that we could cover, but one of the features that I want to show you is this one, called the io_uring passthrough command. This was introduced pretty recently, in Linux 5.19, and it consists of a new opcode, IORING_OP_URING_CMD, and two new setup flags that essentially allow you to double the size of both the SQE and the CQE. So, why is it useful?
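The SQE/CQE mechanics just described can be modeled with a toy ring: a pure simulation (not the real mmap'd rings or the liburing API) that shows the one thing that matters for correctness, namely that completions may arrive in any order and user_data is the only link between a submission and its completion:

```python
from collections import deque

class ToyRing:
    """Toy model of the two io_uring rings: the application produces SQEs,
    the 'kernel' consumes them, completes them in any order, and produces
    CQEs carrying the same user_data."""
    def __init__(self):
        self.sq, self.cq = deque(), deque()
        self.inflight = {}

    def prep(self, opcode, user_data):   # fill an SQE, bump the SQ tail
        self.sq.append({"opcode": opcode, "user_data": user_data})

    def enter(self):                     # stands in for io_uring_enter()
        while self.sq:                   # kernel consumes the SQ entries
            sqe = self.sq.popleft()
            self.inflight[sqe["user_data"]] = sqe
        # complete deliberately out of submission order
        for ud in sorted(self.inflight, reverse=True):
            self.cq.append({"user_data": ud, "res": 0})
        self.inflight.clear()

ring = ToyRing()
ring.prep("WRITE", user_data=1)
ring.prep("READ",  user_data=2)
ring.enter()
completed = [cqe["user_data"] for cqe in ring.cq]
print(completed)   # → [2, 1]
```

The real rings live in memory shared with the kernel and are driven by head/tail indices, but the user_data contract is exactly this one.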
So, the passthrough command is essentially a way to do asynchronous commands on a special file, like a character device, that can be exposed by a driver, a file system, or any kernel component. Mainly, it's an asynchronous alternative to ioctls. Essentially, both of them allow you to define new special commands in kernel code without implementing new system calls. And it is already used in the kernel: the first user was the NVMe subsystem, which essentially used it as a replacement for ioctls. So, for this reason, it can be used as an alternative. And another user is ublk, as we will see. So, in very short words: user space uses the submission queue to send arbitrary commands to the kernel, and the new flag allows sending up to 80 bytes of command data. Then the kernel uses the completion queue entry to send back the result, in a completely asynchronous way. Now, let's take a look at how ublk uses the io_uring passthrough command. The first device that we saw was the ublk control device, and it provides several commands through the io_uring passthrough interface. The first one is ADD_DEV, and it's the command that is used to allocate the device and set up most of the parameters from the daemon's point of view: the number of queues, the queue size, features. When this command is successfully completed, /dev/ublkc0, this one, is created, and it will be used later for the data path. Another command used on the ublk control character device is SET_PARAMS. This command is essentially used to set the parameters of the standard Linux block device, like the number of sectors, the block size, and other attributes. So, when the application has configured all the things and is ready to start processing requests, it calls the START_DEV command. And after this command, /dev/ublkb0, the standard, regular Linux block device, is created. Of course, we have other commands, but we don't have time to cover them.
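With the doubled SQE, the passthrough command carries up to 80 bytes of driver-defined command data. The sketch below packs a ublk-style control command in that area, modeled loosely on struct ublksrv_ctrl_cmd from the kernel UAPI header; the field layout here is for illustration only, not a drop-in way to issue real commands:

```python
import struct

def pack_ctrl_cmd(dev_id, queue_id=0xFFFF, length=0, addr=0, data=0):
    """Pack an illustrative ublk control command into the 80-byte command
    area of an SQE128 passthrough SQE. Field order mimics ublksrv_ctrl_cmd:
    dev_id, queue_id, len, addr, data, dev_path_len, pad, reserved."""
    cmd = struct.pack("<IHHQQHHI",
                      dev_id, queue_id, length, addr, data,
                      0, 0, 0)          # dev_path_len, pad, reserved
    assert len(cmd) <= 80               # must fit the SQE128 command area
    return cmd.ljust(80, b"\x00")       # rest of the area stays zeroed

cmd = pack_ctrl_cmd(dev_id=0)
print(len(cmd))   # → 80
```

The same 80-byte budget is what the NVMe passthrough commands use, which is why the SQE128 and CQE32 setup flags exist at all.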
The last thing the application needs to do before it starts processing requests is to map the shared memory of the ublk character device, I mean, on /dev/ublkcN. This shared memory contains one descriptor for each queue element, and they are identified by a queue index and a tag. The tag comes from the block layer, the multi-queue Linux block layer, and we essentially use the tag as an index into the queue. This shared memory is used by the kernel to write all the information about a request that the user-space daemon needs in order to handle it. At this point, the daemon is able to start serving requests coming from the block layer, from the block device. What the ublk daemon usually does is fetch and commit requests using io_uring commands on the /dev/ublkcN character device. The SQEs in this case are submitted by the daemon to notify the kernel, for example, that a request is completed, and the CQEs are used by the kernel to notify the user-space daemon that there is a new request to handle. And, as we saw, to link a submission queue entry with a completion queue entry, we have the user_data. This is an opaque value for io_uring, but also for the ublk driver, so it's only used by the application, I mean by the daemon, to link an SQE with a CQE. Now let's take a closer look at the io_uring commands used on the /dev/ublkcN interface. The first one is the FETCH_REQ command. This command is issued only once per queue element, at the beginning, and it's used by the daemon to tell the kernel: hey, I'm ready to process a new request for this slot. The slot is identified by the queue id and the tag, and the application must also provide a pointer to the user-space buffer that will be used to handle that request. In this way, the kernel can, for example, put the data there.
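The tag-indexed shared memory can be pictured as a plain array of per-tag slots, as in this toy model. The field names here are invented; the real descriptor layout is the `ublksrv_io_desc` structure in the kernel UAPI headers.

```python
# Toy model of the per-queue shared-memory descriptor area: one slot per
# tag, written by the "kernel" side and read by the daemon.

QUEUE_DEPTH = 4
desc_area = [None] * QUEUE_DEPTH   # indexed directly by tag

def kernel_post_request(tag, op, sector, nr_sectors):
    # The kernel fills in the slot for this tag before notifying the
    # daemon that there is a new request for it.
    desc_area[tag] = {"op": op, "sector": sector, "nr_sectors": nr_sectors}

def daemon_read_descriptor(tag):
    # The daemon uses the tag from the CQE as a direct index.
    return desc_area[tag]

kernel_post_request(tag=2, op="read", sector=128, nr_sectors=8)
d = daemon_read_descriptor(2)
print(d["op"], d["sector"])   # read 128
```

Because the block layer's tag doubles as the array index, no lookup structure is needed: the CQE tells the daemon the tag, and the tag tells it exactly where the request's details live.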
At this point, the application has told the kernel that it is ready for all the slots, almost. When the kernel receives a new request from the Linux block device, it generates a new CQE and puts it into the completion queue to notify the application: hey, we have a new request to handle. At this point, we need a command the application can use to tell the kernel: hey, I completed the request. And that is the COMMIT_AND_FETCH_REQ command. Essentially, it's a kind of optimization: a single command that allows the application to tell the kernel both "we completed the request" and "we are ready to handle a new request for that slot". It's very similar to the previous one; the only difference is that now the result field is valid, it contains the result of the completed request. The rest is pretty much the same as the previous command. So the CQEs are always used by the kernel to notify the application that there is another request to handle. This was the overview of the kernel side of io_uring and ublk, and now I'll let Hermann talk about the user space. OK, sorry. Well, how can we use this? Currently, it's very experimental: not only has the kernel API changed a lot, but the user-space API has changed a lot as well. One of the options you have is to use ubdsrv, the server from Ming Lei. It's also a very fast-moving target, so there is a lot of development going on. Currently, it's mainly developed to be used as a server directly, providing some block device targets. For example qcow2, which was one of the motivations to develop ublk: there were many previous attempts to put qcow2 inside the Linux kernel, but it's a quite complicated format and a big target. So that's the idea. Inside ubdsrv there is a library, libublksrv, but that is also still very experimental, so its API has changed a lot.
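The daemon's steady-state loop described above can be sketched in a few lines. This is a pure-Python simulation; the real daemon drives these transitions with io_uring FETCH_REQ / COMMIT_AND_FETCH_REQ passthrough commands, and the queue and handler below are invented stand-ins.

```python
# Sketch of the ublk daemon loop: the kernel notifies a new request per
# (queue, tag) slot via a CQE; the daemon answers each with a single
# COMMIT_AND_FETCH_REQ, which both completes the request and re-arms the
# slot for the next one.
from collections import deque

incoming = deque([(0, "read"), (1, "write"), (0, "read")])  # CQEs: (tag, op)

def handle(op):
    # the daemon services the request against its user-space backend
    return 0                        # result code for the commit

completions = []
while incoming:
    tag, op = incoming.popleft()    # CQE arrives: new request for this tag
    res = handle(op)
    # COMMIT_AND_FETCH_REQ(tag): report `res` and mark the slot ready again
    completions.append((tag, res))

print(completions)                  # [(0, 0), (1, 0), (0, 0)]
```

Notice that tag 0 appears twice: once a slot has been committed, the kernel is free to reuse that tag for a new request, which is exactly the commit-and-fetch cycle.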
But there is a plan, in the near future, to move that library out and make it a proper standalone library that can be used directly. SPDK also supports ublk, but they have their own internal driver. So currently, one of the easiest options, if you need to implement something, is to write your own library that interfaces directly with the kernel, and people do that. There is also a Rust version, developed by this fine gentleman here. Sorry, I lost the thread. But currently, since the kernel especially is a moving target, we are waiting until at least the kernel API stabilizes a little bit. And I think Ming is interested in continuing that work; the idea of the Rust crate is to provide a Rust API that is safe and difficult to misuse, and maybe small, very simple C bindings, so the idea is that it can be used both ways. Another option, developed by Richard Jones, one of our colleagues, is nbdublk. He took the NBD client from the kernel into user space using ublk, so it's also quite a nice example of how to use the library; he's currently using Ming's library. There's another very interesting project from Richard: a ublk server that is part of nbdkit. nbdkit is an NBD server (client, server, I always mix them up) with a plugin interface. The idea is that you have a plugin for, say, a file, so you can say: OK, expose this file as an NBD device, or just memory, et cetera, or a file over SSH. So the idea is to reuse that whole framework, that infrastructure of plugins, with ublk. So if someone wants to experiment with ublk, I think this project is the easiest one to start with, also because you can write your plugin in Python, for example. I would sincerely recommend this project if someone wants to do some experiments with ublk; it's the easiest path.
As I said before, qcow2 was one of the main motivations behind ublk. Currently, the ublk qcow2 target implemented by Ming is basic, not fully complete; as I said, it's not that easy a format, it has a lot of features. And the idea is that qcow2 has a really nice feature set, and we want to provide those features outside of a virtual machine too, especially for container users, or simply as a better loopback device, with all the features we have in qcow2. Thinking about that: OK, we want qcow2 with the full feature set, but all the external implementations of qcow2 are, obviously, very simple, basically read and write and pretty much that's it. So why don't we use the QEMU block layer, the QEMU Storage Daemon, which already has the full qcow2 feature set, as the daemon, and just export that implementation using ublk? That is the project Stefano and I, together with an intern, have been working on over the last few months: being able to export the whole QEMU block layer using ublk. The QSD currently supports NBD and FUSE as exports, but those two were not designed for performance; and vhost-user-blk, whose use is restricted to VMs. vhost-user-blk is a similar idea to ublk, I mean, not the same implementation, but the same idea of having a block device in user space. So this is the project I mentioned before: extending the QEMU Storage Daemon to export the whole block layer using ublk. We have a proof of concept; it's still in development, we are working on it. So if someone wants to find a very interesting project to collaborate on, this is one of those. And also the Rust storage daemon: the guy who works on it is a good guy, and one of its motivations, by the way, is also to have this capability of exporting the QEMU block layer using ublk.
The main motivation behind the Rust storage daemon is a bit complicated; it has to do with ongoing development to get multi-queue support into the QSD. Recently, version 0.1 was released, and there are a couple of blog posts about it that are really interesting to read, so I also recommend taking a look. It's also a very interesting project if someone wants to dive into Rust, and especially async Rust. So this is the next thing we want to work on, and that's basically all I've got. Thank you. — The question is whether we have compared the performance of ublk in user space with the other current options, for example NBD. Not yet. We have some rough numbers, but no proper benchmark. We expect that, for example, ublk will be faster than NBD or FUSE, because we don't need to go through the whole network stack. Of course, NBD can also be used over a Unix socket, but those exports were not designed for performance either. And we still think we can tune io_uring more, because the thing with io_uring, with all the io_uring-based stuff, is that if you want performance, you can't just throw io_uring at a program and expect it to get faster; you need to design for it from scratch. So the idea is to complete the ublk export and then do proper benchmarks, especially against NBD and FUSE, which are probably the main competition. — The question is whether we can use this unprivileged, without root, basically. Currently, yes; it's a recent development. We didn't talk about it because unprivileged user support is a feature that was added only recently. It's still in development: there are some patches on the mailing list, not yet merged. But anyway, ublk will gain the possibility of being used by unprivileged users. In user space it's already done, but the kernel part is missing; of course, you can compile those patches and have it now. But yeah, that's in the near future. The original motivation was also to provide this for containers. Yes, exactly.
The question is about how we submit completions and read new requests with io_uring commands: if we have requests in flight, do we take new requests only when we issue a new io_uring command? Is that the question? Yeah, I mean, you issue a new command to fetch the next request to handle. But in the meantime, you can still serve the other requests, and you don't need to complete all of them first. OK, and you can also ask for a new request even if you didn't complete that one yet. No, I mean: COMMIT_AND_FETCH_REQ is per single tag, per queue. So if you have, for example, 100 elements in the queue and you are only using the first 10, you can still fetch the rest; you don't need to complete the first 10 first. In this way, you can fetch a new element even if you haven't completed the previous ones. For each element, that is, for each tag, you do need to commit the completion, because the tag is the one generated by the Linux multi-queue block layer, so a request in flight cannot be reused for another request. — The question is: when we use a ublk device, will the Linux block device be in write mode? I don't know that detail, sorry. — No, currently the device is not namespaced; basically, the security works differently. Especially in the unprivileged case, only the process family, the process tree of the process that created the device, can access that device. But it's not namespaced. At least, that's my knowledge as of today; I don't know if that has changed, because Ming is pretty fast at doing stuff, so maybe as of this morning it is namespaced after all. OK, well, the device lives outside the namespace, but you can restrict it to the processes that are inside the namespace. But yeah, I get it: namespacing would be preferable, right?
So, that's an idea we can work on. If you have any more questions, you can ask me personally. OK, thank you again, and sorry for the short delay. — Next we have Eric with us, talking about Fedora Asahi. Thanks very much. — So, I gave a similar version of this talk at FOSDEM; it's pretty much the same talk, just an updated version, a general update of what we do in the Asahi SIG in Fedora. When I arrived in Brussels that time and went to do the presentation, when I connected HDMI to my Mac mini, it didn't work. There's a part of Apple Silicon called DCP, the Display Co-Processor; it's specific to Apple hardware. We found a couple of bugs, they got fixed, and it's much more reliable now; I hadn't seen an issue since, until today. So, whenever I present, I hit a DCP bug, but that's the way it goes, you know. So, this is an update on Fedora Asahi Remix, which is simply Fedora for Apple Silicon. So yeah, I'm Eric Curtin, a software engineer at Red Hat; I work on automotive stuff mostly. So, why do we care about all this stuff? Apple, as you probably know, released new ARM-based Apple Silicon devices in late 2020, and as of this month, the entire Mac lineup uses Apple Silicon chips. And why do we care? There's actually a shortage of well-upstreamed ARM devices, and this is one that's in the process of getting upstreamed; even the code that isn't upstream yet is all publicly available, and the upstream guys are doing their best to keep pushing as much as they can upstream. Another cool thing about these devices is that the firmware is unlocked out of the box to run third-party operating systems. I'm talking about Linux, obviously, but there are some guys running OpenBSD on Apple Silicon. This is actually a feature of the device; it's not an exploit or anything like that.
One thing I find useful is that the firmware is even unlocked down to the EL2 layer, which means you have KVM support and all those goodies included. So, they're really, really fast, and great bang for buck; you can get some of these devices now for, I think, around 700 euros, and they're pretty fast. Why do I care specifically? I work in, I said this already, Red Hat automotive, and many of the automotive boards are ARM-based, so I end up doing quite a bit of work that requires some kind of ARM environment. The Apple Silicon devices allow me to iterate really quickly and build and test my code much faster than I would be able to on other devices. And I've learned more about modern ARM, hardware and software implementations, kernel space, Rust, et cetera, just from following the effort and getting involved. So, this will give you an idea of the performance. When I first purchased this machine, I was using a Raspberry Pi. This chart shows the number of seconds it takes to build a project I was working with at the time, called libcamera, which is a C++ codebase. I was sometimes doing my development on a Raspberry Pi, which is the top line; that's how long it took to build libcamera. I also created a kind of Fedora container environment on my Android phone, and it worked; that's the green and yellow bars. And the red bar is my company-issue laptop, just an Intel laptop. Funnily enough, this Apple Silicon device was even cheaper than my company-issue laptop, but it's faster. I took those benchmarks well over a year ago; I think at the time it might even have been a Linux VM running on top of macOS, but that just gives you an idea of the outstanding performance. So, what makes Fedora Asahi great? We have absolutely amazing upstream folks. I don't know if you follow a few of them on Mastodon or Twitter or wherever, but they're really amazing; I can't say enough how amazing they are.
And I'm not gonna go through the list of names because I'm afraid of mispronouncing someone's name, but they're all there. We also have great downstream folks: Neal and Davide are in the crowd there. So they're part of the SIG, and they're here, obviously. So if you ever get an opportunity to grab five minutes of their time and pick their brains, I'd always recommend it; they're outstanding Fedora and CentOS Stream contributors. And we also have Michelle and Leif Liddy and many more. One of the things I really like about this SIG and this community is that everyone has this upstream-everything attitude, wherever at all possible. That's in the spirit of various ARM-related certifications, like Works with Chromebook or Arm SystemReady, or the RHEL certification as well. And well-upstreamed ARM devices aren't as common as they could be in the ARM ecosystem, so I really like to see that this community has that attitude. So this is our general workflow: we push as much upstream as we possibly can, then we try to propagate it to Fedora, and that also ends up in Fedora Asahi Remix, which has some forked packages just to make things work. But this isn't the only workflow we use; sometimes there's publicly available code that's not ready to go upstream, so we maintain Coprs and that kind of thing for code that's not quite ready for upstream yet. Yeah, and this is another thing: in the best-case scenario, this Fedora Asahi thing basically doesn't exist anymore. You can just install standard Fedora out of the box, pretty much, on an Apple Silicon device; that would actually be the real success. These are just some of the forked packages we have in Copr at the minute; there aren't as many as you would think. There's U-Boot, kernel, kernel-edge, a slightly different version of Mesa, and a handful of others.
And some of those will become obsolete in time too. Yes, there are actually three types of Fedora kernels that boot on Apple Silicon devices at the moment. First, the actual Fedora kernel, kernel-ark, which has partial Apple Silicon support merged; it boots, but it's more of a boot-to-shell experience at the moment, so don't expect accelerated graphics and everything to work. So yeah, we test and enable configs as support arrives upstream. This one is built with a 4K page size, which is the most common page size across the various CPU architectures; not everything upstream supports a 16K page size, at least not yet anyway. But the Apple Silicon hardware, the same hardware macOS runs on, uses a 16K page size; the hardware is actually designed for 16K pages. So if you run the standard Fedora kernel, you take a small performance hit there. But there are advantages too, because you get increased compatibility with a 4K-page kernel; there's a trade-off. The second is a kernel we maintain, the Fedora Asahi kernel. This uses all the stuff from the kernel I just talked about and merges in another tree from the upstream guys that adds extra, yet-to-be-upstreamed patches. So we enable even more configs, and we build it with a 16K page size instead, which gives us increased hardware support on these devices. This one uses software-rendered graphics, via simpledrm, if you're familiar with what that is. And now we have another kernel; our kernels are based on the branches the upstream Asahi guys create, and one of those is called kernel-edge. I said I'd describe "edge" in this context, because "edge" has been so fashionable as a term the last few years; here, all it means is a kernel with additional experimental features. It uses the previous kernel as a base and adds a couple more patches and configs.
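If you want to check which page size the kernel you're running was built with, you can query it at runtime; a minimal sketch using only the Python standard library:

```python
import os

# SC_PAGE_SIZE reports the kernel's page size at runtime:
# 4096 on a 4K-page kernel, 16384 on a 16K-page kernel.
page = os.sysconf("SC_PAGE_SIZE")
print(page)

# page sizes are always powers of two
assert page & (page - 1) == 0
```

The same value is what `getconf PAGE_SIZE` prints from a shell, which is a quick way to tell the 4K and 16K Fedora kernels apart.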
So the only difference at present is that this one also has accelerated graphics; as a user, that's the only difference. But under the hood there are a couple more differences: this is pretty much the only Fedora kernel out there, as far as I'm aware, that's built with Rust for Linux support. That's needed to enable the kernel-space side of the GPU driver. And actually, this one is going to be promoted to the main Fedora Asahi kernel in the coming weeks or month, so the split between the two kernels might change; we'll have a discussion about that with upstream and in the SIG, whatever. We also had to add Clang and LLVM as build dependencies, as required by the Rust toolchain, because you need those build tools to build the Rust code, at least at the moment; that's the only compiler that can do it. And we have a forked Mesa package, but even there, all that code is being actively upstreamed, so that fork may well be one of the next packages to disappear, without giving any timelines. So this is just an example, from the recent 6.3 kernel release, which came out, I don't know, six to eight weeks ago, of Asahi helping the community as a whole. There are many examples, but I thought this was a pretty cool one: up until 6.3, on any big.LITTLE device, that is, a device with different types of cores, you had to pin VMs to a specific set of cores; you couldn't mix and match, say, performance cores and efficiency cores. But this really talented virtualization guy made it possible to use all the cores in a guest VM on a big.LITTLE machine, and he used Asahi Linux to do all the work. So I thought that was pretty cool, because it didn't just benefit Asahi; it benefited loads of SoCs, and there's even an x86 Intel chip that has big.LITTLE-style cores. Is it Alder Lake? Yeah.
So yeah, it even fixed that issue on other architectures as well, which I thought was pretty cool, because they did all that work on Asahi. I pretty much gave this talk at FOSDEM, so this is just pointing out some of the differences since then; there are loads, so I just summarized a couple. There have been some DCP fixes for HDMI. Accelerated graphics have come a long way; they're really maturing in performance. I don't know if you watch the streams, I'll mention them in the next few slides, but some of the graphics work that has been going on is pretty amazing, and they have some really demanding games running on Apple Silicon now. We have Fedora branding. Because most people on Asahi Linux are running 16K kernels, there have been page-size fixes in all sorts of user-space applications. These are just a couple I listed off the top of my head, plus initial support for the M2 SoC, and many more things. This is just a slide I added this morning, because I was toying with the idea of showing off the accelerated graphics, but in the interest of time I won't; Asahi Lina streams on YouTube, and she has so many examples of that, so if you're interested, feel free to check out her streams. Hector Martin does streams too, and Neal Gompa does some streams of the downstream work; they can be pretty interesting. Then there's the question we get all the time: can I use Fedora Asahi as my daily driver? And: you may, it depends on what you use it for. I've been using it every day for my work for over a year now, but it really depends on what you want to use it for. To get an idea, the feature support list on the upstream Asahi wiki is a good reference point, because it tells you what works and what doesn't. Two examples of things that don't work are sound and camera, although progress is being made there, especially on sound, and there are workarounds for these things; Bluetooth, for example, works pretty much perfectly.
You can use a USB camera, et cetera. So, I asked a couple of days ago whether I should share this link; nobody complained, and one or two people said yeah, sure. We still deem this unreleased, but this is the link to the installer if you want to try it. It's up to you; it works pretty well, but don't expect things to be perfect. Forks: back when the Asahi guys initially released, there were loads of different Asahi forks forming. One of these is Leif Liddy's fork of Fedora, and he's actually part of the SIG these days. His fork is still maintained to this day, and it consumes the same pancakes... what that's supposed to say is that it's 99% the same as this one: it consumes the same packages, not pancakes. It's a more minimal variant of Fedora Asahi, and it uses an OS-composition tool called mkosi, whereas the SIG images use Kiwi. Another interesting tool Leif has on GitHub: because Asahi chain-loads eventually into a U-Boot UEFI environment, you can boot over USB if you interrupt U-Boot at that point, and he has a cool project that lets you create flashable USB sticks pretty easily. I think that project is pretty nice. That slide should not say kernel-edge; that's just some various links. And that's it. Any questions? — For the YouTubers, the question is: what is the difference between the normal Fedora kernel and the Fedora Asahi kernel? Oh, yeah. The official Asahi kernel is more designed to run on Arch. Our kernel is based on the real Fedora kernel, while the kernel you're referring to is designed for Arch; the kernel config options are just completely different, but otherwise they're very similar. The easier way to answer this is: Hector started from the Arch config and then added the stuff that's needed for Apple Silicon, whereas we apply the same patch set on top of the Fedora kernel tree; the same patch set applies to the Fedora kernel.
He puts his stuff on top of the mainline tree from upstream and then builds it. Yeah. Why I prefer the Fedora kernel is because I know everything that works on this laptop is going to work: I know that if I run Podman, for example, all the kernel configs required for Podman are turned on. But if I use Hector's kernel, that might not necessarily be the case, because he just turns on the minimum of whatever is required for him, whereas this Fedora kernel has everything enabled in the standard Fedora kernel, plus the Asahi stuff on top. — The question is: is the Asahi kernel, the Arch kernel, as stable as the Fedora Asahi kernel? And yes, they're on par. Because what Neal or I do is basically just merge the two kernels, so you have the best of both worlds. So yeah, they're on par. Anything you see, most of the things you see Hector do on his streams, you can do on Fedora also. So, yeah, next question. I can see Neal going, oh my god, so much pain; it takes a bit of time to build a big kernel. Yeah. — OK, so the question is: how hard is it to build a Linux kernel with Rust support enabled? It's getting better; it used to be more difficult, and I even have a contribution that made it slightly easier. It's pretty easy: you just install Clang and the LLVM tools, and it works. But we do have an issue at the moment as regards versioning: you have to have a specific version of the Rust compiler to build without errors. And an issue we're starting to see kind of regularly is that Fedora Rawhide, or not even just Rawhide, will bump its version of rustc before the kernel guys have caught up, and then we start to see build failures. So that's an issue for us; we're talking about long-term solutions.
Yeah. So, for the YouTubers, just to summarize the conversation in the room: we haven't fully decided on a long-term solution, but what we're thinking of is building a kind of kernel-rust package that basically pins the Rust compiler to a certain version just for building the kernel. We haven't fully teased that out yet, but it's in progress. Are there any other questions, maybe on Matrix? If there are no other questions, I guess that's it. OK, thanks very much, guys. — Welcome back, I hope you had a lovely lunch. Now, before we start, I'd like to mention a couple of things. Don't forget that there's a party tonight; if you don't have your pin, pick it up at the reception. Also, if you want to ask questions and you're uncomfortable asking out loud, you can ask in the Matrix room. And don't forget that tomorrow, don't miss the last session, because there's always a quiz with some prizes at the end. That's it. So, let's go to Hangbin Liu from Red Hat, who will be talking about Linux tunnels. — OK, hello, everyone. I'm Hangbin Liu from the Red Hat network services team, and today I will give an introduction to Linux tunnels. First, we will talk about what a tunnel is, then what tunnels you can really use in Linux, and last, we will talk about what tunnels could be chosen for a cloud network. OK. First, what's a tunnel in the real world? It's an underground passageway, and vehicles go through it from one end to the other end. And here is what a tunnel looks like in the network. A Linux tunnel, in concept, encapsulates a network packet within another protocol and transmits it over a network. It allows you to create a virtual network link between two endpoints and provides secure, private communication over the existing network. So, let's see some...
...of what a packet looks like on the network. Here is what a packet looks like on the Internet: first is the Ethernet part, then the IP header, which may be IPv4 or, in recent years, IPv6. Next is the TCP or UDP header, and last is the payload, the data it carries. So, what if this is an internal packet and you want to transmit it from one internal network to another internal network through the public IPv4 network? Here is what we do for an IP tunnel: you take the internal packet you have in the private network and put an outer header around the internal headers, and the outer part travels the Internet. So, this is what an IP tunnel looks like. As the name says, it's an IPIP (IP-in-IP) tunnel. It's simple, and it's used to connect two internal IPv4 subnets through the public IPv4 Internet. The outer header is very simple, so it can only transmit unicast packets. OK, this is a very old protocol; it was developed in the 1990s. (Excuse me, the connection is not stable; let me plug it in again and go through this quickly.) So, some years later, we had IPv6, and the question became: how do we connect IPv6 networks across the public IPv4 network? For that we have the SIT tunnel, developed around 2005; its name stands for Simple Internet Transition. The main purpose of this tunnel is to connect isolated IPv6 networks across the global IPv4 network. After years of development, it also supports IPv4 inner headers, so in fact the SIT tunnel now covers what the IPIP tunnel does. But in recent years, the IPv6 network has also developed, and we have a lot of global IPv6 networks, so there's an IPv6 version: the ip6tnl tunnel, whose outer header is an IPv6 header. So IPIP and SIT are the IPv4-outer versions, and ip6tnl is the IPv6-outer version.
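To make "put an outer header around the inner packet" concrete, here is a toy Python sketch that builds the raw bytes of an IP-in-IP packet: an outer IPv4 header whose protocol field is 4 (IPPROTO_IPIP), stacked on an inner IPv4 header. The checksum is left at zero for brevity, and nothing is sent on the wire; the addresses are made up.

```python
import struct, socket

def ipv4_header(src, dst, proto, payload_len):
    # minimal 20-byte IPv4 header, checksum left at 0 for brevity
    return struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,            # version 4, IHL 5 (20 bytes)
        0,                       # TOS
        20 + payload_len,        # total length
        0, 0,                    # id, flags/fragment offset
        64,                      # TTL
        proto,                   # protocol of what follows this header
        0,                       # checksum (not computed here)
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )

# inner packet: private addresses, carrying TCP (protocol 6), empty payload
inner = ipv4_header("10.0.0.1", "10.0.1.1", 6, 0)
# outer packet: public addresses, protocol 4 = "IP in IP"
outer = ipv4_header("198.51.100.1", "203.0.113.9", 4, len(inner)) + inner
print(len(outer))   # 40: two stacked 20-byte IP headers
```

The 20 bytes of outer header are exactly the encapsulation overhead of an IPIP tunnel, which is why such simple tunnels are cheap but can carry unicast only.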
So far, all the tunnels we've talked about leave the data plane in the clear: if anyone captures the packet, they can see what's inside. So, how do we protect our data? There is IPsec. IPsec actually has a transport mode and a tunnel mode; here, we will only talk about the tunnel mode. IPsec supports two protocols. One is AH (Authentication Header), which does data authentication. The other is ESP (Encapsulating Security Payload), which encapsulates and encrypts the inner data. So AH only authenticates the inner data, while ESP encrypts it, but you can also combine them, so that you have both an AH header and an ESP header, and your data is protected. But sometimes we need to connect to a company or school or some other private network, and then we need to do some user authentication. How do we do that? Here we have the PPTP tunnel and the L2TP tunnel. L2TP is a layer-2 protocol, and PPTP is only a point-to-point protocol; both of them are based on the PPP protocol. But L2TP is layer 2, so it supports creating multiple tunnels between the two endpoints. PPTP has only a basic encryption method compared with L2TP, which can be combined with IPsec. So PPTP is a little faster than L2TP, because L2TP needs to do much more computation, but on the other hand, PPTP is not secure and is easy to crack, so it has not been recommended in recent years. PPTP was developed in the 1990s, so it's very old, while L2TP was developed around the year 2000. But as we said: PPTP is faster but not secure, and L2TP is secure but a little slower. So, how do we balance performance and security? OK, we have OpenVPN. OpenVPN is very famous; it was created in 2001, by default it runs over UDP, so it has very good performance, and it uses OpenSSL to encrypt the data.
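The AH-versus-ESP distinction above can be illustrated with the standard library alone. This is a toy contrast, not real IPsec: AH-style integrity is modeled as a keyed MAC appended to a still-readable payload, and ESP-style confidentiality as a transformation of the payload itself (here a toy XOR stream; real ESP uses ciphers like AES).

```python
import hmac, hashlib

key = b"shared-secret"
payload = b"inner packet bytes"

# AH-style: append an authentication tag; the payload stays in the clear,
# but any tampering is detectable by redoing the MAC.
tag = hmac.new(key, payload, hashlib.sha256).digest()
ah_packet = payload + tag
assert ah_packet.startswith(b"inner")        # anyone can still read it

# ESP-style (toy): the payload itself is no longer readable without the key
stream = hashlib.sha256(key).digest() * 2    # toy keystream, NOT real crypto
esp_payload = bytes(a ^ b for a, b in zip(payload, stream))
print(esp_payload != payload)                # True: contents are hidden
```

This is exactly why the two can be stacked: AH answers "did anyone modify this?", ESP answers "can anyone read this?", and combining the headers gives you both properties.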
OpenVPN's security is also very good. But it's been popular since around 2001, so it's already about 20 years old. More recently, we have the WireGuard tunnel. WireGuard was created in 2015 and was merged into mainline Linux in 2020 (kernel 5.6), so it's a modern tunnel. WireGuard is a very new open-source VPN protocol, and it's much faster than OpenVPN because it has a very simple design and a lower overhead. It also uses modern cryptography, compared with OpenVPN, which uses SSL, which is older. So WireGuard is both more secure and faster. Now, all the tunnels we've talked about so far have a fixed header, and they are limited to fixed inner protocols — IPIP carries IP, and so on. Users need to configure different tunnels for different inner protocols. So, is there a way to not care about the inner protocol and use just one outer header, one outer protocol? We have the GRE tunnel. GRE stands for Generic Routing Encapsulation. It's also a very old protocol: it was designed in 1994 and updated in the year 2000. As the name says, it's a generic tunnel, so it's protocol-independent. The GRE tunnel supports IP as an inner header, and PPP too; it also supports an Ethernet inner header. And it supports transmitting multicast traffic — as we said before, an IPIP tunnel only supports unicast packets, but GRE supports multicast. Also, since it's a generic routing encapsulation, it can carry routing protocols like OSPF. That part isn't handled by the Linux tunnel itself, but some routers, like Cisco routers, support OSPF over GRE. In Linux we have the GRE tunnel and the IPv6 GRE tunnel — the IPv4 and IPv6 versions. There's also a GRETAP tunnel; the difference is that it has an inner Ethernet header, so it can carry Layer 2 packets through the Internet, and there is an IPv6 GRETAP tunnel as well. And the last one is the ERSPAN tunnel.
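A hedged sketch of GRE and GRETAP devices with iproute2 — illustrative addresses, root required, peer mirrored:

```shell
# Plain GRE: protocol-independent tunnel at the IP level
ip link add gre1 type gre local 198.51.100.1 remote 203.0.113.2 ttl 64
ip addr add 10.9.9.1/30 dev gre1
ip link set gre1 up

# GRETAP: same encapsulation, but the inner payload is a full Ethernet
# frame, so the endpoint can be enslaved to a bridge like any L2 port:
ip link add gretap1 type gretap local 198.51.100.1 remote 203.0.113.2
ip link set gretap1 up
```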
ERSPAN stands for Encapsulated Remote Switched Port Analyzer. A hardware switch has a function where it can mirror the traffic of one port to another port at Layer 2, within the same switch. But sometimes we want to mirror traffic to another subnet. With this protocol, we can carry mirrored port traffic over a routable network and deliver it to another subnet. The good part is that we extend the basic port-mirroring capability from Layer 2 to Layer 3. So, we can see that tunnels can happen at multiple levels. We have GRE, IPIP, and SIT — most of them are based directly on an IP header. But in recent years, more and more tunnels happen at the UDP level. Here we have three UDP tunnels. The first is the FOU tunnel — Foo over UDP. It was developed in 2014, so it's a new protocol. The nice thing about a UDP tunnel is that UDP works with existing hardware infrastructure: RSS in the NIC (receive side scaling), ECMP on switches and routers (equal-cost multi-path routing), and offloads like checksum offload and GSO/GRO. These features give UDP tunnels a significant performance increase compared with plain IP tunnels. That's also why OpenVPN and WireGuard, which we talked about, both run over UDP — the performance is very good. The next is the Bareudp tunnel. Bareudp is similar to FOU: FOU supports IP and GRE as the inner header, while Bareudp supports IP and MPLS as the inner header. The last is the GUE tunnel. The difference compared with the other two is that GUE adds its own GUE header. GUE is Generic UDP Encapsulation. It adds a header, but the header is lightweight, so the speed is still good, and it allows the header to carry optional data fields, which can be used for virtualization, security, and congestion control. The next one is VXLAN. VXLAN was also developed around 2014.
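A hedged sketch of FOU and GUE receive ports plus IPIP tunnels encapsulated in them, following the iproute2 `ip fou` interface — addresses and ports are illustrative, root required:

```shell
# FOU: open a receive port for IPIP (IP protocol 4) carried over UDP 5555,
# then create an IPIP device whose packets are wrapped in that UDP port:
ip fou add port 5555 ipproto 4
ip link add name fou0 type ipip local 198.51.100.1 remote 203.0.113.2 \
    encap fou encap-sport auto encap-dport 5555

# GUE: same idea, but with the generic GUE header on UDP 5556:
ip fou add port 5556 gue
ip link add name gue0 type ipip local 198.51.100.1 remote 203.0.113.2 \
    encap gue encap-sport auto encap-dport 5556
```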
So, 15 or 20 years ago, most of a company's servers were in the same data center, and we separated networks with VLANs. That was enough at the time, but then came the cloud: the data centers may be spread across a lot of places. So, how do we connect these data centers? VXLAN — Virtual eXtensible LAN — was developed to address the limitations of the traditional VLAN. A traditional VLAN has only about 4,000 VLAN IDs, which is not enough for a large-scale data center. VXLAN has a 24-bit VXLAN Network Identifier, the VNI, which allows up to 16 million virtual networks — more than enough for now. Also, VXLAN encapsulates Layer 2 Ethernet frames: it carries Layer 2 data over a UDP header. This means you can create an isolated Layer 2 network that stretches across data centers and cloud networks. This flexibility also lets virtual networks be deployed across data centers, and you can migrate workloads between physical places or data centers and still keep the same logical network. There are also some other protocols like VXLAN: NVGRE, which uses a GRE header to encapsulate the data instead of a UDP header, and STT, which uses a stateless TCP-like header to encapsulate the data. They are different protocols, but they have similar functions — a network identifier to address the VLAN scaling issue. But as we see, while VXLAN is popular, VXLAN, NVGRE, and STT all have fixed headers, and they're not easy to extend. What if we want other features in the future? It's not easy to do. So we have the Geneve tunnel. Geneve stands for Generic Network Virtualization Encapsulation. From the picture it looks the same, but the Geneve header is actually a flexible header: it supports type-length-value (TLV) options, which lets us add new functions without modifying the base protocol. It also carries a virtual network identifier like VXLAN's VNI, so that's more than enough now. Geneve also supports options that can make the data more secure. And while VXLAN encapsulates Layer 2, Geneve can encapsulate Layer 2 or Layer 3, so it's more flexible. Geneve can become the new default tunnel in many places — OVN, for example, currently uses the Geneve tunnel as its default tunnel. So that's almost everything about the tunnels Linux supports today. Since cloud networks are ever more popular, let's think about what kinds of tunnels fit a cloud network. First, a cloud network may carry different products and protocols, so the GRE tunnel fits because it supports different inner protocols. Next, VXLAN: it's popular, and it has a large VNI field to separate networks. And last, the Geneve tunnel — it's designed for the future. So, those are all the tunnels I wanted to introduce. Thank you.

Okay, hello everyone. Welcome to another session; our next speaker will talk about NixOS. Thank you very much.

Once upon a time... this seems to be working, hello. As I've already been introduced, this presentation is going to be about a very beautiful operating system called NixOS — or NixOS; I don't know which pronunciation is correct, to be honest, so I'll be using both. First of all, this talk is going to be slightly different, with a few technological experiments. If you open this link, you should be able to follow the slides on your device live.
And if you look at the top left corner, there's a hamburger menu you can open, and you should be able to post questions there. So if you have any questions, try to post them through this live site, and I will do an extended Q&A afterwards, because I hope there will be more questions than I can answer in the talk alone — so feel free to put them in there. And if you post a test question, maybe I'll see it right here and we'll see if it works. Anyway, don't panic: NixOS really doesn't bite. This is sort of a sales pitch for the OS, so I'm not going to go into many details. I'm not a very good salesperson, though, so I'm going to give you the spoilers right up front. NixOS really doesn't bite — it doesn't bite more than the others, anyway. Whenever you try a new distro, you always get bitten by something, and NixOS is no different in this, but that's the point of this presentation. It can be frustrating, because it's a bit of a different distribution than the others, but once you start using it, you learn to love it. I already see people nodding — and she's already here, so there's living proof of it. It'll grow on you very easily. So, you might be asking yourself: why should I listen to this guy? Well, my name is Jakub. I've got all the socials, everything. But more importantly: I've been using Linux for over 20 years now, 19 of which I've been using it full-time. I've been paid for using Linux for well over a decade, and most importantly, I've been using NixOS as my only desktop operating system for over four years. So I hope I've got a little something to say on the topic. So, what is NixOS, and why should you care? Well, to talk about NixOS, we first have to take a little detour and talk about Nix, the package manager. Nix is a very specific package manager. You already know what a package manager is if you're using Linux these days; Nix fills the same role, but it does it quite differently.
It is functional, declarative — quite fancy — and it provides immutable results, which is probably the most interesting part of this piece of software. We'll talk about all of these points in detail very soon. One of the more interesting things about Nix as a package manager is that it actually runs on everything. It's not bound to NixOS, so you can try Nix almost anywhere — yes, even on phones. So, NixOS: what makes it unique? Well, Nix. Nix gives NixOS many features, like immutable package management and dependencies, and fully declarative system configuration. If you're experienced with things like Terraform or Puppet, you might know what this means; if you're not, follow my lead and you'll know what it means very soon. The system state is atomic. It means that... well, it's atomic — if you work with databases, you know what that means, right? Atomicity is an inherent feature of NixOS configuration, and it protects you. You really can't break NixOS easily, because if you make a typo or something, the system just won't build and nothing will happen. Hence the atomicity. And all the configuration for the operating system is in one place, written in a single language that's rather easy to understand. NixOS is also quite a user-centric distribution, because it allows you as a user to create dynamic environments for your applications — environments, plural. You can have multiple environments running in parallel for different workloads, different experiments, anything you want. It allows you to build different, completely independent user profiles — and what independent means, again, we'll talk about very soon. User configuration can be done in a very similar way to the system configuration: declarative and, well, atomic, everything. And of course the community. The NixOS community is one of the greatest communities out there.
People in the NixOS community are aware that the distro is different, and they are very, very helpful to newcomers, because they understand the frustration newcomers sometimes come with. Don't be frustrated, really — it's not worth it. It's easy. So, let me give you a bunch of good reasons to switch to NixOS. These are personal opinions; no warranty. So, would you care to guess what this is? No guesses? Yeah — it is a NixOS configuration, but this specific configuration is actually a complete, working system configuration. This is all you need — literally all you need — to have your desktop running full NixOS with Plasma 5, with Firefox, and your favorite editor. You don't need to add anything else to have a working system. That's how easy it is. Well — okay, I'm lying just a little bit. This is a way to split your NixOS configuration if you want, and this specific part here is the configuration for file systems and such. It's generated for you by the installer, so you don't have to care about it; it just has to be there, because there has to be some kind of initial state. And this is not a complete example — I've got a much larger configuration than that — but you get the idea, right? So, about the NixOS configuration: as I said, it's declarative. Declarative configuration means that you basically say what you want, but you don't have to say how it's done, because that's what Nix does for you. You just say: okay, I want Firefox. That's it. I want Plasma. I want an X server. I want an Nginx container. You just say that you want something, and it'll magically happen. It is immutable: once a configuration is made, it doesn't change. And if you make the configuration with the same inputs, you will always get the same outputs, regardless of the current state.
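A minimal desktop configuration of the kind described above might look like the following sketch. The package choices and file path are illustrative, but the option names (`services.xserver`, `environment.systemPackages`, and so on) are real NixOS options; here it is written to a local demo file rather than the real `/etc/nixos/configuration.nix`:

```shell
# Hedged sketch of a minimal NixOS desktop configuration like the one on
# the slide: Plasma 5, Firefox, an editor. Written to a local demo file.
cat > ./configuration-demo.nix <<'EOF'
{ config, pkgs, ... }:
{
  imports = [ ./hardware-configuration.nix ];  # generated by the installer

  boot.loader.systemd-boot.enable = true;

  services.xserver.enable = true;
  services.xserver.displayManager.sddm.enable = true;
  services.xserver.desktopManager.plasma5.enable = true;

  environment.systemPackages = with pkgs; [ firefox vim ];
}
EOF
```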
The point in time when you build a configuration doesn't matter — that means it's reproducible. You can always make the same configuration all over again; you can test it, and you can be sure that if you keep the configuration the same and give it the same inputs, you will always get the same outputs. And it's also atomic. Once the configuration is created — or built, in NixOS terms — you can switch to it, and the switch is atomic: it either happens completely or doesn't happen at all. So it's virtually impossible to end up with a half-configured system, because of this atomicity principle. It's not possible to break things just by rebooting during the build or anything like that, because that's an inherent property of the system. So how do you deploy this kind of config? Well, there's a simple command. This command will take all your configuration and — sorry — it will build you a new operating system configuration and switch your runtime to it, just like that. You might be asking yourself right now: well, what about new packages? I always do some pacman or apt or yum or whatever — or DNF, I heard that's the new thing in the Red Hat world. So how do you do packages? Do I have to do this whole rebuild every time I just want to add new packages to the system? Well, you can, but you don't have to, because of these profiles I talked about a while ago. This command will install the package Krita for you in your user profile, and once it's done, you can just start it. But let's say we don't like Krita: okay, I don't like this editor, what do I do now? You can uninstall the package, of course. But let's try a different thing — let's say I want to try Inkscape. Do I have to do the install and uninstall dance again? Well, again: you can, but you don't have to.
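A hedged sketch of the commands this part of the demo is describing — the system deploy and per-user package management. `nixpkgs.krita` is a real attribute, but your channel name may differ:

```shell
# Build the new system generation from /etc/nixos/configuration.nix
# and atomically switch the running system to it:
sudo nixos-rebuild switch

# Install a package into your own user profile (no root needed):
nix-env -iA nixpkgs.krita

# ...and remove it again if you decide you don't like it:
nix-env -e krita
```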
Nix has this great feature of running a package — a package output — that you don't even have to install. If you run this in your Nix environment, it will start the Inkscape package for you without installing it into your environment. It creates a kind of one-off, ephemeral, dynamic environment just for Inkscape, which disappears after Inkscape exits. It's great for testing things. It's great if you don't want to take care of any configuration — though you can have your configuration completely declarative, even including the packages. But if you just want to try things, you can do it like this, and you don't mess up your profile if you don't want to. So, testing packages is great — but can I test the configuration of the system as well? Of course you can, because the nixos-rebuild command doesn't only do switches; it does a bunch of other things. Let's talk about dry-activate, one of the more interesting ones. Dry-activate does something like nixos-rebuild switch, except it won't do the actual switch. It does all the builds and shows you what would have to be done in order to switch to the new configuration, but it won't do the switch itself. That's useful for testing when your configuration change requires runtime operations — the things that happen at switch time are roughly what happens in post-install hooks in packaging tools: unit reloads, runtime configuration, that kind of thing. So NixOS will tell you what would happen if you did the switch, without doing it. Then there is nixos-rebuild boot. This builds the configuration and prepares your bootloader, so if you reboot your box it will boot into the new configuration — but it won't touch your currently running config.
This is useful if you have, say, a kernel upgrade and you're not sure your current packages will work with the new kernel for whatever reason. And there's one more interesting thing: build-vm. It builds your whole configuration into a totally separate virtual machine you can play with. I don't know if any other system does this. So, now you've tested everything, and you want to try to break it, right? This happens with common distros: you edit something here, you add a package there, and everything breaks. We've all been there. Well, not with NixOS — good luck breaking that. Because, again, in order to break things you would first have to switch to the configuration, and you won't switch on a syntax error, because the build will fail. You won't switch on incompatible settings, because the build will fail. It won't even build if there are upstream bugs NixOS knows about. That happens to me a lot when I try to build a new kernel with ZFS — I use ZFS, and ZFS doesn't always keep up with new kernels. So if there's a new kernel coming in which isn't yet compatible with ZFS, I get a message: hey, your ZFS package is broken, you can't do this — and I won't break my system. On Arch Linux, for instance, when that happened and I missed the little message in pacman that something didn't go correctly — but it still built the kernel — I rebooted, and the system wasn't bootable, because there were no drivers for ZFS. This does not happen on NixOS. But what happens if you do manage to break the thing? It's pretty hard, but it is possible. So, let's say we broke the system. Do you know what this is? Okay — for those of you who don't... I'm sure you know what this is, right? Yeah, it's true: NixOS has its own time machine. You really can do all these things, because NixOS has all these properties.
This naturally leads to the time-machine capabilities. Since NixOS is made by humans and it's not 100% foolproof — nothing is — you always have the option to do a rollback. If a switch happens that's not to your liking, you can do this single command, and your NixOS will roll back to your previous configuration. Well, what if the switch breaks the system entirely? As I said before, NixOS configurations — we call them generations — are stored in your bootloader. So even if your system becomes unresponsive after a switch, for whatever reason, you can always reboot and pick whichever generation you like — usually the second to last, because that's the one that probably worked, though if you do a lot of upgrades, it might be a different one. And what happens to these generations when you don't need them anymore? There's something called garbage collection, which you can trigger manually, and it works pretty much like a standard programming-language garbage collector: you've got profiles, you've got things that some configuration links to and things that no configuration links to, and the unreferenced ones are removed by the garbage collector. So, these are, I think, the most interesting things about NixOS itself — but let's talk about the Nix package manager as well. I already spoke about independence. Nix allows you to manage your own profile, your own packages, pretty much everything of your own, independently of the system admin — as long as you stay in your user space, of course. Users are also independent from one another; that's basically implied. But you are also independent of your distribution, or even your operating system — Nix runs almost everywhere. I've been talking about profiles and environments. These terms — I'd rather not define them formally; they're the terms I mostly use for you. They're not official, or at least I'm not aware of them being used as officially as I'd like.
But whatever — I was talking about a profile here, meaning mostly the configuration of your user profile. It works the same way as configuring the system: you have a Nix expression — some Nix expressions of a sort — you build it, and you've got your configuration, your packages, everything you define in there, and that's static. When you change your configuration, a new configuration is created in the profile, which creates a new generation of that profile, and you can move between generations just like you can with the OS itself. But there's also this concept of environments, and those environments are dynamic, ephemeral — you can make them on demand, like I did with the Inkscape example, and you can do much more complex setups than that. Using this nix-shell capability, you can create your own environment for, say, package development. You can have multiple libraries, in multiple versions, in different environments in parallel — so you can test your code against different libraries at the same time without touching the system or doing anything fancy with VMs or the like. You just create the environment on demand and destroy it when you're done. That's, I think, one of the great things for power users, but it can be useful for beginners as well. And of course, Nix runs everywhere. Literally everywhere Linux currently runs, Nix can run on it. So you can run Nix on your Red Hat systems, you can run it on Ubuntu — you can even run it on Macs. That's actually widely supported by the community; macOS is one of the two largest platforms Nix builds for. So even if you use macOS, you can use Nix without any hassle. And of course, it even runs on Android.
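The ephemeral environments described above might look like this sketch — the package attribute names (`inkscape`, `python39`, `python311`) are examples from nixpkgs and can differ between channel versions:

```shell
# One-off environment with Inkscape; nothing is installed into the profile,
# and the environment disappears when the program exits:
nix-shell -p inkscape --run inkscape

# Two parallel dev environments with different interpreter versions,
# for testing the same code against both without touching the system:
nix-shell -p python39  --run "python --version"
nix-shell -p python311 --run "python --version"
```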
I personally don't use Nix on Android, but I've seen it working, and I've seen there's good support for it. There are ports to other platforms too — I think I've seen a discussion about FreeBSD; I'm not sure what the state is, but you can follow it if you're interested. Good, right? Well, sometimes NixOS doesn't really fit your needs. The biggest example, for beginner users and normal users like us, is unfortunately games — specifically, and this is the sad part, Linux-native games. With Unity3D games, for instance, I was unable to start two out of three Unity games on NixOS, no matter how hard I tried. And that goes for native closed-source applications in general: if you have a closed-source application you need to run, you might find it difficult on NixOS. Not impossible, but definitely not easy. The reason is something called the Nix store. This is how Nix creates everything: packages, files, configuration — every Nix output ultimately lands in this directory, which is read-only for users, and you shouldn't touch it anyway. As you can see, it contains all kinds of random-looking things. This part is a cryptographic hash of the specific build of the thing; it makes it unique, and that's what allows you to have multiple versions of the same library, because the prefix will always be different. But the prefix will always be the same if you feed the same inputs to the same configuration — that's the immutability part. And since everything lives there, nothing is in the common directories like /usr/lib, or even /bin or /sbin or /usr/sbin. These directories are mostly empty, because the execution environment for each package is built per package. And that breaks things that rely on those sort-of-standard — but not really standard — locations being present; you have to fix those, and it's not easy.
I don't want to go into technical details — just be prepared that you might run into these issues. So, I hope you have questions we can answer here, because I only went through three or four things that might be interesting, and I'm sure there are many more use cases out there — each user is unique and has unique needs. So, go ahead. If you want to build a closed... oh, sorry — yes, the question was: what do I do if I want to build a closed-source application? Well, if you want to build it for NixOS, you can build it just like any other application; distribution in the Nix store is binary, so that's not a problem. But if you want to distribute it so it's compatible with NixOS, my suggestion would probably be: build it statically. Let it ship all the dependencies it has — like Go applications do, essentially. We can also talk about server use cases. I mean — okay, go ahead. I'm sorry, can you speak a bit louder? No, no — that's what nixos-rebuild switch does: it switches your current runtime environment to the new generation, but you might have to reboot if you rely on things that only happen at boot, like kernel upgrades. That's the same as with any other... sorry? Okay, sorry — the question was: do I always have to reboot when I make a configuration change? No, you don't, unless the change requires a reboot by design. Yes — the question is: what is the source of the packages? Well, this is interesting. The source of the packages, by default, is something called a Nix channel, which contains information about all the packages in source form, and there is a build system (Hydra) that caches the builds and stores them under these hashes.
So there is an online caching system that has all these packages built, under these hashes, for the official channels. You can either build them yourself, or just try the cache: if the package already exists there, it's downloaded automatically; if it doesn't, it gets built. How do I know the cached package was built the same way? Well, you can build it yourself, and you will see that the hash is the same if the inputs are the same — that's the immutability part. If the inputs are the same — inputs being both the configuration and the sources, a specific commit of the sources — then building from there gives you the same hash. If it doesn't, there's a problem somewhere you need to solve: probably a different source, or maybe it's a security issue — maybe someone tampered with your sources. So if you don't trust the caches, you can build everything yourself. Do I recommend always building yourself, or are the caches trustworthy, essentially? Yes, I think they are. NixOS is a huge distribution, everything is cryptographically signed by the package developers, and you can always go through the public history of the packages in the nixpkgs GitHub repo. And — I don't want to lie here, but if I recall correctly — you should be able to know the hash before you even build it: the hash is calculated from the sources, not from the outcome, so you should be able to check. No more questions? No server admins interested in NixOS? Okay, there's one. Wow — how much space does it take? Most of the space — the bulk of everything — is in the Nix store. If you have 50 gigs used, then 49.5 gigs will be in the Nix store. It really depends on how many generations of history you want to keep: if you only keep the latest generation, the space usage is pretty much the same as with any other distro.
If you have many packages, say 10 or 20 gigs, and you want to keep generations, multiply by the number of generations, essentially. There was a question over there, I think — okay, go ahead. Can you have the Nix store backed by external storage? Yes, you can, as long as it provides basic POSIX compatibility — and I'm not even sure POSIX compatibility is strictly required. It actually even happens that when you have, say, virtual machines on a host, the host can share its own Nix store with the machines, and they can share the packages — because, again, the packages are addressable by their hash. Probably, yes — the question was: can I use a mounted ISO image? Yes, you can. Okay, next question: are packages from other distributions available? Well, let me answer slightly differently. Nixpkgs is the largest collection of packaged open-source applications out there; as of yesterday, it has over 80,000 packages available. The second largest was the Arch Linux AUR, with, I think, about 60,000. So there's a good chance your package is already packaged for NixOS. But if it isn't, and you really need another distribution's package, then — well, Flatpak is supported, so you can use Flatpaks, and you can always write a simple wrapper around the other distribution's package: unpack it and put it into the Nix store as a package. You may have to patch the binaries so they look for their dependencies in the correct places, because the usual paths don't exist — so you may have to do binary patching, but NixOS has facilities for that. And you can use Nix for anything — it's essentially a build system. The question was: can I use Nix to package things for other distributions? You can use Nix to package anything.
I'm not sure if there are default outputs for other distributions' package formats, but there are definitely outputs for Docker containers, for instance — you can build a Docker container from a Nix expression. Go ahead. I'm not sure I heard the question completely — I think the question is: how good is the Nix language support in editors? It's quite good. I use the Helix editor and Vim mostly, and the support is there. I'm not sure about VS Code, but even this presentation software has support for the Nix language, and it's like two years outdated — I mean the highlighter, not the parser; you saw the language was highlighted, and that's the internal workings of this presentation software, which is outdated. So the support is pretty good, I would say. No server questions, really? Go ahead. What if I have a package that needs to change the state of the system? Well, you probably shouldn't use a package for that — you should use something called a module, which is the NixOS thing that defines configuration. Right — if I have to choose a system package, like a web server, do I choose between nginx and Apache, and how does the package tell the system that it's the default web server? You don't do this in NixOS. You just say: I want... Actually, I can show you, maybe — I can show you how this live presentation is served. Sorry. This is it — it's lying a bit, but this is how you do it. You declare a service: okay, I want a service called nginx; I want the service to have a virtual host with this name, and I want it to do something like this. This is how you define a virtual host in nginx, and this is everything — this is all you need to put into the configuration for a full nginx web server, with a redirect in this case, to work. So you don't do state changes via packages; you do them with what's called a module.
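A hedged sketch of the kind of declaration shown on screen — `services.nginx` and its `virtualHosts` submodule are real NixOS options, but the host name and redirect target here are made up. Written to a local demo file for illustration:

```shell
# Hedged sketch: an nginx virtual host declared the NixOS way — a module,
# not a package, describes the system state. Names are illustrative.
cat > ./nginx-demo.nix <<'EOF'
{
  services.nginx = {
    enable = true;
    virtualHosts."slides.example.org" = {
      locations."/".return = "302 https://slides.example.net/live";
    };
  };
}
EOF
```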
This is the module system: `services`, with the NGINX module, which has some submodules, configuration, et cetera, et cetera. There's a huge number of these modules. Almost everything that you could want to use probably has a module for NixOS. This is a NixOS feature, not a Nix feature, by the way. You've got something similar for your home environment using Home Manager. So that's how you do that. You don't change the state of the system with a package. You shouldn't do that to a stateless system, because NixOS is essentially stateless. One more question? Go ahead. Yes, exactly. The question is, can I replace automation tools with NixOS? Yes, NixOS is, by definition, its own configuration tool, since it's purely functional and fully declarative. You say Ansible, Puppet, maybe — those are not purely declarative languages — but, for instance, Terraform is. So if you want to compare, I think one of the best comparisons would be Terraform. What Terraform is for, let's say, cloud infrastructure, the Nix language is for NixOS. Can you say that again, please, a little bit louder? The question is about the state of the NixOS project: how mature it is, what parts are mature, what parts are not. I don't feel I'm the one to answer these questions, but if I had to, then I would say NixOS at its base — the Nix language, the module system — is quite mature; it's been with us for, I would say, over a decade now. Not sure, but... yeah, thanks. It's probably well over a decade now. What is new and a bit unstable is a feature called Flakes, which I still want to talk about a little bit in a minute. That's a different approach to Nix packaging, called Flakes. It's still experimental, but it's become widespread, and it's, I think, pretty stable, but officially still experimental. But NixOS at its base — I really wouldn't have a problem using it in production, on servers, at mass scale. All right, let's wrap it up. Okay, one last question. 
Can I run NixOS on a mainframe? I don't know. On a Raspberry Pi, yes, you can. I'm running a Raspberry Pi home server on NixOS. All right, so let me give you a few tips at the end. Learn Flakes — this topic has already been touched on. Flakes is a new system for package management in NixOS and Nix. If you want to start with NixOS, learn Flakes. I didn't, I don't know much about them, and it's biting me in the ass. So, do learn Flakes. The official installer — I mean, there is a new installer, it's just a few weeks old, I think, which is the same installer as for Ubuntu and other systems — but there are a bunch of other installers, so if you are not comfortable with the installation, look out for the other installers. They might be easier. Use Home Manager for your home configuration; it's the same declarative config as for your operating system. Don't be afraid to make changes — you won't break it — and once you're comfortable enough, try installing NixOS fully manually, just to see how it works under the hood. There are a couple of resources. This presentation will be available later, so you can check them out. Thank you very much. So, thank you. Seems I'm audible. So, hey, hello, everybody. Thanks for coming to my talk. I'm Miro, I work as a senior principal quality engineer at Red Hat, even though, like, I'm now a full-stack developer on Testing Farm, so QE is just a part of it. We are also doing all the stuff from SRE to development, and we are also testing our own stuff. So, yeah, welcome. Yeah, a small agenda of what we will be speaking about today. I will guide you through the introduction that everybody maybe knows: what is Testing Farm. I will also tell you what TMT is, and why Testing Farm and TMT together are important for this talk. I'm gonna go through the Testing Farm request — it's just some introduction. 
I think there might be people who don't know this stuff, so I think it's always important, and I will dive into some features that I think are interesting, and we'll look at some use cases where Testing Farm is being used — that's on GitHub, GitLab, Fedora, the CentOS Stream CI — and how you can use it otherwise. I will show you some future plans, and I hope we'll fit it all in, because the content is packed, but I speak fast. So, yeah. And I will do a demo, but the demo takes a little bit of time, so I will do some questions alongside it, and I think we will manage. So, what is Testing Farm? Testing Farm is an open source testing system as a service. Our code is actually open — or, our infrastructure code is open — even though it's not super easy to contribute to; we are working on it, sooner or later it will be that way, but people have contributed even from outside Red Hat. So, it's basically a flavor of software as a service, and we are focusing on executing automated tests. We are running these automated tests mostly against VMs or bare-metal machines, but also containers, and we are really a back-end for CI systems. If you think about it from a high level, that's what we do: CI systems call us to do the dirty work — provision the infrastructure, run the tests, and return back results. Testing Farm itself is being used quite a lot. We are in a lot of places: in Red Hat infrastructure, but also in Fedora infrastructure, CentOS Stream infrastructure, GitHub, GitLab. So we are in a lot of places, doing the same job there: running tests. We are doing it in a hybrid cloud — that's important, because we have one single public API that is used by everybody, but we have one worker deployment in Red Hat and one in public, running tests inside Red Hat but also for the public stuff. This single public HTTP endpoint is important. 
We wanted to be really open source, and we didn't want to have multiple HTTP endpoints. We wanted to have one, really, and sort out all the problems that come with it. It's reachable here — if you go to this link, it actually leads you to our documentation, and you can look at how it looks. It's fairly easy. It's still a 0.1 version; we are slow, but it has worked well for the three or so years we have been in production. Our request format is really easy: you specify what you want to test, and you specify the environment on which you want those tests to run, and Testing Farm returns you back results — that's the high-level view. Right? Then there's the worker deployment I was telling you about. The API is public, and the public ranch is the worker deployment — we call it a ranch because we are a farm, so a worker deployment is a ranch. That horse has nothing to do with anything, I just find it nice. So that public ranch runs everything I will be speaking about today. The Red Hat ranch is something for internal stuff and audiences, because those tests run inside the Red Hat network and provide results only to Red Hat employees — but I think it's half of our requests: half go to the public ranch, half to the Red Hat ranch. As you can see, we support a lot of infrastructure. We are running a lot of containers, actually — the majority of workloads run against containers, because we run some generic tests — but mostly we do AWS, also downstream, I mean inside the Red Hat network, but also upstream; we have AWS connected to the internal network. We do OpenStack internally, and Beaker. We have this nice new REST API with which you can connect any other provisioning system very quickly. We have an Azure preview, and we are planning Google Cloud and IBM Cloud, but the majority is this — this one is used just for one special device that is being tested now; hope I can say that. So I will be speaking only about this part. 
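The request shape just described — what to test, plus the environment to test on — can be sketched roughly like this. This is a hedged reconstruction from memory of the public v0.1 API docs, shown as YAML for readability (the API takes JSON); the repository URL, ref, compose, and variable are invented, and field names should be checked against the documentation:

```yaml
# Illustrative Testing Farm request body (values are made up).
test:
  fmf:
    url: https://gitlab.com/example/my-project   # git repo containing TMT metadata
    ref: main                                    # git ref to check out
environments:
  - arch: x86_64                                 # architecture to provision
    os:
      compose: Fedora-Rawhide                    # operating system compose
    variables:
      MY_TEST_VAR: "1"                           # passed into the test execution
```

The service queues the request, provisions a machine per plan, runs the tests via TMT, and reports results back over the same API.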
The talk is about — well, I will not speak about the Red Hat stuff, because I'm not sure if everybody here is a Red Hat employee; a lot of you are, but I will tell you that at QCamp. So, what is TMT, and why is it important? TMT — and this is maybe my word, not everybody likes it — is basically, for me, an open source test management system. We did this for RHEL, because we had a lot of legacy systems — we still have them inside Red Hat — and we wanted something modern, so that the folks working on RHEL can easily open source tests and share them. And it was funny, because a test could be executed on Fedora, but there was no infra actually running it, and also there was no system where you could easily consume these tests — something nicely polished, where you can share the tests between Fedora and CentOS Stream, combine them together, and so on. So that's why TMT was created. It's open source, there are already like 50, 60 people contributing to it, so it's getting larger. Check it out — it can do test management for your project. I don't know who of you works in QE, but test case management systems are usually something that you pay for. This is in Git: you store some metadata in Git, and you can do test management in it. It's pretty cool, we like it. And then we are connecting it with systems like Polarion and TCMS, which are some internal deployments of test case management systems — we are creating export plugins, so we play nicely. But we love that we are working with tests and test metadata in Git, because that's cool, that's what everybody does, right? It's Git — you can create a merge request against your test cases. That's cool, this is Git. So that's the default. Otherwise, TMT is also a CLI tool and a test executor — that's where it relates to Testing Farm: Testing Farm uses TMT to execute tests on its workers. So, this test management — of course, we have a specification for it, and in this specification there are a few attributes, or a few levels of test case metadata, that I will be speaking about. TMT is really focused on you not needing to repeat yourself. It is basically a kind of YAML format, but on some steroids: you can store your metadata in a hierarchy, there is some inheritance and stuff, so it makes it really easy to keep your test configuration and test metadata DRY — don't repeat yourself, right? Otherwise, if you go to some project which uses TMT, you install TMT from whatever operating system you are using — if it's RHEL-like, then install TMT — and you run `tmt`, and it will discover and show you the tests. We love it, because just by looking you can go to a repository and discover what tests are there in general. You can have multiple unit test frameworks, multiple integration test frameworks, whatever, but we believe that TMT will one day be this nice way to really discover all the tests that you have, regardless of unit test frameworks and whatnot. TMT is, for me, test framework agnostic — you can connect it with anything, really. It also can execute tests and has some preliminary support for some frameworks that we use in Red Hat, but people run pytest via it, Avocado — a lot of stuff can be run: anything in Ruby, JUnit, you name it, Ginkgo. Okay, yeah. So TMT has — I'm not gonna go into the details much here, I will tell you in the next slide why — TMT has like four levels of test metadata. One is the core attributes, which apply to all the other levels. Then there are tests and plans: think about a test as one test case, one thing that is testing something, and then you have plans, which are a collection of tests. And TMT shines in that you can nicely select tests and fine-tune your test plans in a way that something only runs on CentOS Stream, something only on Fedora, something only on RHEL, and so on. This comes from use cases that we had in RHEL — we really needed this; the selection of tests is important. But you don't have to use it; you can just use TMT as a very stupid thing that runs one test, whatever. Stories — that's interesting: you can link and create user stories in TMT, which goes a little bit beyond test management, maybe into project management, but we find it useful that when you go to a TMT project and run `tmt`, you can look up the stories and check which stories are covered by which code, which tests. So it's interesting — check it out. I don't have too much time; tomorrow we have workshops, so come there and try it out yourself, and there we'll give you more information about this. I wanted to be really brief here because we don't have time. So, how does a Testing Farm request look, what's the anatomy of it? I said it's really simple, right? First you define tests — you define tests by pointing Testing Farm to a Git repo. Oh, that's too much, right? So this is our API. We actually support two test types, TMT and STI, and if you are using STI, then please migrate to TMT — I will tell you later why. TMT is the main format; we don't support anything else, but you can see here that Testing Farm can actually be agnostic: TMT is just one way of executing tests. We were planning to add plain running of some script or whatnot, back when TMT couldn't do that — now it can do all that stuff — but in theory we could run other, larger frameworks and integrate them on various levels. So we have only TMT: you give it a Git URL and a Git ref, because the test metadata is in Git; you point it to the Git repo where it finds the tests. And then we have some other cool features that are TMT specific: you can select tests directly via the API — you can say, filter me these plans, these tests. You can change the TMT root directory, the directory where TMT looks for its root — it also has its root somewhere in that Git repo, right? TMT has a file there which denotes that this is the start of the metadata tree. It's usually the same root as the Git repository, but you can change it. Nothing so interesting here, right? So, the tests — and then, yeah, that's what I wanted to say: we are planning to finally deprecate STI, and there will be a Fedora change proposal, maybe Fedora 40, maybe, I don't know, we'll see. But I spoke with the folks who are working on this, and we think that TMT can now replace STI fully, functionality-wise, so we hope to get rid of it, because you don't want to maintain two things doing the same thing — why? There is no reason; we have something better, and then we can just spend more time making TMT better. So I'm asking you: if you could slowly migrate, I would be glad. There is a nice migration guide on TMT, it's linked here. So, the environment specification: Testing Farm takes the tests and then it can run them on multiple environments — think about running on multiple architectures; in public we support x86_64 and ARM, so you can run the same tests on two architectures. But it gets more complicated, because each plan runs in a separate environment, so if you have a hundred plans, they all run in separate VMs. Testing Farm then parallelizes it — I think we now run five in parallel, on five VMs — and we crunch through all the plans, and of course all the environments. So it can get messy: people are running hundreds of plans, thousands of tests, via it. If you run a million, then it will break — but then we will have to fix it. Nobody has come with that yet; we had some people running really large stuff, but currently it seems okay. So that's the environment. In the environment you can specify a lot of stuff — you can see it's an array of environments, where you specify the architecture, as I said; you choose what operating system to run on — for us it's always a compose; then you have a pool — 
one thing that I didn't mention, and I will have it in the features, is that we are trying to abstract away you having to know that you will run on specific infrastructure. Our users usually want a certain amount of CPUs, memory, disk size, 2 disks, 2 NICs, TPM support, UEFI support — those are generic properties. We abstract them away from the users, and we choose the infrastructure for you, because maybe UEFI is provided by Azure but also by AWS, so we will choose the infrastructure; you don't care, you just want the property. But if you really want AWS, then with `pool` you can say so. Then, as a good testing system, we let you pass environment variables to the test execution — and you can also pass secret environment variables, a thing that we hide for you, which can be useful if you need to deal with secrets, for example uploading VM images to AWS, or uploading a container image that you build in Testing Farm. So we can have secrets there; it's being used. One of the features we had to implement is installing arbitrary artifacts before the tests run: if you are testing a Koji build, most probably you want to install it first, to test the real thing. So currently that's how it's done — you can ask for installing an array of artifacts, or adding a repository file to the environment. Looking at the time, I think it should be fine. Then you can specify the hardware, as I was talking about, and then some other stuff. So that's roughly the test environment. The environment can be influenced via the API — you say what you want — but some parts of the environment can also come from the plan. For example, maybe in your plan you already know that the tests in this plan need eight gigs to run well, so you can say that in the plan; you don't need to say it in the API. This combination can be done. Then, as I mentioned, the test selection: there is an `adjust` attribute in TMT where you can really fine-tune the execution of your tests — so something runs only in a specific environment, something only runs on s390x (it's an internal use case), something only runs on ARM. I think TMT is very good in this test selection part. Testing Farm then runs the stuff, and if you are integrating with a service like Packit, we have a webhook mechanism: Packit's API is hit when we change the state from new to queued, running, complete, so they don't need to poll us — though you can poll the API and get the results. Maybe you don't even know that you are using Testing Farm; you just get some link like this, and you can see that you are hitting our results viewer, which has been contributed — thank you. Really easy, nice. These are all plans — this is actually from TMT — and you can see a lot of tests being run there. There is a nice reproducer — did I mention that TMT is also a command-line tool for locally debugging tests? If not, then I'm sorry. It looks a little bit larger now — it's a slightly older picture — but you can just install TMT, paste this on your localhost, and it should do mostly the same thing as CI does. It's not completely the same; we are on the path to making it the same, but it's complicated, because on your localhost you probably don't have AWS machines, you have libvirt VMs, right? Then we have here some stuff about the environment preparation and installing the builds: we have some playbooks that run before the artifact installation; then the Copr build installation — in this case just a Copr build was installed — and post-artifact installation. You can see here also some links to the API request, where you can look up the details about the request. If something fails — I don't have much time — if something fails, then it looks something like this. By default we show only the failed stuff, because when you come around you usually want to see the failed stuff, because that's what's interesting. Passes — that's cool, right? Everything should pass. We can show the passed plans too, but otherwise we show only failures. Then you can look up in the log exactly what failed — what you would expect from a testing system: show me the failures, give me a reproducer. When something errors out, we try to be reasonable. We are not always reasonable — maybe one day we will feed this into ChatGPT and it will explain to you what is wrong, or we will fix the code so it's more reasonable. So, for example, in this case you basically pointed Testing Farm at a repo where there is no TMT metadata; it looks like this — it doesn't find any plans, and it tries to give you a hint: run this command in your repository and you will most probably see the problem. I think there is also some context that is being passed from the CI systems and used in the `adjust` rules. This context is not something we auto-detect, but for example Fedora CI passes us some details about what architecture is being tested, the distribution, and what the trigger of this commit is, and then you can adjust your tests according to this context. So that is the selection stuff. Okay, let's move on. So, TMT: when you want to try something on your localhost, you don't need to care about Testing Farm at all — install TMT, clone a repo, or use the `tmt init` command to play around with it. TMT is really for these local use cases, local stuff. We also now have a `testing-farm` CLI tool, which will be used for onboarding and interacting with Testing Farm if you have a token, but we will be blending TMT and Testing Farm somehow in the future; we will see. As I said, Testing Farm already uses TMT; one day TMT will use Testing Farm — it's a little bit weird, but it will work. If you are an automated system, you are most likely using Testing Farm to run tests at scale. So Packit uses Testing Farm to run tests against Copr builds, or without Copr build installation. In Fedora CI there 
is a Jenkins instance — I will actually give some details about this later; I will just move on. And yeah, if you are a user, just use TMT, and once you get to the place where you want to run this in CI, you will interact somehow with Testing Farm. Most probably you will not even need to know you use Testing Farm, because you will go through one of our users — the CI systems that we integrate with. Sometimes you don't even know that you are really using Testing Farm; you will see it if you see the results viewer. So, what features are there now in Testing Farm? These infrastructure-agnostic hardware requirements — that seems easy, but we have this Beaker system inside Red Hat that has a super variety of infrastructure — bare-metal machines, I mean — with different network cards, different CPUs, and we want to get to this level of detail, right? This is all fine in public, but in real life we have a system providing a super wide inventory of bare-metal machines, so it gets funny — but we are trying to expose it in an infrastructure-agnostic way, so you can define it and say, for example, I want only this CPU, and we will choose that CPU for you; maybe it's on AWS, so we will choose AWS for you. We can run multiple environments, as I mentioned, with parallelization up to five. Reproducer steps I showed you. We have the `testing-farm` CLI tool that I have for the demo. We can now request a rerun of some tests — if you have the token, of course — and run an arbitrary command via Testing Farm. Somebody asked me, is SELinux enabled on CentOS Stream 9 in Testing Farm? And I told them, Walsh would weep if not — so of course it is. But with this run command, somebody can just run `testing-farm run` on CentOS Stream 9, run `sestatus`, and they will see — answering this simple question should be possible. So that's the run command. But we now also have a reserve command: you, as a community member, if you onboard to Testing Farm and give me your public IP — I will make it available to you; later on it will not be needed, of course, we will add it automatically — that's what I'm going to be demoing today, and I'm really glad about it: you can reserve a machine according to hardware requirements and have fun on it. We are not a toy service, like some pet project — we have SLAs and SLOs, we do monitoring and all that. Our SLOs: error rate less than 5%, API uptime more than 99%, and queue time — that's the new metric — under one hour, but we are now under a few minutes in public, actually; we have good infrastructure. Just so that I'm not lying, this is actually an internal dashboard — I'm sorry for that, we don't have it in public yet — this is where we are tracking the SLOs, together with other teams who run services inside Red Hat, and we will open source it somehow, definitely, because the metrics are actually open source — but Grafana... will it load, will it load? Then we have the webhook notifications I mentioned; we can do secret variables in test execution; we can integrate with some other internal instances that deal with results — actually, ReportPortal is coming to Fedora, that's a system for storing results, and we'll integrate with it, so you can have a history of testing results; Testing Farm itself really has just this simple view. Variables in the TMT environment — I think I already had that. Filtering plans. Did that load? Yeah, right, the queue time is missing here. So this is for the last 28 days, our error rate. Maybe I should slowly start opening error-budget mode, because if we reach 5%, we drop into error-budget mode: we drop all work and just go fix stuff. But the API uptime looks — I think that's even unreal, but we have metrics, it's not just made-up data, and I was surprised: it seems we are very stable, in the last 28 days at least, for the API. And the queue time — currently, on the public ranch, it takes 14.2 seconds until a Testing Farm request goes from queued to running. Internally we need to save money, so it's actually slower there — we are still not doing the scaling as well as we should. Okay, 5 minutes, and then I will do the demo and questions. So, the scale I mentioned: we are now running 700,000 test requests a year. There are stats on Testing Farm — we can look it up, it's there. This also shows how we were growing from the time we got to production. Will it load? Yeah. So in 2020 we started with 56,000 requests, and we are slowly growing — maybe 700,000 this year. Slowly but surely, as more people onboard. The main use cases of our service are basically: a public service, using our API to run tests publicly; then Red Hat CI systems, which run tests whose results are available internally, against internal infrastructure; but we also have public-to-internal. This is quite locked down, but basically, with Testing Farm, Red Hat teams can validate their merge requests on GitHub against unreleased RHEL or any other unreleased product. I think that's cool, because it was very hard to do before. It's locked down — it needs to be very safe, because, yeah, it's a weird scenario in those terms — but it makes sure that Red Hat employees can validate their products very early, shifting left as much as possible. That's a big selling point for Red Hat employees, I believe. Internally: Fedora CI, Packit; then we have some Zuul integration, GitHub Actions, and the CLI tool, which you can use to interact with the service. Only if you are using the CLI or the API directly do you need a token — also with GitHub Actions you need a token — but with Fedora CI, Packit, and Zuul you don't even need a token, because the services take care of it; you just drop some files into Git repos. Yeah, not much time left, four minutes, I will do this very quickly. So, Packit itself: a GitHub application that can test your pull request. It can build a Copr build, and we can validate that the Copr build 
installs on the system, and then we can run tests against it — easy as that — and report back to the pull request. I will just show a few links here, because we don't have time — I was not as quick as I wanted today, so maybe that's good. So here is Packit running, and you can see that it's testing on a lot of — on all the versions of Fedora, I think — and it's running tests against that. This is actually directly TMT — of course, we are dog-fooding as well. Then Packit can also test only — you can skip the Copr build, just drop some files into a GitHub repo, enable Packit, and you can run tests against that GitHub repo, whatever it is. You have root on a VM — shoot yourself in the foot if you want. So Packit is the easiest way to run tests via Testing Farm: enable Packit, add TMT metadata to the Git repo, no Testing Farm token needed. GitLab is a little bit harder to set up, but Packit can also do GitLab. There are a number of features in Packit: it can run multiple test jobs, and those test jobs have multiple environments and multiple plans, so it can get large, but it's possible. With Packit you can have multiple jobs and configure them differently; it has very good configurability, because Packit implemented a special field through which they can patch our API, so you can do a lot of stuff with that. And you can even run tests against internal infrastructure with Packit, so if you are a Red Hat employee, it's really easy. No secrets support is the only limitation — otherwise I would say Packit is perfect, and that's all for GitHub — but no secrets: there is a problem with sharing the GitHub secrets with Packit, so we'll need to sort that out, most probably with HashiCorp Vault, but it needs more time. Fedora CI has a Jenkins instance that reacts on Koji builds and runs tests; it uses the webhook mechanism in Jenkins, and they report results. And yeah, if you go to a Fedora commit, you will see some automated checks — everything that starts with Fedora CI is run via us. Fedora CI also runs some generic tests via us — installability, rpminspect, rpmdeplint — those are generic things, and they even run in containers, because not every test needs a VM, right? If you don't need one, we can run against containers. Zuul is the next CI system: with it we test Git PRs on Pagure, and we have it actually in two flavors — it's also for the CentOS Stream GitLab contributions. There we integrate with Zuul via a playbook — once you have the HTTP API, you can integrate wherever you want — and we provide the results; Zuul presents the results directly on the merge request. Maybe you know Zuul — check it out. Very easy onboarding; though this one is not as configurable as it should be. And I will go to the demo. Then we have a GitHub Actions workflow — you can read what the benefits are, but I would rather point you to Packit if you don't need secrets; with the GitHub Action you can use secrets. And if you really need to integrate directly, because you are writing a tool or calling the API, you can use our HTTP API directly, or you can use the testing-farm CLI tool. In the future we will have multi-host tests, easy onboarding via Fedora and Red Hat single sign-on, and more infrastructure — more infrastructure, because we are hungry. And now onboard, if you want — there is an onboarding guide. And now, demo time. So — this is the newest thing I wanted to show you. It's available: if you are willing to share your public IP with Testing Farm, then I can make it available to you; it will take us a little bit more time to share the public IP automatically, so that you don't need to care. If you want onboarding, let me know — you can find the contacts on the last slide. So, I want to reserve — what? I want to reserve CentOS Stream — okay, I want just Fedora Rawhide. I will just click it: Fedora Rawhide, Intel architecture, reserved for 30 minutes. And now I'm open for your questions for these 3 minutes, 3.5 minutes, and then I will get the machine and can do whatever I want. So that's my demo: reservations on Testing Farm. If you have a token, you can reserve a machine and do an investigation. This gets you the exact same environment as Testing Farm uses; you might need to pass in some special parameters to make it the same as your request was, but maybe we will adjust it so it reads that out automatically. So this is very cool, I'm really happy about it. I'm ready for questions. Is it understandable? Yes — so the question is whether I know how many people use this via Packit. I think Packit has nice stats with those large components; I can query the SQL database, but it will take a little bit of time. We are testing hundreds of Fedora packages — 900 was the last number I checked — but for Packit I will need to check. Is somebody from Packit here? Maybe they know. They didn't come, so we will need to find out — we don't have good statistics for this, and we should have, but I can query the database afterwards, no problem. Sorry — any other question? Yes. So, if you have some specific hardware requirements you would like — because we own the infrastructure: I own the AWS account, I know Beaker — so if you have a Beaker account or an AWS account, I think it should be possible. We currently do only AWS and OpenStack internally; publicly only AWS. So if we have the accounts in Testing Farm, it's possible to provision these other clouds. We have some Azure support, but others are not implemented, so if you need, for example, GCP, it needs to be implemented — you can contribute, our code is open, or you can file us a Red Hat issue and we will try to get to it. We are all based on user requests, so if there are more users asking for GCP... We are also planning to make it possible for you to give your credentials somehow to 
us then we can use your credentials because that is needed for cloud costs or what not but otherwise we are all running this on our accounts currently internally we have cloud cost dashboard I can tell you that your team how many dollars it spent on AWS if that works for you but I'm free to speak after the talk and we can sort out the details any other questions was it digestible or not that's an improvement last time I said that slow down the recording if you can understand so it was better I promise it will go through just let's wait a few seconds how much? I'm over time 2 seconds 3 seconds now it's actually now it's preparing the environment I be hacked a little bit that it shows the status you will see that it was running setting up and now it's preparing the environment at the end it should log in it takes so much because our guest setup playbooks are slow and they are updating the whole system and rebooting it so that's why it takes longer so maybe we it works and I wanted a normal machine so I have just something but I could provision a machine with 32 gigabytes of RAM more disks more disk size but this works if you want this it's available we have actually this tool on PyPy if you have a token you can already use it just let me know somewhere that we can meet together and I will meet your public IP but next time I will steal it and we will make the access to you because you need to somehow access these AWS resource we cannot just open it for everybody because somebody in test they set the password to Fedora then the bot comes in and it will do something right so we need to make sure that this is safe because it's an internal network so we already VPN so sorry but if you want this it's available catch me thank you for your time yeah perfect welcome to the session about the new version of the NET5 and now the floor is yours Jan, Pavla and the others ok thank you very much I would like to thanks organizers to get the opportunity to give you our excitement 
Before I start, I would like to introduce my colleagues, Pavla Kratochvilova and Honza Kolarik; my name is Jaroslav Mracek, and we are from the DNF team. During the presentation there will be three fun parts with demos along the way, so enjoy it, have fun.

First, so that we are all on the same page and know what we are talking about: DNF is a package manager, which means, basically, software for installing, removing and upgrading other software. To put it into context: on the lowest level you have RPM, which does the actual installation and removal. Above it there is DNF, which uses RPM and does the other stuff — it understands repositories and modules, uses libsolv to resolve dependencies, and so on. And above that there is PackageKit, a daemon that uses DNF and is itself used by some graphical user interface tools.

Okay, so this presentation is about DNF5, but first let me shortly talk about the previous version, about DNF, so that we know why DNF5 was needed and what the motivation was to move to the next version. Here you have the structure of the original DNF — this is not DNF5 yet — and you can see it's a little bit complicated. In the middle there is this big box called libdnf, containing other smaller boxes; that's the DNF library. You might think it is somehow logically composed of small libraries, but it was actually the other way around: first there were these small libraries, and they were then meshed together to form libdnf. You can see how that might be a problem: some duplicities remained, it didn't fit together very nicely, a lot of work was done to make it work properly, and still it's not perfect. Then, in the purple boxes, there is PackageKit, which I already talked about, the daemon; there's DNF, the command-line tool; and there's microDNF, which is another command-line tool. So you might ask: why do we need two command-line tools for this? The answer is that they're both a little bit different. DNF can do a lot of things, it has a lot of features, but it also needs Python, so it's not very suitable for containers where you want a minimal installation. That's where microDNF comes in — but microDNF, on the other hand, doesn't do everything that DNF can do. Also, they both use different parts of the library, so, for example, when a change is made in DNF, it doesn't automatically appear in microDNF. Again, you can see how that can be a problem. And the last thing you can see there is two kinds of plugins: there are C plugins and Python plugins, and again there's duplicity and overlap in functionality.

So you can see there's room for improvement, but we got to a point where it was very difficult to make improvements without breaking compatibility. This is where DNF5 comes in — and yay, this is the DNF5 structure. You can already see it's much simpler. The big box is just the library; it is not composed of different sub-libraries that don't fit together very well, it has a completely restructured API, and it's written in C++. You can see that there's again a DNF5 daemon, which aims to replace PackageKit, and there's only one command-line tool, which is DNF5. The DNF5 command-line tool is now great because it has all the features that DNF had, but it's lightweight like microDNF, because it no longer depends on Python. But don't worry about Python — we do have SWIG bindings, so the library can still be used via them. And it's really great that all three of these things use the same library, and not different parts of it, so one change benefits all of them at once. You may notice that there are still two boxes for plugins, but that's okay, because now they are actually different things: there are C++ plugins for the library, and then there are plugins for the command-line interface. For example, adding a new command would be a CLI plugin, while something that alters the run of the whole DNF would be in the plugins for the library, where it can also be used by the daemon.
Yes? The question is whether current DNF plugins, which live in Python, will work with DNF5. I believe not — they will need to be rewritten, but we are rewriting many of the plugins ourselves. Okay, there is another question: whether CLI plugins can use the SWIG bindings, or whether they must be in C++. They must be in C++. The reason is quite simple: if anyone writes an important plugin for your distribution in Python, then you depend on Python again, and you lose one of the advantages of the whole stack. Therefore, for us it is much better to help you convert your Python plugin into C++ than to support a Python interface for the command-line plugins. I'm sorry, let's keep the rest for the question and answer section; it will be better to have the questions in one place.

Now, when we talk about the improved API that our new library provides, what does it mean? We try to provide a better, safer workflow. If your new code drives DNF5 incorrectly, DNF5 should alert you, stop you, and explicitly say: you are doing things wrong, you should call this in a different order. Why is this important? We have had many reports from users like, you know, "it doesn't work", and the simple answer is that they ran things not according to the description in our man page, and so on; the workflow was not guarded. Or, even worse, no one reports anything and you think everything is working according to your settings. For example, you modify a setting at a time when it makes no sense, but then every time you ask what the setting for the operation was, you receive the new value — which was not actually used for the operation. That's why we have some locking mechanisms, for example for the configuration.

Of course we also improved transaction reports. Formerly, DNF4 overly used logging: if you had any problem, then "look at the log" — but your application is unable to handle such a report, because parsing anything from a log is painful. This hopefully now works from all sides: if something goes wrong with your transaction request, there is a report you can list at any time, and you can make a decision according to it rather than depending only on the return value of the transaction result.

What is also different: DNF itself has its own configuration, but if you used the old API, you discovered many API methods that required passing configuration directly into those methods. The reason is, as Pavla presented, the split between C and Python: the configuration lived in Python and the logic in C, and it was not possible to pass the original structure defined in Python down into C. Now you need to pass fewer arguments, and it's more transparent: you cannot have one configuration and silently use a different set of values. Sometimes you can override a value, but again it will be transparent, and you can always get the original value that was used for the request.

We also tried to listen to you, to the community, about what you need in the API, because there were many requests where we had to say: you know, we cannot do that right now. For DNF5 we remembered your use cases and tried to modify the design to be able to support them. It means we try to use fewer hard-coded settings and values in our code, and move these values — for example, for the transaction — into the configuration. Additionally, looking at your code, your projects, we discovered that some of you bundle pieces of our code, and sometimes it makes sense to provide that as API instead. It's like: "I would like to behave like DNF, but the argument parser is not available through the API, therefore let's copy the same logic DNF uses, so our users get the same feel as with DNF." The problem is that it's again code bundling.
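As an aside, the configuration-locking idea mentioned a moment ago — reject late changes loudly instead of silently ignoring them — can be shown with a toy sketch. This is purely an illustration of the principle, not the actual libdnf5 mechanism; the class and error names here are made up:

```python
class LockedConfigError(Exception):
    pass

class Config:
    """Toy configuration that refuses changes once an operation has started."""
    def __init__(self, **values):
        self._values = dict(values)
        self._locked = False

    def set(self, key, value):
        if self._locked:
            # Fail loudly instead of accepting a value that would never be used.
            raise LockedConfigError(
                f"cannot change {key!r}: operation already started")
        self._values[key] = value

    def get(self, key):
        return self._values[key]

    def lock(self):
        self._locked = True

cfg = Config(best=True, install_weak_deps=False)
cfg.lock()                      # e.g. called when the transaction begins
try:
    cfg.set("best", False)      # too late: the old API silently kept the stale value
except LockedConfigError as e:
    print(e)                    # cannot change 'best': operation already started
print(cfg.get("best"))          # True
```

The point is exactly the failure mode described above: the caller either gets an immediate error or can trust that the value they read is the one the operation actually used.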
We usually improve that code over time, because we find edge cases where the original parser fails — but your copy is hard-coded in your application, it doesn't get the update, you have to maintain it, and usually it drifts apart.

There are of course many improvements in DNF5; let me just share a few. One example on the performance side is loading of repositories. With DNF you can see that it downloads one repository, then it somehow freezes, and then it continues with the next one. You will not experience that with DNF5. The trick is that we split metadata loading into two parts that run in parallel: while the metadata of the first repository is being processed by libsolv, the download of the second repository has already started.

That is not the only improvement. Another very important thing is that DNF5 does not download filelists by default. Filelists are a really huge piece of the metadata, the biggest one, and DNF5 does not download them. This has two sides. On one side: fewer downloads, fewer requirements on your hard drive, and much less load on the infrastructure. The catch is that in some cases the filelists will simply not be available. Some packages, for example, declare dependencies on unusual file paths, and those have a problem: DNF5 by default will not see these dependencies as satisfied, because metadata that DNF used to rely on is simply not available to DNF5. The good news is that if your downstream or third-party repository requires such metadata, it's configurable, not hard-coded: if you decide you need it, okay, then you will pay the price — but we should not pay the price for these rare, hopefully rare, use cases on all systems.

Additionally, I have a few examples of performance improvements, and these are the long-term improvements in our DNF stack.
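The parallel metadata loading described here can be illustrated with a small producer/consumer sketch. This simulates the idea with stdlib threads and fake workloads; it is not the actual libdnf5/librepo code:

```python
import threading
import queue
import time

def load_repos(repos, download, parse):
    """Download repositories one after another while a second thread parses
    already-downloaded metadata, instead of download->parse->download->parse."""
    q = queue.Queue()
    parsed = []

    def parser():
        while True:
            blob = q.get()
            if blob is None:          # sentinel: nothing more to parse
                return
            parsed.append(parse(blob))

    worker = threading.Thread(target=parser)
    worker.start()
    for repo in repos:
        q.put(download(repo))         # hand metadata to the parser immediately
    q.put(None)
    worker.join()
    return parsed

# Simulated workloads: with the pipeline, total wall time approaches
# max(total download time, total parse time) instead of their sum.
def fake_download(name):
    time.sleep(0.05)
    return name + ".xml"

def fake_parse(blob):
    time.sleep(0.05)
    return blob.upper()

print(load_repos(["fedora", "updates"], fake_download, fake_parse))
# ['FEDORA.XML', 'UPDATES.XML']
```

The FIFO queue preserves repository order, so results come back in the order the repositories were requested, while the second repository's download overlaps the first repository's parse.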
Of course, each generation of our stack got faster than the one before, and DNF5 is much faster than DNF4 in many cases, especially for transactions, and we continue in the same way. As you can see, even for repoquery with multiple arguments, or if you run for example repoquery --whatrequires, or even an upgrade with many arguments, there is a huge difference. But please take it just as an example; there are many differences, and hopefully you will be satisfied with the result.

Let me move on to our roadmap, because I think it's also important to know where we are. We are at Fedora 38. What happened at Fedora 38? microDNF was obsoleted by DNF5, and no one complains — we received not a single complaint about this change. It means that if you install new containers, you will have DNF5 inside. Fedora 38 was also a milestone for the DNF5 project because it's the first time it appeared in the Fedora repositories.

In the upcoming release, Fedora 39, there is a huge change: DNF5 will obsolete DNF, which means everyone will start using DNF5 by default. This is also the time when users of the command-line interface, and other components, are supposed to work on the adoption of DNF5. If your app uses anything from DNF, then check and verify, before the release of Fedora 39, that everything works with the distribution's default package manager, meaning DNF5. If it doesn't, then please report it as soon as possible, because there is nothing like an unsolvable problem — you know, we are engineers, solving problems is our job — but if you don't report what you need, then we cannot help you. Sorry, guys.

The next release, Fedora 40, is the time when we would like to remove microDNF from Fedora. We will follow the standard Fedora process for retiring a package, and I'm using this channel just to share it with you because it is important: get ready, and if you depend on something that is not provided by DNF5, please don't wait. We are taking these requests as a priority — if any other software is blocked, or adoption of DNF5 is somehow blocked, it's our priority.

I've heard several times: why are we removing functional software from our stack? And the answer is: I remember yum. I remember when, and in which state, we removed yum from our distribution. It was removed in Fedora 31, and by that point it was completely broken. The problem is that if you search on the internet for what to use with Fedora or RHEL, you will often get yum; so you install yum, you try to use it, and then you get tracebacks, because it was not functional. That's what we want to avoid: remove software from the distribution before it gets broken, because microDNF is not going to be supported by upstream. And the next milestone is, of course, the removal of DNF itself from Fedora, for the same reason. So if your application depends on the DNF API, or libdnf, or the hawkey API, then within the Fedora life cycle it would be good if you start — and finish — the adoption of DNF5. Otherwise, your risk is the standard Fedora process when a package is removed: we are not going to support it anymore. You can take the package over and keep it alive, but then you have to support it yourself. That's that — and I am passing the mic.

Okay, so let's see DNF5 in action. First we have prepared some command-line examples comparing the original DNF with the new DNF5 experience. We have two separate containers, both with the same configuration; on the left side I will show the command using the original DNF, and on the right side I will run the same command using the new DNF5.
So let's start with some common usage, like installing a package. We can install, for example, the glibc-devel package, and the same one with DNF5. What's happening now is that DNF needs to fetch the metadata from the repositories before it's even able to tell whether the given package is available. We have links to demo repositories configured in the system, and from there we are downloading the metadata files: what packages are available, what their sizes are, and what the relationships between the packages are. And here we can see the difference Jarda was talking about: when we look at the F38 repository, there are about 34 megabytes downloaded for DNF5, while for DNF there are about 83 megabytes of metadata, mainly because the filelists are now not downloaded by default. Here we can also see the calculated transaction, or in other words, what's going to happen if the user confirms the installation: DNF tries to find the best available way to install the package of the given name, and if that isn't possible, it tries to provide the reasons why. We can also see some differences in the output: DNF5 shows some more information, like what is being replaced, and the sizes here now refer to the installed sizes of the packages, while in DNF it was the download size. So we can just install the packages — we can see there are some connectivity issues — and there are some changes in the output, hopefully for a better user experience.

Now, when using the command-line environment we probably don't want to type a lot, so DNF already had some support for autocompletion: when we want to complete, for example, the mark command, we get some suggestions when we press double tab. But if the command we want to use has subcommands, DNF doesn't suggest them, and instead it incorrectly offers installed packages as the argument. If we try the same thing with DNF5, you can see that the mark command is completed, and the subcommands are offered to the user as well, each with a brief description of what it does. So this is hopefully also a better experience.

And now let's compare how fast querying information about packages is. First we run the simplest query command, dnf repoquery, which basically lists all the available packages from the connected repositories. We can run it for both commands; you could see it was a little faster for DNF5, but we can measure it precisely using the time command. So we run the repoquery again and suppress the output, so that we really measure the execution time of the command itself, and the same for DNF5. We should see the improvement: it's more than 2 seconds for DNF, and less than 1 second for DNF5. We can also try something a little more complex — list all the packages that depend on a core library — and there the difference should be more significant: it's about 6 seconds for DNF5, and it should be more than three times slower for DNF... yes, it's about 18 seconds.

Okay, now let's try the same use case as I showed before, installing a package, but now using the DNF5 API. This is how a simple Python script installing one of your favorite packages looks, but let's go through it step by step. First we need to have the DNF bindings package for Python installed, so that we can simply import the libdnf5 library like this. Then we need to create a Base object — the central point of DNF — and tell it to load the configuration from the default system config file and apply it. Then we need to prepare the data about repositories, so we create a sack with all the repositories connected and configured in the system configuration file.
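Put together, the script being walked through here — the setup steps above plus the goal/resolve/run steps that follow — looks roughly like this. This is a sketch against the libdnf5 Python bindings; the exact class and method names have shifted between versions, so treat the calls as approximate:

```python
import libdnf5

# Central point of the DNF5 API: load and apply the system configuration.
base = libdnf5.base.Base()
base.load_config()
base.setup()

# Prepare the repo sack from the system configuration and load the metadata.
repo_sack = base.get_repo_sack()
repo_sack.create_repos_from_system_configuration()
repo_sack.load_repos()

# State the goal: install the nudoku package (a CLI sudoku game).
goal = libdnf5.base.Goal(base)
goal.add_install("nudoku")

# Resolve the transaction, download the packages, then run it.
# Download and run are separate steps on purpose: some callers only
# want to download without touching the system.
transaction = goal.resolve()
transaction.download()
transaction.run()
```

Run as root (or in a container), this mirrors the demo: resolve, download, and only then modify the system.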
Then we download the needed metadata and parse it into objects. Now we have everything prepared to tell DNF what we want to do — what the goal of the script is. Here it basically means installing a package of a given name; in this case it's the nudoku package, a simple sudoku game for the CLI environment. When we have the goal specified, we can resolve it, meaning we calculate the whole transaction: what's going to be installed in, or removed from, the system. That is followed by downloading the needed packages from the remote sources, and then actually running the transaction, which performs all the needed actions in the system. These two actions are separated because in some use cases we don't actually want to change the system and, for example, just want to download the packages. So now we can run the script. I will just show you that there is no nudoku command yet, and then I just run the API script and it simply installs the package — hopefully... maybe... and yes, it should now be installed, and here we can play sudoku all day long in the CLI. Okay, nice.

So now we are heading to the end of the presentation, and I would like to briefly introduce the core team members participating in DNF5 development: Marek Blaha, David Cantrell, Evan Goode, our presenters today, Ales Matej, Jaroslav Mracek, our great tech lead, Jaroslav Rohel, and Nicola Sella. I would also like to mention our QE team — I think we have a nice and tight cooperation with them, and they help us improve the CI stack and implement the API tests for Python. And now I would like to invite you to connect with us and collaborate on our projects. Our main interface is the GitHub channel: we have the issues there, where we invite you to discuss what things are bothering you and what you want to improve, the pull requests with their review process, and so on. We also try to label everything we think is a good fit for new contributors with the good first issue label.
So watch out for those. On the Kanban board you can see all the progress on the work items: what is being reviewed, what is in progress, and what's planned to be done soon in the to-do list. For general Q&A there is the discussions section, and we are also on Bugzilla, which we watch for the Fedora bugs. We try to go through all the items from these channels regularly, every week, so we can give reporters a response quite soon. Okay, thank you.

Well, the question is: we turned off downloading of filelists by default, but why didn't we use the file information from the primary metadata — is that correct? We do, we do. Everything that is in primary, we read. It means that if you require something from /etc in a configuration, or a binary from /usr/bin, a standard directory — and not only in Fedora — everything will work like a charm. The next question was whether DNF5 downloads the filelists in the background. Well, not yet, but, for example, if you request a file — I mean, install some file path — DNF automatically downloads the filelists; that is already present. And I think we will extend it, for example when you want to list files and so on, so that there will be logic for it, but it will take time to implement it everywhere. I don't know who is asking — maybe it's Bishak — I'm sorry, can you share the question? You mean hard drive, or RAM? RAM. Well, the question is that there are some rumors that DNF5 is using more RAM than microDNF. I think this is not true, because microDNF by default also downloads filelists, and the biggest RAM requirement is the filelists when you process them. So there should be a significant decrease with the default behavior of DNF5 in any operation. But you cannot count on that: if a user tries to, for example, install a file path, then DNF5 will download the filelists and it will again require additional memory for processing that metadata — we have no other choice there.
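Why does resolving most file dependencies work without filelists? By yum-era convention (an assumption worth verifying against your repo tooling — createrepo_c defines the actual rule), primary.xml carries file entries only for paths under /etc, paths in a .../bin/ directory, and /usr/lib/sendmail; everything else lives in the filelists metadata. A toy classifier for that rule:

```python
import re

# Classic yum/createrepo heuristic for which file entries land in
# primary.xml (assumed here; check your repo tooling for the exact rule):
# anything under /etc, anything inside a .../bin/ directory,
# plus /usr/lib/sendmail.
PRIMARY_FILE_RE = re.compile(r"^/etc/|^/usr/lib/sendmail$|bin/")

def satisfiable_without_filelists(path):
    """Can `Requires: <path>` be resolved from primary metadata alone?"""
    return bool(PRIMARY_FILE_RE.search(path))

for p in ["/usr/bin/python3",
          "/etc/nginx/nginx.conf",
          "/usr/share/licenses/foo/LICENSE"]:
    print(p, satisfiable_without_filelists(p))
# /usr/bin/python3 True
# /etc/nginx/nginx.conf True
# /usr/share/licenses/foo/LICENSE False
```

This is exactly why a dependency on an "unusual" path like something under /usr/share is the case where DNF5 has to go fetch the filelists after all.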
Well, the question is that there are some rumors that modularity is going to disappear, and what the plan of the DNF team is. Well, we have no other choice than to implement the modularity use cases, because it is expected that on one system you will be able to handle non-modular and even modular content at the same time. Therefore we have no other choice than to implement modularity as it was in DNF. Last question. Okay — well, that's your answer: you can read the history. The question is: why not Rust? The short answer is that Rust wasn't an option at the time we started the development of DNF5, and we had good reasons not to start with it; the long story is on fedora-devel, please read it there. Thank you very much, and please don't hesitate to ask questions after the session — we will be happy to answer any of them.

Okay, welcome. So, yeah, I hope you are all here to contribute to the CentOS Stream kernel — or maybe not, maybe you just want to get some inspiration, or maybe get scared away by what we have built. Anyway, people external to the RHEL or Red Hat kernel teams can contribute to CentOS Stream, and by means of RHEL taking the CentOS Stream kernel, to the RHEL kernel as well. First I will spend a few words on what the kernel package in CentOS Stream is. It's a package as any other, right — except it has some specifics. So it's an RPM, actually a set of RPMs, but the first specific is that the source repository for the kernel — the Git repository where the work is done — is not the repository from which the RPM is built. This is different from most packages in CentOS Stream; it has a separate repository, more on that later. The second difference is that the work on the CentOS Stream kernel is parallel. Imagine a number of surgeons operating on the same body: one of them is, say, updating the circulatory system, replacing arteries with something better, while another one is replacing a leg at the same time. So there are some challenges, especially when the arteries go into the leg — they have to coordinate somehow.
Just to give you an idea how massive it is: I checked yesterday, and currently we have 149 merge requests open for the CentOS Stream 9 kernel, and some of those have several hundreds of commits in them. Of course, for all of this to be manageable, the kernel teams had to establish some processes, because it wouldn't work otherwise. So there are custom rules, custom processes that need to be adhered to if you want to contribute. On the other hand, the benefit of all of this is that we have a dedicated person to maintain all of that — to merge the stuff, to take care of building the packages, of the dist-git, of the repository from which the packages are built. So we have a dedicated person, or maybe even persons, for that, and they deal with the CentOS Stream processes themselves. The distribution has its own processes, but the contributors to the kernel don't have to deal with those. We basically exchange one set of complex processes for another set of complex processes.

So much for the introduction. In order to understand what the work entails and what we are constrained by, let's first talk about the expectations. Take a user — a user of CentOS Stream, a user of RHEL. What do they expect from the kernel? Of course, stability. What does that mean? It means no regressions: stuff that worked before should work after a kernel update. And it's more than that — it's performance: no performance regressions. The user wants their workloads to run as fast after the update as they did before. No API changes, of course: if you have a program and you update the kernel, you still expect your application to work. Basically every application interacts with the kernel, so the API that the kernel provides to applications should not change at all. That is what you expect. Then we have kernel modules: you may be using third-party kernel modules — the NVIDIA driver for your graphics card, for example — and you want those drivers to keep working after the kernel update. The problem is that those kernel modules, those drivers, use the internal API of the kernel, so you want the application binary interface between the modules and the kernel to not change at all. And it turns out that some people want more: not just the binary interface, they also want the internal API stable, which means: I take the driver source code built for the old version, update the kernel, compile the driver against the new kernel, and it should still compile. It sounds like the same thing, but it's actually a very different thing. No behavior changes, of course: I want the stuff to run exactly as it was running before the update. So, in short: "Red Hat, do not touch the stuff I'm using, please keep your hands off it. Oh, and also please bring me new features." We want the newest, shiniest stuff: there's that new hardware we bought, we want it to work in your next release, so please update the drivers for us. Oh, and we also heard about this cool new feature that the upstream kernel has, so please bring it too. Update the core, update everything you can — except for the stuff I'm using, don't touch that. Now do you see the problem? "My stuff" is a different thing for different users, and everyone wants us to update everything except their stuff.

So this is the situation we're in, and we need a compromise; of course we cannot deliver all of that at once. The compromise that we arrived at — it's a compromise, so nobody is happy about it, but apparently people are still buying RHEL and using CentOS Stream, so it's not that bad. The compromise is: no functional regressions — if something worked before, it should work after the update. Performance: yeah, we should not regress performance, but imagine the case where you change something and improve performance for 90% of users by a lot, don't touch performance for 9% of users, and for 1% of users you slightly slow down their workloads. Is that okay, or is it not? Who knows — so there's some kind of balance. But slowing down networking for 90% of people, or slowing down disk storage, that's probably not acceptable. Then, of course, the user-space API that the kernel provides to applications: it should not break. So we can change it if we maintain backwards compatibility — or if nobody notices, that's okay as well, right? The kernel modules: yeah, they should probably keep working, but here is the problem: the upstream community kernel doesn't care about internal API stability. The interfaces inside the kernel and toward the drivers are constantly changing, so if we are to bring in new features, we cannot keep this interface completely stable. We can try to maintain some backwards compatibility, but it's a lot of work, nobody enjoys that work, and it costs a lot. So we have to draw the line somewhere; again, it's about finding a balance. We are also updating drivers — not all drivers, obviously, we don't have infinite people working on the kernel — but again, sometimes updating a driver means bringing in a new kernel core feature that the driver uses. So there are dependencies, and it's not always easy, because sometimes bringing in a new feature from upstream is risky: we risk stability, we risk introducing regressions. So again it's a kind of balancing act — we might update a driver but maybe not enable all the features that the hardware supports. And of course we are updating the kernel core features too, again balancing that against stability; not everything is possible without compromising the other stuff. So, as I said, nobody is ever fully happy, but hopefully it's good enough for everyone.

How does this work in practice? I've been talking basically only about CentOS Stream 9, but it applies to the other streams as well.
for CentOS Stream 9 we had when the development started there was a kernel 5.14 and the CentOS Stream 9 is still on this kernel even those years later so we have kernel 5.14 but we are back porting stuff for upstream we are taking commits from upstream and applying them to the kernel and that ranges from single bug fixes for like simple stuff single features to like large rebases of all drivers and all subsystems so for example for you to know what I'm talking about the XFS file system is up to the upstream kernel 6.0 the USB subsystem all drivers, the core like everything is on par with upstream 6.2 you know the latest kernel in the upstream is 6.3 so this is pretty new in the XFS subsystem this is pretty core stuff that's like connected to memory management and all of that it's 6.2 in the kernel in the current CentOS Stream 9 wireless including all drivers 6.3 multi-party cp 6.4 that kernel is not even yet released upstream so you see all the different stuff is kind of the kernel replaced different version so maybe you get the franca kernel terminology now or I can tell you the term now this is what we have it works it works but one thing I forgot and I will stress it multiple times we are taking stuff for upstream it means everything almost everything we strive for everything that we have in CentOS Stream to be upstream we are taking stuff from upstream everything is to be upstream first so as I said all of this can stick together and work reasonably only thanks to thanks to several things the processes we have around that I will talk about in a minute and the second and very important thing is testing so this is the testing that happens whenever a single change be it a single bug fix or that large rebase whenever it is applied to CentOS Stream kernel first testing happens when a developer does the work so when we back port stop from upstream or when we develop stop upstream and then back for it so this is where the first testing happens obviously nobody wants 
to just push untested stuff over the fence. Then automatic testing kicks in. We have CKI; there were talks about CKI at previous DevConfs, and if you are interested you can find the recordings online. That is continuous kernel integration testing, a large test suite with a lot of infrastructure around it, and it runs automatically. We also have LNST, Linux network stack testing, which is specifically for testing networking, both functional and performance, and we are adding more over time. Then there is pre-verification, which means that before the change can be merged to the kernel, a human, or maybe automation if there is automation for the particular feature or particular bug, but someone else than the original author, that is important, tests this, and only if it passes is it merged to the CentOS Stream kernel. Then there is integration testing, because it is nice that you tested one feature or one bug in isolation, but it can interact with those other 150 parallel changes to the kernel in weird ways. So once this is merged we need to test again, to make sure it did not conflict with or was not negatively influenced by other stuff. And the last thing is proper QA testing done by quality engineering; that is really comprehensive testing of the whole kernel and all the features. So that is a lot of testing. Okay, with that, let's look at how it looks from the point of view of a contributor to the CentOS Stream kernel. The centerpiece is the merge request. We are using GitLab for CentOS Stream, not just the CentOS Stream kernel but the whole CentOS Stream development. This is the URL of the CentOS Stream 9 kernel source repository, which is where the development happens. It's completely public; you can go there, you can watch what we are working on, and of course contribute. So a merge request against this repository is basically the centerpiece around which everything is built. However, before you can open a merge request, you have to file a bug or issue; every change we do needs to be
tracked. So you go to Bugzilla or JIRA, it doesn't really matter, and file a bug or open an issue there. Be sure to select the proper product, component, and subcomponent: product Red Hat Enterprise Linux 9, component obviously kernel, and in Bugzilla whatever subcomponent you think is the best fit. The reason for that is that you want your change, your merge request, to reach the proper team. In the bug or in the issue, explain why you are doing this; explain the benefits, why it's important. If you come as a contributor to the CentOS Stream kernel, it's not a small amount of work that is connected to the contribution; it doesn't end with submitting patches, submitting commits. There's more: there's the testing I talked about, someone has to do it, and there's maintenance, someone needs to maintain that stuff for the whole period that Red Hat Enterprise Linux is maintained, which is many years. So submissions don't come for free for Red Hat. Explain the benefits, explain why the change is good, and, because it means work for other people, try to make it easier for them: provide detailed testing steps so that quality engineering is okay with the change, because they know it won't be that much work for them. Don't forget to note that you will be submitting the merge request yourself, so you are not asking a developer to do the work. Make it easier for them, and be mentally prepared for the option that it might be rejected. So, we have a bug or issue; now let's do the work. We clone the repository as usual from GitLab, create our own branch based on the main branch, which is where the development happens, and apply our patches. And now let's stop there, because this is the first thing about the processes: the commit messages have a mandatory format, which I will explain on the next slide, so it's not a freeform commit message. That's the first thing. Apply upstream commits one to one, so one upstream commit is one CentOS Stream commit; do not squash commits together. And of course it's a Frankenkernel, so it's
unlikely that a commit from upstream will apply as-is to the CentOS Stream kernel. Some commits do, if you're lucky, but most likely you will have to make some changes, adjust to the old APIs that are there, or something, and those changes need to be explained; there should be reasons why the back-port differs from upstream. Let's look at an example. This is a real commit from CentOS Stream 9. First, the mandatory fields: there must be a Bugzilla or JIRA link in every commit. Every single commit must have this line, and in this exact format; it's parsed by machines, so even a trailing whitespace is a problem. You will be told, though, so it's not entirely a black box; more on that later. So that's the first thing. Second thing: as I said, there is an upstream-first policy, so everything that we bring to the CentOS Stream kernel must be upstream. This line says: this is the hash of the upstream commit this one is back-porting. Exactly this format; it is parsed by machines. The third mandatory thing is the Signed-off-by line, which has some legal implications; if you're not familiar with the Developer Certificate of Origin, see the documentation. It must be there. And here I will point out this section: this is not parsed by machines, it is for human consumption, for reviewers, except the Conflicts line, which must be in this form and explains what was changed compared to upstream. So that's an example of how it looks. You apply the patches, and then you can finally submit a merge request: the usual GitLab workflow, nothing you are not familiar with. Basically you fork, add the remote, push to your remote, open a merge request using the web UI or API, and you're done. Almost: in the merge request description you have to repeat two things, the Bugzilla or JIRA link and your Signed-off-by line. When you do that, the automation kicks in on GitLab: the kernel is built and it's tested. If you are an external contributor, only limited tests are run, not the full suite, for security reasons.
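The slide with the example commit isn't reproduced here, so the mandatory fields just described can be sketched roughly like this; every hash, bug number, and name below is made up, and the exact line formats are the ones defined in the project's contribution documentation:

```
subsystem: fix the thing that was broken

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2100000

commit 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b
Author: Jane Upstream <jane@example.com>
Date:   Tue Mar 14 10:00:00 2023 +0100

    subsystem: fix the thing that was broken

    ...original upstream commit message...

Conflicts: context adjusted in foo.c because a prerequisite
refactor was not back-ported to CentOS Stream 9.

Signed-off-by: John Backporter <jbackporter@example.com>
```

The machine-parsed pieces are the Bugzilla/JIRA link, the upstream commit hash, and the Signed-off-by line; the Conflicts note is the part written for human reviewers.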
Once your submission is reviewed by someone from Red Hat, they will start the complete pipeline; we obviously don't want the pipeline compromised by external contributions. Then some automation runs, which does checks on the submission. It will check the fields I described, whether the Bugzilla line and so on are present, and it will tell you in a comment on the merge request. It will check whether the Bugzilla or JIRA issue is in the proper state; that's something you cannot influence, someone in Red Hat has to put the bug or issue into the correct state. Another check is for missing fixes: if you back-ported something from upstream, and there was a follow-up upstream that fixed it, and it's not there, you will get notified. It will also check for conflicts between submitted merge requests, which is quite important given the amount of merge requests we have. Then some labels will be added, actually quite a lot of labels. This is an example of one merge request: there are subsystem labels, which basically indicate which subsystems were touched or modified by your submission. This is a large merge request, it has, I don't know, three or four hundred commits, so it touches a lot of subsystems. There are labels with the current state, like needs-review or the CKI status and so on. It will also add approvals to the GitLab merge request, meaning it lists the people who should review the merge request, and it will notify them. Then the review happens. This is an important part, but nothing surprising: someone from Red Hat or from the community will review the submitted commits and make comments. Now, it's important to respond to the comments; if someone has objections or asks for an explanation, provide it. The discussion happens in the merge request, in the comments there. And finally, when it's reviewed, the merge request is ready for QE testing. That means you have all of these labels in the okay, or at worst warning, state: your commit description is okay, there are no merge conflicts, the CI
testing succeeded, all the approvals were given, all the needed reviews were done, and the Bugzilla or JIRA is in the ready-for-testing state; you are ready for QA. Then the testing happens, the pre-verification, again done by someone else. You will probably agree in the Bugzilla or JIRA who will do the testing, whether it will be Red Hat, or whether someone from your company or your friends provides the testing, whatever. When it's done, the Bugzilla or JIRA is okay, ready for merge, and the maintainer will merge it. One short notice: sometimes, for example if there is a driver update, it might depend on other changes in the kernel. If you update your favorite networking driver to the newest version, it might need some changes in the networking core, to support a new protocol perhaps. That's different stuff, of course, so a different merge request needs to be filed for that. And to save time, you can actually submit those merge requests in parallel: you can depend on another merge request which is not yet merged by pulling its code and applying your patches on top of it. The only thing you have to do is put a line in the merge request description saying that you are depending on this particular other merge request, and then the automation we have will be able to figure out what is the first commit that is really part of your merge request, the top of the dependencies, and will put a label there, and then you're good. GitLab is not aware of this; it's a feature that is missing in GitLab, so we implemented it ourselves, and as a consequence it's really hard to impossible to do via the web UI. Okay, 15 seconds. We have tools that will help you with that: lab is a CLI tool that helps with submitting merge requests, and reviewmatic is a custom tool that helps with reviews and understands the workflow we have. So that's it; here's a link to the full
documentation of the process, if you're interested in the details, and now, questions. So the question was that by doing the back-ports we are repeating the work that upstream does with the LTS kernels, and whether it would not be easier to use them. Yeah, the problem is that the quality of the upstream stable kernels is not satisfactory for our needs, we found. There are things like machine learning involved, which selects which commits are back-ported, and basically whatever someone thinks is desirable for those stable kernels is thrown over the wall and applied, and that does not meet the stability and no-regression guarantees we need. So we really want all the stuff that's going in to be reviewed by us and to be tracked, to have someone who's responsible, so we can go back and say: okay, you broke it, you fix it. So all of it together, we found that for stability it's basically a no-go. There's another question, yeah. So the question was what the integration testing consists of. Basically, right now it's done by CKI again, but it's a larger battery of tests, a lot of different tests; I don't remember them all, but there's LTP and many more. The question was whether they are open source. I believe so; I'm not 100% sure about everything, but I think so. Please don't quote me on that, I'm not sure. Thank you. Hello everybody, have you ever wondered how to maintain an RPM package in Fedora, CentOS, CentOS Stream, RHEL, or any other distribution? If so, I will try to answer some of your questions today. I'm a little bit afraid that I will raise more questions than I actually answer, but I have plenty of time after the talk, so we can discuss it. My name is Lumír, I'm from the Python maintenance team in Red Hat, and we maintain Python packages, quite a lot of them. Without further ado: the agenda is kind of packed, I'm not sure I will manage it in 25 minutes, so I will try to go quickly, but we have plenty of time after that for questions. So let's start. This is probably
the well-known bell curve describing the normal distribution of basically anything. If we have it for the users of a piece of software, then the average user is somewhere in the middle, and the axis here is the age of the software. On the right side, from your point of view, it's too new, not well-adopted, not tested, and so on, and on the left side it's too old, absolutely full of flaws and problems and security bugs and so on. So my wild guess is that a lot of you are somewhere here in the middle: something which is still supported, well tested, not too new, not too old, and that's fine. The thing about us, especially in Fedora and Red Hat Enterprise Linux, is that we are on both ends, both extremes: no software is too new for us, I will show you some examples, and no software is too old for us, I will also share some examples of that. Basically, the flow of the code looks something like this; if you really don't know what it means to maintain an RPM package, a simplified picture might look like this. Somebody creates a piece of software, a library, an application, whatever, and on the other hand there are users who want to use those applications and libraries, and in the middle is a magic man or woman, the maintainer, the package maintainer. Because usually, if you see some software on the internet and think, oh, this is a very nice application, and you really want to install it, and you use a Linux distribution, probably your first step will be dnf search, yum search, apt search something, and you really wish the application to be packaged in that distribution, because it means that you can install it and somebody else takes care of all the stuff around it, the compilation and testing and all that. And that's who we are in Python maintenance, but not only in that one team, in other teams as well: we are taking the code from upstream, creating RPM packages, and delivering those packages to our users, customers, and so
forth. I won't go much into detail today about the commits and pushes and PRs and updates here and there; what I would like to show you is what challenges you might expect if you want to be an RPM package maintainer. I will use a couple of terms here; the list is short. When I say upstream development, I mean the development on GitHub, GitLab, wherever the upstream code of the applications or libraries lives. When I say downstream development, I mean the development in the Linux distribution, the packagers who are packaging the stuff, writing spec files, and so on. When I say CVE, I usually mean some security flaw, a security vulnerability, and EOL is an abbreviation for end-of-life: software without any more upstream support. So let's take a look. If you want to be a package maintainer, what are your responsibilities? You should try, at least your best, to keep your package functioning, which usually means that it should work, of course. It should be buildable from sources, so the process describing how to build it, how to compile it, how to test it, and how to create a package from it should be repeatable after some time. It should be installable; the fact that you have an RPM package doesn't mean that it will work on your machine, so it should be installable and should just work. It should be up to date, so you should update it regularly; you should fix issues and potentially CVEs, which is something I will describe later in deeper detail. You should also try to limit the impact on other packages, which is something we will focus on a little bit later, and you should try to limit the impact on the users, which is even harder than limiting the impact on other packages. But the best thing you can do as an RPM package maintainer is to stay invisible: if everything works, nobody knows about you, because after someone installs it and it works, they don't care who maintains it. But if something breaks, then it's your responsibility. So let's start with a
very, very simple example: the package named ioping. That's a very simple utility; it can measure the latency of your hard drive the same way the classic ping measures the latency of the network. That's it. It's written in C, or C++ if I'm not mistaken, and it needs only two dependencies to build, GCC and make: it uses makefiles and it needs to be compiled. And it needs only one dependency to run, glibc, which is at the core of all Linux distributions, so nothing you should be worried about. And no packages need ioping to build, and no packages need ioping to run, which means that this package is a so-called leaf package: if you imagine the tree of dependencies of RPM packages, that one would be at the very bottom, nothing depends on it. That makes the maintenance situation kind of easy for you, because the probability that something breaks for you is very low. Well, if anybody moves GCC from 15 to 20, then it might break for you, but other than that, the probability that something breaks for you is limited, because you have a very limited set of dependencies. And the probability that you will break something for anybody else is also kind of limited, because no other packages depend on that package directly. So only if you break it might you expect some bug reports from users, and that's something which is really hard to measure, because we don't know how many users are using the specific components, the specific packages. So what could go wrong? As I mentioned, somebody can find a bug in the application, that might happen of course, or a security vulnerability, and the problem might be upstream, in the code of ioping itself, or the problem might be downstream, in the spec file or manual pages or documentation or whatever. Or the problem might be on the left side of the previous image: somebody updates or changes something and it might break your package, or you change or update something and it might break other packages or the expectations of users. In that case, what to do as an RPM packager?
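To make the "two build dependencies, one runtime dependency" point concrete, here is a hedged sketch of what a minimal spec file for a leaf package like ioping could look like; the version, URL, macros, and file list are illustrative, not the real Fedora spec:

```
Name:           ioping
Version:        1.3
Release:        1%{?dist}
Summary:        Simple disk I/O latency measuring tool
License:        GPL-3.0-or-later
URL:            https://github.com/koct9i/ioping
Source0:        %{url}/archive/v%{version}.tar.gz

# The whole build tool chain this leaf package needs:
BuildRequires:  gcc
BuildRequires:  make

%description
Measures disk I/O latency the same way ping measures network latency.

%prep
%autosetup

%build
%make_build

%install
%make_install PREFIX=%{_prefix}

%files
%{_bindir}/ioping
%{_mandir}/man1/ioping.1*
```

Everything else, the compiling, testing, and assembling of the binary RPM, is driven from a description like this, which is what makes the process repeatable.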
Well, if the problem is upstream, fix it in Fedora and propose the fix upstream, and we are doing that a lot. That should break the first expectation, that RPM maintenance is just taking the code, packaging it into RPMs, and giving it to users; that's usually not the whole story, and I will describe it in a little more detail later. The thing is that in Fedora we use the latest and greatest software, and that means that we are testing the software with the latest possible versions of its dependencies, and that usually breaks something, especially in the case of Python. So we are the first ones who actually see the problem. So it's not just about: hey, I can report the problem, which is the second-best thing you can do, I can just report it and wait for upstream to fix it, and after that happens, take the patch or the new release, update, and it's done. Usually it's not. The best way you can help as a package maintainer is to prepare a fix. You can test it on Fedora, where the original problem appeared, so the environment you want to test it in is actually already prepared for you. The second-best thing is to understand the package so that you can provide all the necessary details: all the versions of dependencies, configuration flags, compilation flags, and all the details around it that might be really helpful for upstream. And by doing that, you might actually get upstream to fix the problem very early for others: you uncover a problem, you fix it, you report it, and the other users in the middle of the bell curve will never know the problem was there at all. Or you can do nothing, which is bad. If you have broken dependencies, then the first point is the same as for a bug or security vulnerability: you just need to take your software and adapt it to the newer version of your build or test or runtime dependencies, which might be a little bit hard. If you are really lucky and somebody is moving a package to another major version, like switching from gcc10
to gcc20, for example, and they are aware of you depending on gcc10, they might create a compat package for you, which means that you can switch the build dependency and say: okay, for now I'm fine. I'll have to take a look at that, because gcc10 won't be there forever, but right now I'm safe, users can still use the latest version of ioping, and I have time to think about it. But honestly, that doesn't happen so often, so usually you have to switch your package to the latest version of its dependencies. And of course, if the problem is downstream, in your package, then it's your responsibility, right? That might sound like a lot of work, doesn't it? That's nothing; we are not there yet. Let's move to the other side of the problem. Oh, and we have cookies; not actually cookies, but we have a lot of stickers, so after the talk I will give you some stickers. Let me step aside for a while and describe in more detail what we do in Python maintenance, especially in Fedora. Fedora loves Python, almost all versions of it; Python 2.7 not that much, but still. And we would like Fedora to be the best distribution for Python developers, which means that we maintain multiple Python interpreters at the same time, continuously, some of them mainly because they are the same as in RHEL. That means that if you want to develop a Python application for RHEL 8, or RHEL 7, or RHEL 6, or RHEL 9 and the future 10, then you can use Fedora for that; you don't need a virtual machine or anything like that with old distributions. You can take Fedora, take whatever Python version you want, create a virtual environment, and it all works. And if you are on the other side of the bell curve, as we sometimes are, then you might want to test your application with the latest, greatest version of Python, which is also possible: we have Python 3.12, which is now in its first beta, but we have had it in Fedora since the first alpha, which means that you can really test with all the supported Pythons, and
we support a lot of them, much longer than upstream does. We also have some alternative interpreters, like PyPy; we had Jython too, but not anymore. And also, we are very fast, really: it took only 7 days for 3.11 alpha 1 to appear as a build in Koji, which means that if you really need the latest, greatest Python, there is absolutely no need to compile it from sources yourself. And it was only 3 days for 3.12 alpha 1: three days after it was released upstream, it appeared in the build system of Fedora and you could use it, which is quite awesome. What's maybe more important is that this is much faster than the usual CI providers. You know that you can use Python for testing on GitHub Actions and Travis and CircleCI and whatever, but you will have to wait months after the official final release of Python 3.11, or 3.12 in the future, for it to appear in CI systems like those. If you are not a Fedora user, shame on you, but it's not a problem, because we offer all the Pythons, together with tox, which enables you to test one application with multiple Python versions, as a container image based on Fedora, available on Docker Hub, and also as a GitHub Action. So you don't need to use Fedora directly; you can use it only in CI, to test with all the supported Python versions. All right, but the situation for most of the interpreters I mentioned on the previous slides is the same as for ioping, because the alternative versions are basically leaf packages as well: nothing should really depend on them, they are not meant to be used in production, they are meant for development and testing. But there is one, the main one, the main troublemaker, and there is only one in every Fedora release: the main Python, usually the latest stable release of Python, or the next one, which causes us the most trouble and the most work at the very beginning of every Fedora release.
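As a sketch of that multi-interpreter testing: a project's tox.ini just enumerates the interpreters, and on a Fedora-based image (or the container image / GitHub Action mentioned above) every listed environment can actually run, because all those Pythons are installed side by side; the environment list below is only an example:

```ini
[tox]
envlist = py37,py39,py311,py312

[testenv]
deps = pytest
commands = pytest {posargs}
```

Running `tox` then builds a virtual environment per interpreter and runs the test suite in each, which is exactly the "one application, multiple Python versions" workflow described above.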
All the others are usually leaf packages for testing and development, and that's fine; that applies also to the future versions. But the main one is the troublemaker here. Let's increase the complexity of the example and redo the ioping-style graph: for Python 3.11, which is the main Python interpreter in Fedora 38 and Fedora 37, it looks like this. We need 45 packages to build it, and we need 24 packages to use it, and by "use it" I mean the Python interpreter and its standard library, not all the components provided by that package. That's the part we really cannot affect: something might break there, and the probability on the left side of the image is much higher than for ioping, because those packages are much more complex than GCC and make, and the numbers are just really, really higher. But the funny part begins on the right side: 4,400 packages need Python to build, and 5,600 packages need Python to run. Like, wow. So what can we do? We could just bump the version, build it in Koji, release it, and say: you have to fix it for yourselves. That's obviously something we cannot do, and that's the reason why it generates so much work for us. The reasons are basically two. The first one: the newest Python releases, or really every Python release, usually contain some backward-incompatible changes, like removed deprecated modules, because when developers see a deprecation warning for two or three years, that's apparently not enough. So when a module disappears from the standard library, a lot of packages, or a lot of tests and stuff like that, start to fail, right?
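As a small, hedged illustration of that kind of backward-incompatible removal: the `asynchat` module was deprecated for years and then dropped from the standard library in Python 3.12, so an import that quietly worked for a long time starts failing only once the interpreter is updated:

```python
import sys

# asynchat: deprecated since Python 3.6, removed in 3.12. An import
# that worked for years suddenly raises ModuleNotFoundError on a new
# interpreter -- exactly the failure mode a mass rebuild surfaces.
try:
    import asynchat  # noqa: F401
    status = "importable"
except ModuleNotFoundError:
    status = "removed"

print(sys.version_info[:2], status)
```

On Python 3.11 and older this prints `importable`; on 3.12 and newer, `removed`.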
So there are some backward-incompatible changes. And one more problem: Python is an interpreted language, that's true, but a little bit of it is compiled, let's say, and that compilation process creates the PYC files, which are basically cached versions of imported modules. The problem is that the PYC files are written to the hard drive, to locations where only root, the superuser, can write, which means that we have to ship those files in the RPM packages, byte-compiled during the package build; if a user imports a module and Python tries to create the PYC files somewhere it cannot write, it simply won't happen, right? And because that byte-compilation has to be redone for every new Python version, that means we need something called a mass rebuild. But that's somewhere in the middle of the update process. The first step of the update process is to write a change proposal; you could see those in Matthew's talk today. So: write a change proposal, describe your plan, describe the contingency plan, and so on and so forth. Then package the new Python version as a new package; we cannot just update the old one, we want to keep maintaining the old version, so we have to create a new package. And then rebuild the thousands of packages in Copr; Copr is, let's say, an alternative build system we can use for testing. During those rebuilds you will uncover a lot of bugs; some of them you can fix, some of them you cannot, and you have to report the bugs upstream, to the libraries or to Python itself, because we are just humans and Python is just software, full of bugs as well, or fix them downstream, in the Python packaging. So, a really incredible amount of work, and the whole process I'm describing here usually takes the whole year, because Python releases are somewhat aligned with the Fedora releases, which means that this fall we expect Fedora 39 and Python 3.13 alpha 1, and we will have the whole year to prepare the Fedora packages for the new Python version.
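A small illustration of why every interpreter bump forces that mass rebuild: the bytecode cache file name embeds the interpreter's version tag, so .pyc files shipped for one Python are useless to the next one (a sketch using a throwaway module in a temporary directory):

```python
import importlib.util
import os
import py_compile
import sys
import tempfile

# Create a throwaway module and byte-compile it the way an RPM build would.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "mod.py")
with open(src, "w") as f:
    f.write("X = 1\n")

# The cache path embeds sys.implementation.cache_tag,
# e.g. __pycache__/mod.cpython-312.pyc -- one tag per interpreter version.
pyc = importlib.util.cache_from_source(src)
print(sys.implementation.cache_tag in pyc)

py_compile.compile(src)  # writes the .pyc to that versioned cache path
print(os.path.exists(pyc))
```

Both prints show `True`: the .pyc lives at a path tied to the interpreter version, which is why packages shipping pre-compiled bytecode must all be rebuilt when the main Python changes.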
Then you rebuild all the packages in a side tag; a side tag is something like a branch in Git, nothing to be really worried about. So we prepare all the packages, we know about all the problems, we try to fix them in Copr, and then we can rebuild them again in a side tag of the development version of Fedora. That's actually happening right now: the process of rebuilding in the side tag started a couple of days ago and we are somewhere around 3,000 packages already built. Then you merge those rebuilt packages back to Rawhide, and ta-da, you have Python, in this case 3.12, in Rawhide, and that's awesome. It's an incredible amount of work, and thanks mostly to Tomáš, you can talk to him after this: Tomáš does an incredible amount of that work through the whole year, maintaining that process. And some numbers: 3,361 packages in Copr, almost 50,000 builds, and almost half a thousand Bugzillas, and those are just the bugs in Bugzilla, I'm not talking about all the GitHub issues and so forth. So in Fedora, updates are really complex, and you have to be really careful, because you can break a lot of other packages, or a lot of system tools, which is the case for Python, because a lot of things in the system really depend on Python itself, like the installer Anaconda, and DNF version 4 (version 5 is no longer in Python, but the older version is). So you have to be really careful. But the benefit of the updates is that they usually fix the security flaws and the bugs for you: if you cooperate with upstream, it usually doesn't take that long for the upstream project to release a new version, and you can just update it in Fedora; you don't have to make your own fixes, you can, but you don't have to. And you can do it in a similar way as we do: you can assess the impact of the change in Fedora if you want to. Stay here in this room and Karolina will tell you in a little more detail how we do that in Copr. But let's move to the other side of the
bell curve: what if we cannot update, what if we cannot change things that much? That's a problem we have to solve in RHEL. The promise for RHEL customers and users is, as I already mentioned, that we'll keep your system secure without breaking backward compatibility of the provided components, which means that updates are usually out of the question; usually, not every time. I would like to describe that with two security vulnerabilities we had to fix, and how we dealt with them on multiple levels, to show you that it can be really interesting, and also really complex, to fix something in the older components, in the old systems, for customers who depend on maximum stability. Let's take the first one, which is a web cache poisoning in urllib, part of the standard library of Python. The problem is that when you have a URL address, it's usually parsed somehow by a proxy, and you have to set the delimiter for the key-value pairs in the query string. The World Wide Web Consortium recommends using the ampersand only, but Python had both the semicolon and the ampersand as defaults at the same time. All right, we don't need to go much deeper into the details; the thing is that if the application is configured differently than the proxy in front of it, it might mess up the results in the proxy cache, and that's why the vulnerability is called web cache poisoning. Okay, so what can we do? We can follow the same recommendation, and that's exactly what upstream did. The recommendation is to use the ampersand? Well, switch the default to the ampersand. That's too rash, too rough; let's find some middle ground: switch the default to the ampersand and allow users to configure something else. That sounds great, but the separator keyword argument added to Python doesn't allow you to use more than one character, so you have to choose either the ampersand or the semicolon or whatever you want, but not multiple at the same time, and that's a problem, right?
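On a Python new enough to include the upstream fix, the behavior just described looks like this; note that the separator argument really is a single delimiter, not a set of them:

```python
from urllib.parse import parse_qsl

# After the fix, only '&' splits key-value pairs by default;
# a ';' is now just part of the value.
print(parse_qsl("a=1;b=2"))                 # [('a', '1;b=2')]
print(parse_qsl("a=1&b=2"))                 # [('a', '1'), ('b', '2')]

# Opting back in to semicolons must be explicit, and you get
# exactly one separator -- never both at once.
print(parse_qsl("a=1;b=2", separator=";"))  # [('a', '1'), ('b', '2')]
```

Before the fix, the first call would have split on both delimiters, which is exactly the ambiguity a proxy in front of the application could exploit.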
So in Fedora, being the latest and greatest, whatever that means, we just followed the upstream resolution. We updated all of our Pythons, because the fix was kind of quick: it was implemented upstream, they did security releases, so we just updated all our components, and for the EOL interpreters, which are maintained by us and not by upstream anymore, we back-ported the patch. So Fedora follows upstream: the default is the standard, deal with it. But in RHEL we cannot do that; that's a problem. So we basically took the patch from upstream, adapted it to allow the old behavior by default, and added a warning. If you use that default, which means that your application is not ready for the new behavior, a warning with a link to the documentation will appear somewhere in your log, and the documentation describes what's happening. But even then, we cannot expect our users to change their codebase, so we had to find a way for them to set the new default without actually changing their applications. So in the same patch, we provided a way to set it in the Python code itself, which is kind of obvious, but we also provided a way to set it via a configuration file in /etc, and also via an environment variable. So we didn't change the default behavior; we added a warning to our customers saying: please don't rely on that default. Because on the other hand, they might use the old default in Python and have some old proxy configured in the same way, and everything is fine; if we changed the default, it might silently break a lot of stuff. Silently, that's the problem. So we added a warning, and we added the possibilities to configure it, so everybody can do that on any system: setting an environment variable, adding a file into /etc, whatever. We had to be very careful, and in that specific case we decided not to change the default setting. Now for another CVE, the last one we'll talk about today: the tarfile module directory traversal
Now, another CVE, and the last one we'll talk about today: the tarfile module directory traversal. And if you know that the second part of the identifier is a year, then you can tell this one has been waiting for a fix for a long time. Really, I tried to say to Petr: hey, we can wait a couple more years and then we can celebrate the 20th anniversary of that CVE and fix it, right? That would be a great birthday gift. But no, Petr decided to fix it this year, and we are actually in the process of doing that, so I will describe it. So the problem with tar is that it was designed to back up whole systems, completely, which means that it supports symlinks, hardlinks, extended metadata, some special files, relative paths, and a lot of stuff. The problem is that if you unpack an archive from an untrusted source, it can completely break your system. And it is not something we would usually need these days; we are no longer backing up our systems to tapes. But then there is a problem: what's the correct way to fix that? And upstream had the perfect resolution: do nothing. We said in the documentation that blindly extracting archives is a bad idea, that you shouldn't do that, and that you should inspect the archive before you extract it. Alright, that's true, the documentation said that, but the problem is that the best place to fix it is still the tarfile module. The best... am I going to be pushed from the stage? No? Two more minutes, please. The best thing, or best place, to fix that problem is really in Python; even the documentation says that everything needed is there in Python. So Petr, with another colleague, decided to change it in Python, but it cannot be done in a backward-incompatible way. He wrote PEP 706, which is a Python Enhancement Proposal, a really long document full of useful information, if you want to read it. And it added filters to tarfile: 'fully_trusted', which means that you know what you have; 'tar', which is somewhere in between; and 'data', which is the safest one, only allowing some capabilities of tar.
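The old "inspect before you extract" advice can be sketched like this. This is a hand-rolled check for illustration, not the PEP 706 code; on Python 3.12+ you would instead pass `filter='data'` to `extractall()` and let it reject such members for you:

```python
import io
import tarfile

def suspicious_members(tf):
    """Flag members a blind extraction should refuse: absolute paths,
    parent-directory escapes, and symlinks/hardlinks."""
    bad = []
    for member in tf.getmembers():
        if member.name.startswith("/") or ".." in member.name.split("/"):
            bad.append(member.name)
        elif member.issym() or member.islnk():
            bad.append(member.name)
    return bad

# Build a malicious archive in memory: one member tries to escape upward.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    info = tarfile.TarInfo("../escape.txt")
    payload = b"oops"
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))

buf.seek(0)
with tarfile.open(fileobj=buf) as tf:
    print(suspicious_members(tf))  # ['../escape.txt']
```

The filters do this kind of checking (and more, like stripping setuid bits) inside tarfile itself, which is exactly why the module was the right place for the fix.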
Python now has a deprecation period: in Python 3.12 and 3.13 the default is the same, but there is a warning if you use that default, the same thing as with urllib, and then in 3.14 the 'data' filter will be the new default, which is how these things are done in Python. So in Fedora we basically followed the same thing and backported the patches into the EOL interpreters, so now all of them behave the same way. And in RHEL we decided to do an even stricter thing: we backported the patch and set the strictest filter to be the default one in the RHEL distribution. So we can finally say that, after many years, the vulnerability is finally fixed. That's one backward-incompatible change we had to make, and we will see what it causes.

So, some conclusions. Being a package maintainer might mean a lot of things, a lot of different things for a lot of different people. If you want to maintain a leaf package with a useful utility, please do that; it won't cost you all the free time you have. But if you want to maintain something like Python in Fedora and RHEL, be prepared to sacrifice something. In Fedora, it's good to know your packages, because if you do, you can prepare patches, you can help upstream fix problems; if upstream is no longer interested in their packages, you can maintain them; you can help the whole ecosystem evolve, or at least create reasonable reports. In RHEL you have to, you really have to, understand the packages you have; you really have to be able to backport patches years down the road to some very ancient component, open the book full of dust, and backport the patch into that. So you have to really know what you are doing, and you have to be prepared to make some tough decisions, like we did when fixing these vulnerabilities. Alright, that's all, thank you for your attention.

Let me repeat the question: why does our Copr have fewer packages than actually depend on Python? That's a great question; you actually looked closely at the presentation, that's great. The reason is that we follow the dependencies between those packages, and we don't try to blindly build everything: we first
prepare the dependencies for all the packages and then we add them. It means that we weren't able to build all of them in Copr; some of the packages weren't in a buildable state, so we didn't add them to the Copr, because they didn't have enough dependencies at that time. Thank you for the question.

The question is whether we try to discover runtime issues with the new Python. That's a great question, and honestly, the amount of work we have at build time is a lot, so there is not that much time left to do so. But we really think that every single RPM package should contain tests, so we depend on the components that depend on Python to have some tests, which kind of means we should be able to catch runtime problems too; but if a package has no tests, there is no way to verify it works with the new Python.

Yes, unit tests mostly, because during the build, in the build system, there is no access to the internet, so there's not that much integration testing and stuff; but every single package should have a %check section with the unit tests. And if we rebuild Python and then rebuild pytest, we hope that pytest has as good coverage as possible, so it will uncover some possible problems. And I'm sorry for that, thank you very much.

Hello everyone, my name is Karolina Surma, I work at Red Hat as a package maintainer. I do a lot of package updates in my daily work, most notably python-sphinx, which is quite a popular documentation tool that has got over 1000 dependencies. And every time I want to bring a new update of python-sphinx to Fedora, I'd like to be pretty sure that this update will not break anything. Lumír provided a lot of context in the previous talk on why that's important, so if you haven't been here, make sure to check out Lumír's talk later. The process I'm using in my daily work is what I'd like to describe here today for you. This talk is going to present a very practical approach; I'm not going anywhere
deep in my explanations, there is no time for that. It's going to be applicable immediately for Python packages, and, maybe with some tweaks, for packages that are built from other languages; I'm not very familiar with anything else, but we can talk about it later at the social event. The projected audience for this talk is me two years ago: when I started my journey as a package maintainer, I really needed a lot of context in order to be able to do my job. So how does this typically look? There is an upstream project developer, someone who creates a really nice library, and they create a new version. Me as a package maintainer, I somehow get to know that there is a new version available and that it would be really nice to bring it to Fedora Rawhide, because this is where the newest, freshest, hottest things should go; it's the development version of Fedora, a really good place to land such updates. And that's it: I package it, I push it to Rawhide, build it, I'm done. Well, this is the place where a decision can be made. I can either do just that, push it to Rawhide (I build it locally, it builds, great), or I can stop for a moment and think: maybe I could leverage the Fedora quality tools that are available in the ecosystem. And what would those quality tools be? In a very simplified manner, and please don't quote me on that because I have no idea how a Linux distribution is really built: a distribution like Fedora consists of many RPM packages, and in the ideal state those packages are buildable and installable. Unfortunately that's not the case for all of them, but we aim for that. And in order to ensure that the update I'm bringing to Fedora is buildable and installable, I've got some local tools I can use, which is mainly mock, or I can build my update in a container. Or I can leverage Fedora CI: Fedora CI does scratch builds and runs any tests that you have with your package in Fedora dist-git. And there is Zuul CI, which does everything Fedora CI does, plus much more. So if you're going to leverage
Fedora CI, go for Zuul CI, with the only disadvantage that it will take some more of your time. And it would be best if you just opened a PR, a pull request, against your own package, in order to catch any failures before you bring them to the distribution. But it's not only about the particular RPM packages. When we create the distribution (and the whole point of creating a downstream distribution is that the packages actually communicate with each other, they play with each other nicely), I can build my package on top of the other Fedora packages, and my package can be used to build some other Fedora packages. Again, Lumír has a great graph showing exactly that. And for checking the integration, we've got a process we call an impact check; in Copr language it's called a mass rebuild. You can think of it as an integration test, pretty much, while the CI is more of a unit test: the CI takes care of the health of one package, and the impact check makes sure that no other package stops being buildable when you bring your update to Fedora Rawhide. In Copr terms (Copr is a build system, it builds packages, and its main usage is not how we are using it: we are creating a lot of Copr repositories with throwaway builds), what I really want to have is this last build status: succeeded. This is the only thing I care about. So the process looks like this: I create a new isolated virtual Fedora, I bring my just-updated package to that Copr repository, build it, and then take all of the other packages that depend on my package and build them. If they all build, I'm happy; probably my update doesn't break anything. If there is something that failed, and I really don't like that red color there, then I should investigate and take a look at it. So in order to show you this process, I took the python-rich library. It's a really nice library for building text user interfaces, and the authors of that library and the maintainers of that library in Fedora didn't do anything bad to me, so they don't
deserve what I did to their package: I just removed the emoji module. I prepared a downstream-only patch which removes one of the modules the package has. It's a silly update, it's just something I did downstream, and the upstream developers probably wouldn't agree with me, but it's not so far from the truth: many times it happens that a major version coming from upstream contains a lot of removals or renames and some incompatible changes. So it can happen, and it does happen. And I can still have some packages in Fedora that rely on this emoji module being available to build and to run, so I'd like to know whether there are any, which ones, and what happens to them. I've got 32 packages that require python3-rich at build time and 22 packages requiring python3-rich at runtime. Copr is a build system, so I don't have any means to check the runtime side, unless those packages have upstream tests: if they run the upstream tests, they should pull the runtime dependencies in at build time, and that would increase the visibility for me. So the question I'm left with is: how many of those packages that have python3-rich at build time will just stop building? The catastrophe that can happen is 32 failed packages. To start, I assume that each one of you has already done the Fedora packaging tutorial and has the basic packages installed. In the process I will describe, I will use tools that come from the Fedora repositories, some additional ones: one Fedora package provides pkgname, a very nice tool to get just the package name from a whole RPM name string; Fedora ships repository definitions I can query (I don't want to install packages from Rawhide onto my system, that wouldn't be a good idea, but I do want to query through the RPMs and the metadata); there is a nice binary, copr-cli, to enable interaction through the command line, so that I don't have to click through the Copr web interface; and moreutils comes with a binary, parallel, to speed things
up: if I want to send 1000 builds to Copr, I don't want to wait for them consecutively, one after another. The really great tool that comes with dnf is repoquery. It's complex, it can do a lot of things, and it probably deserves its own talk, so I'm not going to go deep into repoquery and what it can do. The basic commands to query the repositories are as follows: with the first command you list all the RPMs, all the built RPMs that are in the Rawhide repository and all the source RPMs that are in the rawhide-source repository, together. The -q option suppresses the metadata-check output, which is not something I'm interested in, so all the commands have this optional -q. For a released Fedora the query is a bit different, so I just put it on the slide for documentation purposes, but I will only continue with Rawhide here. So, first question: I need to know what gets built from my package. I already told you that I just destroyed my package, but I'd like to know what the built RPM is, what is the name the other packages will depend on. In the Python case, the convention is that the source RPMs are not versioned: python-rich is the name of my Fedora component, but what gets built is python3-rich, and this is the name I'm going to use in the following commands. There can be multiple built RPMs coming from a single source RPM, and in such a case, if you'd like to do an impact check, you should probably repeat all the following commands for each of the built RPMs that come from your source RPM; any other package can depend on any of the built RPMs that come from yours. I will not get into that too much. Now, build-time and runtime dependencies. To find what python-rich requires at build time, I need to query the repository with the --requires option and provide the source RPM name (it can be the full name of the RPM), and what I get is the full list of build-time dependencies. If I'm interested in the runtime dependencies, which for the purposes of this talk I'm not, I would invoke
the same command with the name of the built RPM. The list is significantly smaller, but I can also see that the runtime dependencies are listed among the build-time dependencies, so my package pulls them in and probably tests with them, so I'm fine here. Back to the build-time dependencies: I want to know which other packages need my library, python3-rich, to build. I already told you that the answer to life, the universe and everything is, here, 32; but how did I know that? There is a command, of course, like for everything: repoquery will tell me. There is the --whatrequires option, which takes my built RPM as an argument, recursively, so that I get the full picture of what would get broken if I really pushed the update to the repositories. And I'm interested only in the source RPMs, which are the ones that expose build-time dependencies; those are the ones I can check in Copr. All of the results I pipe to the pkgname tool, and this gives me only the package names. So if I run the command, you can see that it gives me something, and if I make my bash count it, that is 32. This is what I want to build on top of my update in order to assess the sanity of my changes.
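The whole pipeline looks roughly like the comment below (the exact repoquery flags may differ between dnf versions, so treat them as a sketch). The name-stripping step that pkgname performs can be approximated with sed, shown here on two fake result lines:

```shell
# The real query would be something like:
#   dnf repoquery -q --repo=rawhide --repo=rawhide-source \
#       --whatrequires python3-rich --recursive --arch=src \
#       | pkgname | sort -u
# pkgname boils 'name-version-release.arch' strings down to the name;
# a rough sed stand-in for that step:
printf '%s\n' \
    'python-typer-0.9.0-5.fc40.src' \
    'python-typer-0.9.0-4.fc39.src' \
    | sed -e 's/\.src$//' -e 's/-[^-]*-[^-]*$//' \
    | sort -u
# prints: python-typer
```

The sed expression drops the trailing `.src` and then the last two dash-separated fields (version and release), which is why two different builds of the same package collapse into one name after `sort -u`.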
Knowing that, I can go to Copr. Anyone can use Copr: you only need a FAS account (Fedora Account System) and an API token, which you can easily get from the copr.fedorainfracloud.org API page. Once you're set, we need to do those five steps I mentioned at the beginning: create a new Copr repository, add and build our package with the update, then add and build all the other packages. I did all of those steps just before the talk, so that I don't waste our time typing those beautiful commands into my terminal, but we will go to the live demo for the results. So first of all, I define a handy variable, only so that I can reuse the commands every time I do a new impact check, and I do them a lot. Creating a new repository means that I need to invoke the copr binary with the create argument. It takes another argument, the name of the Copr I want to create. I want to create it for Rawhide; I don't care about the architecture, really, because Python packages are mostly noarch, but it may be different in your case; there are different chroots there, and you can build RPMs not only for Fedora but for other systems too, so Copr is really universal. The important thing is that I'm adding an extra repository to my Copr: it's looking not only at Fedora Rawhide but also at Koji. Why is that? There is quite often a little difference between what's in the Rawhide repo and what's in Koji; packages change all the time, and the state the repository will be in 20 minutes later can be completely different. Sometimes some changes sit in the Fedora repository but the maintainer, deliberately or by mistake, forgot to build them. So we grab the latest packages, the latest versions, from Koji to test against. And because this is a throwaway repository, I'm not going to keep it forever; for small updates, 14 days is just enough, so I'm deleting it after 14 days; only big updates require more time. Adding the package: I already conveniently destroyed my package and pushed it to my fork of Fedora dist-git, so it's there, sitting in the branch devconf. I can tell Copr that I want to add it from a source control management system; for that there is the add-package-scm argument, and again: to which Copr, what's the URL where my package sits, what is the branch name where I pushed the update, and last but not least, the name of the package I want to build, which is python-rich in my case. I didn't have to go through my fork of the package; I could just have built my local source RPM, but it's easier for me to hook in any automation I'd like: if I want to tweak my package and push new changes to my fork, Copr has the ability to pick them up automatically. And when I have added my package to Copr, I should build it.
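The three setup steps can be sketched like this. The flags are the ones discussed in the talk, as I understand copr-cli; the repo name, fork URL, and branch are placeholders, and it's wrapped in a function so nothing actually runs until copr-cli is configured with an API token from copr.fedorainfracloud.org:

```shell
impact_check_setup() {
    copr=$1; clone_url=$2; branch=$3; pkg=$4
    # throwaway Rawhide repo, self-deleting after two weeks
    copr-cli create "$copr" --chroot fedora-rawhide-x86_64 --delete-after-days 14
    # the updated package, pulled from a dist-git fork
    copr-cli add-package-scm "$copr" --clone-url "$clone_url" \
        --commit "$branch" --name "$pkg"
    # build it, and wait for success before adding anything else
    copr-cli build-package "$copr" --name "$pkg"
}

# impact_check_setup my-impact-check \
#     https://src.fedoraproject.org/forks/USER/rpms/python-rich.git \
#     devconf python-rich
```

Attaching Koji as an extra external repository is done with another `create` option in the real setup; check the copr-cli help for the exact spelling on your version.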
For that, I have the command copr build-package: where, and what. And now I should wait. This is an important step: I should wait for my package to build successfully, and only start pushing the other packages when it's ready. If I do it too rapidly and the package is not there yet, Copr will have no problem going for my python-rich to the Koji repository or the Rawhide repository, and I will not test at all what I wanted to test. So, provided that my package actually built, and built successfully (if it didn't, then I need to retry until it actually builds, otherwise I will not be testing anything), I can again use the parallel utility to speed things up. Another great feature of Copr is that it allows me to add packages directly from dist-git, and if I don't specify which dist-git, it means Fedora dist-git. So I'm adding the packages that are sitting in Fedora dist-git, all those 32 packages, to my previously defined Copr. I want to switch on the webhook rebuild; it's the really nice piece of automation I mentioned before, I just didn't call it by name. Any time another package maintainer changes their package while I'm doing the impact check, while my repository is live in Copr, Copr will know that there was a change of state in the Rawhide repository and will automatically rebuild the newer version for me. So I don't have to think: oh, I did my impact check two weeks ago and it's probably stale now, because the packages move so fast; my Copr will pick it up. And there is another option I need to provide to the command: --name, with all of the packages that require my package at build time, which is basically the output of the command you've already seen, piped to sort and uniq, just to be sure that there are no duplicates; there shouldn't be any, but it doesn't hurt. The last step of the process is building the depending packages, again using parallel to speed things up.
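Those two bulk steps can be sketched as follows. This uses GNU parallel syntax, and `deps.txt` is a hypothetical file holding the package names from the repoquery pipeline, one per line; again it's a function, so it only runs with a configured copr-cli:

```shell
add_and_build_deps() {
    copr=$1; deps=$2
    # register each dependent straight from Fedora dist-git, with webhook
    # rebuilds on, so later dist-git pushes get rebuilt automatically
    parallel copr-cli add-package-distgit "$copr" \
        --webhook-rebuild on --name {} :::: "$deps"
    # fire off the builds without waiting, in the low-priority queue
    parallel copr-cli build-package "$copr" \
        --nowait --background --name {} :::: "$deps"
}
```

Note the talk mentions the moreutils parallel binary, whose invocation syntax differs from GNU parallel's `{} ::::` form shown here; adapt accordingly.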
I can send the build-package command to parallel, and it looks very similar, with some new options. --nowait is an option to not clutter my terminal output: I don't have to wait for each build to conclude before the next one is sent. And the niceness factor: --background. Copr has different queues: there is a normal queue for normal packages, and there is a queue that gets processed in the background. So when someone has priority work, they should probably use the main queue, but impact checks are rarely that important, so it's only nice to the other users of Copr not to clutter all of the available workers with my 1000 packages that I will just remove once I see that they build successfully. Background is a really nice option, please use it. And again I need to tell it which packages I want to build. So this is my magical invocation. When all this has happened, my biggest question is: what failed? This is what I really want to know. Maybe nothing, that would be great. Maybe something, that's not so great. With Copr I've got another handy API to interact with: there is monitor. Each Copr has information about its state, how it all went. If I just run copr monitor like that, the output is provided as JSON, which is not so nice to interact with from my command line, so I can say that I want my report as text-row, and that I only care about the state, whether the build succeeded, and the name of the package; and in this case I'm only interested in what failed. If I do that, and I hope it will work... yes: there are 4 packages that failed in my Copr. That's not too bad; I could go through the build logs of all 4 of them. But it can easily happen that I have 10 or 20 or 60 failed packages (my Python interpreter colleagues get thousands of failed packages), and it's not that nice to go through all those build logs. So what we often do is use that command to create another, control Copr, and I have the habit of naming it exactly the same with -control appended, to speed up my typing kung-fu.
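The monitor step just described can be sketched like this. The real report would come from something like `copr-cli monitor "$COPR" --output-format text-row --fields name,state` (flags as I understand copr-cli; verify against your version); here the filtering step is shown on canned monitor output so it stands alone:

```shell
# keep only the names of failed builds from a name/state text-row report
failed_only() { awk '$2 == "failed" { print $1 }'; }

printf '%s\n' \
    'python-typer failed' \
    'python-sphinx succeeded' \
    'python-pytest failed' \
    | failed_only
# prints:
# python-typer
# python-pytest
```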
With that control Copr, I rebuild only the packages that failed in the first Copr, to look for false negatives. It could look pretty much like this: provided that I've got one Copr repository with failed packages, I create a new one, the control repository, which rebuilds only those failed packages, and from those I'm only interested in the ones that actually succeeded. This is my difference: what failed in the first one and what succeeded in the second one. If the list is long, I can use another Linux utility to compare both results. In my case it will not be that bad, because I already ran my control Copr just before the talk, and now I see that there is only one package, out of those 4, that actually succeeded in the Copr which did not contain my update. This is my problem, this is my Copr, this is something I really need to look at. That's the process, and now we come to results evaluation, which is, for now, a manual step. All of the steps I showed you can be automated to some degree: you can write bash scripts, you can even script opening a browser with the logs from the command line. It all depends on the workflow you are used to; each one of us in our team does it a little bit differently, so we don't even have one process for all of us, but we have automation to some degree. In my case, I will just go to Copr for a moment and show you the most mundane thing that can happen. So I'm in my devconf-prep, which is the repository where I was doing all the work before the talk. I will find the package (it was python-typer) that I'm interested in. It actually failed; not that I didn't believe the CLI, but it did. There are a lot of builds and they all failed; that's not a good sign, it's probably not just a flaky package. You can see that my build was running in the background, and there are the logs; there is the chroot, this is the Rawhide build, which failed, and there are the build logs that I'm interested in. What gets opened is a really nice build log; everyone who has built a package knows them by heart.
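The failed-versus-control comparison mentioned a moment ago is a job for the comm utility on sorted name lists; a toy run with fabricated names:

```shell
# Two sorted name lists, one per copr; comm requires sorted input.
printf 'python-a\npython-typer\n' > failed_with_update.txt    # failed in my copr
printf 'python-typer\n' > succeeded_in_control.txt            # fine without my update
# comm -12 keeps only the lines common to both files:
# failed WITH the update, succeeded WITHOUT it -> broken by my update.
comm -12 failed_with_update.txt succeeded_in_control.txt
# prints: python-typer
```

Everything that fails in both Coprs was already broken for unrelated reasons and drops out of the intersection, which is exactly the false-negative filtering the control Copr exists for.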
And I'm of course interested in whatever is at the end of the build log. I see that some tests failed, and when I start looking at what failed, it's: 'emoji' has no attribute 'replace'. Remember what my update was about? I've got it. So this is something I would not know if I had just pushed my update to Rawhide and been happy about it. This is an actual problem; here I am, a package maintainer who didn't care about all the rest of the ecosystem, but right now I'm looking at it, and the real maintainer of the python-typer package doesn't know about it. They have no idea that this is going to happen; they are not going to actively go through all the packages their package depends on. So when I'm looking at this, at this point I'm the most competent person to actually do something about it, and I'm going to talk to the maintainers of python-typer and tell them: heads up, hey, I'm going to bring this update and it is going to break your package. You can do that by any means possible, maybe by opening a Bugzilla ticket. If there is more than one package, if there are 10 or 20 broken packages and you don't know exactly how to help all of them become buildable again, you can write to fedora-devel and say: okay, I'm bringing this update, these are the packages that are going to fail, this is some guidance I can provide, but please, in a week I'd like to merge it, so maybe you should take a look. If I had 60 packages failing after my impact check, that could be a good candidate for a Fedora Change, to increase the visibility of the issue. And if I'm preparing a really big update, then most of the time I've read the changelogs and I know, or I can find (I'm in a better position to find) a migration path. So I can help with anything from just telling the maintainers, reporting the issue upstream, fixing the issue upstream or downstream, and providing visibility, to providing some guidance; I can pick from a big variety of activities in order to help make all of it go smoothly. Seeing that we are
coming to an end, I just wanted to tell you about one big Copr gotcha. Some packages are quite aggressive when it comes to versioning, and they say: this package will only work with my library at a version lower than 14, for example. My update doesn't bring such a version bump, but if it did, I would expect Copr to say: okay, I have python3-rich 15 built in Copr, this package requires python3-rich lower than 14, so we are not meeting the condition, so the build fails. That is a very reasonable expectation, except in Copr it would not work that way; it would be a false result. Copr has no problem skipping my just-built version with the update and going to Koji or Rawhide to take the version of the dependency that it needs. To mitigate that, there is this beautiful shell snippet: take all the packages that require my library, and check whether they require my library with the lower-than character; if they do, print them out. In my case, python-rich (I will not show it live, it's a bit slow), there are actually two packages that require python3-rich lower than 14, and those would again be my big candidates for reaching out to the maintainers of the respective packages and telling them: hey, your packages will stop being buildable, installable and usable when my update reaches Rawhide.
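The actual snippet loops over the dependents and runs a repoquery for each; a simplified stand-in with canned repoquery output, so only the filter itself is shown:

```shell
# Real data would come from something like
#   dnf repoquery -q --repo=rawhide --requires <dependent-package>
# run for every dependent; here it's faked to demonstrate the check.
version_capped() { grep 'python3-rich <'; }

printf '%s\n' \
    'python-a: requires python3-rich < 14' \
    'python-b: requires python3-rich' \
    | version_capped
# prints: python-a: requires python3-rich < 14
```

Anything the grep prints carries an upper version bound on the library and needs a heads-up to its maintainer before the bump lands.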
There are some further resources. This is a piece of tribal knowledge that my team has assembled; we are trying to write it down and preserve it for future generations. The impact checks are described in our pull request review guide; Copr has got really usable documentation about the mass rebuilds; and this presentation, with all the code snippets, can be found on GitHub. This is it, thank you very much for your attention.

The question is whether I have any idea how we could speed things up when it comes to browsing the build logs. It is yet to be done. The URL is derivable: I only need my Copr name, the build number, and the package name, and I can get all of that from copr monitor, so I can automate creating the URL where the logs sit. Then I would need to download all of those logs and somehow assess how much of the output I need to take, and maybe provide some aggregation of the results, whether a certain string is found, so that I don't have to go through 10,000 logs, but look in some nicer place. So yeah, it's something that can happen, but it didn't yet. Maybe my colleagues have some ideas.

Some AI engine? Yeah, maybe that is what will solve all of our problems. The issue is that the outputs of the failed builds can be really weird and different; sometimes you need to grab more context, and it's moot to go through the logs of packages that haven't built for half a year, because their failure is probably totally unrelated to my update. It's definitely doable to some extent, I don't know to which, and none of us has actually had much time and cycles to do it, so maybe this is waiting for some really brave community folk who will just come and solve this problem for us.

I'm sorry, I hear you really badly, so I didn't even grab the core of the question; maybe we can talk about it later, when I can actually hear you. Thank you.