necessary copies. Now why am I talking about BSD? Well, Linux has the same BPF-compatible virtual machine, and you can use it for packet filtering as in BSD. You can also use it together with the secure computing facility, seccomp, for sandboxing programs and restricting what system calls they can make. So instead of looking at the headers of an incoming network packet, the filter looks at the system call number and parameters and says yes or no, or "kill this program right now, it's doing something bad". And you can also use BPF filter code in your firewall, with the xt_bpf module, and for network scheduling.

But as time has gone on, network packet rates have increased a lot, and the use for system calls means you potentially have a much higher rate again: the filter code is being run many more times per second. So the performance of the interpreter is a bit of a problem. Starting with Linux 3.0, the kernel gained a just-in-time compiler, so that the filter code is turned into native code inside the kernel, and that's been implemented for several different architectures, listed on the slide there. However, this feature is disabled by default. There are still some concerns about whether the generated native code is actually correct in all cases, and there are security concerns: do you really want programs to be able to generate native code in the kernel? Even if it's based on a very restricted language which is supposedly safe, there are some risks. And the virtual machine used for BPF is a 32-bit machine. It only has two registers, which is kind of weird, and it has a bit of a stack, but it doesn't really match how modern CPUs work, so there's a limit to what you can do in terms of efficiency.

So extended BPF is the new feature. It supports sensible conditional branches. In BPF, a conditional branch says: if this condition is true, branch here; if it's false, branch there. In real programs you normally only want to branch in one of the cases, and a CPU should be able to predict branches, which it really can't with this weird double-branch instruction. So eBPF has simple, single-condition branches. It has 64-bit registers, and more of them; it has more memory; it can do byte-order conversion, arithmetic right shift and atomic adds. And whereas a standard BPF program just returns a single number - in practice usually 0 or 1, yes or no - extended BPF supports associative arrays, and those can be mapped both into the filter program and, read-only, into userland. So the filter code can update a histogram or other statistics, or it can even provide a kind of log in a ring buffer.

As yet, this is not usable for seccomp, unfortunately. It is usable for packet filtering and for network scheduling. And you can use it on tracepoints: if you insert tracepoints into the kernel, you can have this filter code control what's actually done when a tracepoint is hit. Because it's a 64-bit machine, you're unlikely to see JIT compilation for any 32-bit architectures - but who cares about them anyway - and you have JIT compilation for ARM64 and AMD64 now. It's still turned off by default, but turn it on if you think it's safe enough. And in fact the classic BPF interpreter has been removed: all BPF programs get converted into extended BPF and interpreted like that, and that turns out to be faster, simply because eBPF is better suited to the way modern CPUs work.

Both BPF and eBPF have their own machine code, but there is work in progress to let you compile to eBPF from a restricted subset of C, using the Clang compiler.
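To make the seccomp use concrete, here's a minimal sketch (my illustration, not code from the talk) of installing a classic-BPF filter: the program inspects the system call number and either allows the call or kills the process. The tiny allow list is made up for the example.

```c
#include <stddef.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

int main(void)
{
	struct sock_filter filter[] = {
		/* Load the system call number from struct seccomp_data */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		/* Allow write() and exit_group(); kill on anything else.
		 * Note the double branch targets: jump-if-true, jump-if-false. */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 2, 0),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit_group, 1, 0),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = sizeof(filter) / sizeof(filter[0]),
		.filter = filter,
	};

	/* Required so an unprivileged process may install a filter */
	prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
	prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);

	/* write() is on the allow list... */
	write(STDOUT_FILENO, "sandboxed\n", 10);
	/* ...and so is exit_group(), which _exit() invokes */
	_exit(0);
}
```

A real filter would also check the arch field of struct seccomp_data before anything else, since system call numbers differ between architectures.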
So the next feature, on an entirely different note, is the overlay file system. It's a union file system that lets you take a read-only base file system - that could perhaps be a network file system from an NFS server, or a read-only medium such as a live CD, or a shared base file system for containers - and on top of that you add a second file system that contains all the changes that have been made. OverlayFS then presents a unified view of those two, or potentially more, layers.

Debian has already had a union file system for quite a few years, in order to support the Debian Live project and also derivatives such as Tails. That's AUFS. But AUFS is big and complicated and written in a slightly odd style, so there's basically no chance of it going into Linux mainline. OverlayFS, however, is a bit simpler, and it has gone in - as of Linux 4.1... no, sorry, it was a bit earlier than that. In any case, we're currently not including AUFS in the kernel packages any more.

So, limitations of overlayfs. It doesn't seem to work - or doesn't work correctly - on top of NFS file systems; you can only use it to join local file systems at the moment. You can't export an overlayfs over NFS. Whenever you delete a file that was present on the read-only file system, the upper layer that contains the changes needs to contain a sort of dummy file that says "ignore the file on the read-only layer" - that's called a whiteout - and whiteouts are not implemented very efficiently in overlayfs. Whenever you modify a file that comes from the read-only lower layer, it has to be copied up to the upper layer, and overlayfs is a bit silly about copying up sparse files. AUFS has an intriguing feature where you can have more than one writable layer - I'm not sure quite how that works - and overlayfs doesn't do that yet. And hard links: unfortunately, even creating a hard link to a file that's on the read-only layer currently requires that file to be copied up in overlayfs, which is not necessary in AUFS. Hopefully, some of these limitations are going to get fixed in overlayfs eventually.
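To show how the layers fit together, here's a minimal sketch of mounting an overlay from C - the equivalent of `mount -t overlay overlay -o lowerdir=...,upperdir=...,workdir=... /merged`. The directory paths are made up for illustration; upperdir and workdir must be on the same writable file system, while lowerdir may be read-only.

```c
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
	const char *opts = "lowerdir=/ro-base,"  /* read-only lower layer      */
			   "upperdir=/rw/upper," /* holds all the changes      */
			   "workdir=/rw/work";   /* scratch dir overlayfs needs */

	/* Present the unified view of both layers at /merged */
	if (mount("overlay", "/merged", "overlay", 0, opts) < 0) {
		perror("mount");
		return 1;
	}
	return 0;
}
```

Deletions in /merged then show up as whiteout entries in the upper layer, and writes to lower-layer files trigger the copy-up described above.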
So, moving on to networking features: there's a new way of managing switches. As you probably know, Linux is used on a whole bunch of network appliances - wireless access points, switches, routers. So you have a bunch of devices where a hardware switch is integrated into the system, and generally those are configured using an API that's specific to that vendor's driver. You also see a lot of PCI Express network cards with features to support virtualisation: they can present multiple virtual ports towards the host on the PCI bus, so there's effectively a switch on the card, and currently there are a couple of networking APIs you can use to configure the switching of packets between those virtual ports and the external ports. And finally, Linux has had a software bridge driver for a long time, which you configure using netlink, with the bridge or brctl commands. So these are all switches, but they all have different ways of being configured, which is not really ideal.

So the switchdev concept has been introduced. It requires every driver for a switch to implement the same set of operations, some of which are optional, and the kernel will then present the same interface to userland, so switches can be configured in the same way, whether they're software or hardware, and whether the switch hardware is smart or not so smart. Currently it's not supported by all the drivers that you would want: Intel's 10- and 40-gigabit Ethernet cards support it, QLogic's cards support it, and there's this thing called rocker, which is a switch emulated in QEMU - not terribly useful, except as a test bed for this new API. And there's macvlan, which is an alternative software bridge. Every port on a switch now gets its own net device, and you can use ethtool to configure things like speed and auto-negotiation on those network ports, and then use the bridge command, which will now work with hardware bridges as well as software ones. The hardware can do learning - figuring out which MAC addresses are attached to which port - or it can leave that to the software. And it's even possible to offload layer-3, i.e. IP-level, forwarding decisions to the hardware.
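Since each switch port shows up as an ordinary net device, the standard ethtool interfaces apply to it. Here's a minimal sketch - my illustration, with a made-up port name "sw0p1" - of the ioctl that sits behind `ethtool <dev>`, reading back the speed and auto-negotiation state:

```c
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void)
{
	struct ethtool_cmd ecmd = { .cmd = ETHTOOL_GSET };  /* "get settings" */
	struct ifreq ifr;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);  /* any socket will do */

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, "sw0p1", IFNAMSIZ - 1);  /* hypothetical port */
	ifr.ifr_data = (char *)&ecmd;

	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
		perror("SIOCETHTOOL");
		return 1;
	}
	printf("speed: %u Mb/s, autoneg: %s\n",
	       ethtool_cmd_speed(&ecmd), ecmd.autoneg ? "on" : "off");
	return 0;
}
```

Changing the settings works the same way: modify the struct and submit it with ETHTOOL_SSET.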
So, something that's ongoing in graphics - not completed yet, quite a way from being completed - is something called atomic mode setting. You probably know about kernel mode setting: making the kernel responsible for setting up your video display generators and managing memory for the graphics hardware. That removes the need for the X server - or the video drivers inside the X server - to do all of this from userland, which was really not great. There's a kernel subsystem called DRM, the Direct Rendering Manager. Originally it was all about 3D GPUs, but it now takes care of this mode setting and memory management as well.

It models your display generator as having several pipelines. The pipelines take input from framebuffers, through what are called planes. So you have one framebuffer for pretty much everything you see on screen, in fact; you have another plane with the mouse cursor that's shown on top of that; and you might have further planes for showing video on top of that, because video is special - a video plane can provide scaling and colour-space conversion in hardware. Now, if you're using a graphics chip that has a 3D GPU, you probably don't have video planes any more, because the GPU can handle all of that: when it composites windows together, it can composite in video and do the scaling in the same way as it can do any kind of 3D rendering. And the output of a pipeline can be routed to one or more screens. At the moment, for example, my laptop has a single pipeline, I think, which sends the same signal out to the internal display and out of the VGA port. There are differences in the encoding, but it's effectively the same signal internally. So when I plug and unplug the projector, this pipeline needs to be reconfigured to talk to two ports rather than one, and KMS supports doing that - that's fine. You can change the refresh rate as necessary, and you can add and remove planes, for example when you open and close your video player. But these reconfigurations can result in flickering of the screen, or in tearing, where on a single update of the screen you have one half generated with the old configuration and one half with the new, with an odd join in the middle. It only appears for a moment, but it's slightly annoying.

And also, because the changes are broken up into multiple steps, you could get a configuration halfway through that is invalid - for example, one that requires too much memory bandwidth. So even though the X server is trying to reconfigure to a state that's completely valid, an intermediate step might not be, and then the whole thing fails.

Now, I don't plug and unplug the projector very often, so why should I really care about flickering? Well, it turns out that using the GPU to do all your composition uses more power than combining multiple framebuffers in the display generator. So Android, for example, is making much more use of composition in the display generator instead of the GPU. Now, Android has done its own thing - it's not using KMS, and it doesn't support multiple pipelines at all. But hopefully you're going to see mobile devices using a normal Linux graphics stack, and you probably also want your laptop to run for longer while using less power. So it would be really nice to make more use of multiple planes in X or Wayland. And that means the window system is going to need to reconfigure planes much more often, and we really don't want that to flicker.

So the atomic mode setting API makes the reconfiguration transactional: either all the changes are applied, or the driver detects that it isn't going to work and rejects the request from userland. And if the changes are applied, they can be carried out during a vertical blanking period, i.e. not while the screen is actually being updated, so you have one frame with the old configuration and the next frame with the new one. The graphics drivers in the kernel need to be updated for this, and so far not many of them have been: i915, for Intel hardware, is supposed to be completely updated in Linux 4.2, and msm and tegra, which are used on some ARM-based SoCs, have been updated too. And userland will also need to use this new interface, so that's going to take a while. Hopefully we'll be ready and we'll have lower-power graphics in stretch.
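The userland side of this is still settling, but roughly, an atomic client batches property changes and asks the driver to validate and then commit them in one go. Here's a minimal sketch using libdrm's atomic calls (link with -ldrm); the object and property IDs below are placeholders, which a real client would look up with drmModeGetResources() and drmModeObjectGetProperties().

```c
#include <stdint.h>
#include <stdio.h>
#include <fcntl.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

int main(void)
{
	int fd = open("/dev/dri/card0", O_RDWR);
	drmSetClientCap(fd, DRM_CLIENT_CAP_ATOMIC, 1);  /* opt in to atomic */

	drmModeAtomicReq *req = drmModeAtomicAlloc();
	uint32_t plane_id = 31, fb_prop_id = 16, fb_id = 99;  /* placeholders */

	/* Queue one property change: point a plane at a new framebuffer */
	drmModeAtomicAddProperty(req, plane_id, fb_prop_id, fb_id);

	/* First ask the driver whether the complete new state is valid... */
	if (drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_TEST_ONLY, NULL) == 0)
		/* ...then apply it: all changes land in one vblank, or none do */
		drmModeAtomicCommit(fd, req, 0, NULL);
	else
		fprintf(stderr, "configuration rejected, nothing changed\n");

	drmModeAtomicFree(req);
	return 0;
}
```

The TEST_ONLY commit is what rules out the invalid intermediate states described above: the driver checks the whole target configuration before anything on screen is touched.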
So, live patching. I've had a few questions about live patching and whether we can use it in Debian. Well - maybe. If you don't know about live patching: currently, if you upgrade the kernel package, you can't simply switch over to running the new code. You have to do a full reboot, or possibly use kexec. If your applications are all containerised and you've embraced the cloud, then no problem - you can restart VMs at any time and it's not going to be disruptive. But if you run a more conventional service, then yes, you have to schedule reboots, and you don't want to do them too often. But if there's an important security update, you really want to apply it. So live patching offers a way to fix some bugs by editing the kernel code without a reboot.

That was implemented quite a while ago by a company called Ksplice, with a product of the same name; they've since been bought by Oracle. Their patches are free software, but as far as I know the tools they use to develop them are not, and they don't develop in the open. Currently, they release patches for Oracle Linux, Fedora and Ubuntu - not Debian. Red Hat and SUSE each saw this as an important feature for their enterprise distributions, RHEL and SLES, and so they implemented live patching again. And because they actually work within the kernel community, they tried to get their implementations upstream at around the same time, and then they had to go through some discussion and compromise on a common implementation that would suit them both. That has now happened, in Linux 4.0.

So yes, it would be really nice to use this feature for Debian's kernel security upgrades, but naturally it'll require more work. Not everyone is going to use live patching, so we're going to need to ship both the full kernel update and the live patch, and building the live patch is extra work. There are five minutes till questions, right? So if anyone wants to help the kernel team work on this - or if anyone wants to pay a member of the kernel team to work on this - we'd be interested to hear from you.

So, in storage, we have some exciting new hardware: non-volatile DIMMs. Flash storage, and flash storage arrays, keep getting faster, mostly by putting more flash chips in parallel. And there are several interesting non-volatile memory technologies in development that could arrive any time soon, supposedly faster and more durable than flash. Up until now, non-volatile memory has usually been treated as a disk: it's attached to a SATA interface, or there's a newish interface called NVMe, which is based on PCI Express. But maybe that's still not fast enough. So NVDIMMs let you attach non-volatile memory to the memory bus - the same bus the dynamic RAM is attached to. You still have the problem that flash can't be rewritten as many times as RAM before it breaks and stops holding data.

So there are two different ways of dealing with NVDIMMs. One of them - called persistent memory, or PMEM, mode - is that you map the whole block of flash memory into the ordinary physical memory space, and then you can directly map it, using the MMU, into the processes that are accessing data on that NVDIMM. And then there's BLK, or block, mode, where only the kernel accesses the memory directly, and it only accesses a small window at a time. You can also compromise and split your NVDIMMs into regions that are accessed in either mode. Hang on... there we go. Right. So, with persistent memory, you have direct access from processes; but if the hardware fails, those processes are basically going to crash, and there's no way to recover from that. You also need the file system to support direct access, but ext4 and XFS support it, so that's probably good enough. Block mode is not as efficient, but you can put a RAID layer over multiple NVDIMMs, so it's more reliable.
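To show what the PMEM model buys you, here's a minimal sketch of the direct-access path, assuming a DAX-capable file system (say, ext4 mounted with -o dax on a /dev/pmem0 device) and an already-existing file of at least one page; the path is made up for illustration.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/mnt/pmem/log", O_RDWR);  /* file on a -o dax mount */
	if (fd < 0) { perror("open"); return 1; }

	/* The mapping goes straight to the NVDIMM - no page cache copy */
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) { perror("mmap"); return 1; }

	/* An ordinary store: no read()/write() system calls involved */
	strcpy(p, "persisted across power loss");

	/* Flush CPU caches so the data is actually durable on the media */
	msync(p, 4096, MS_SYNC);

	munmap(p, 4096);
	close(fd);
	return 0;
}
```

In BLK mode, by contrast, the same storage would sit behind a normal block device and every access would go through the kernel's usual I/O path.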
So - I'm going backwards through the slides here. Yeah: we've got an encryption layer in ext4. Now, we already have an encrypting file system called eCryptfs that's been there for ages, so you may wonder why we would need another implementation. There are good reasons for this. In fact, encryption in ext4 is being implemented by one of the developers of eCryptfs, so he knows what the trade-offs are. Currently, when you read through eCryptfs, all the data you read is cached twice in memory, encrypted and decrypted, so there's some wasted memory. And eCryptfs can't assume very much about the underlying file system - whether it supports extended attributes and so on - which means that sometimes it can store metadata in extended attributes and sometimes it has to use another approach. So it's more complicated and less efficient.

Putting this in the file system layer also means there's no need for users to be able to mount anything before they can use encryption. The implementation in ext4 allows it to be turned on per directory, and you have a different key for each directory. F2FS has, I think, the same interface. Part of the reason for that is that Android wants this, and some Android devices use ext4 while some of them use F2FS.
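For a sense of the interface, here's a hedged sketch of how a directory might be marked for encryption: an ioctl on an empty directory sets the policy, and the actual key is added separately to the session keyring (for example with the e4crypt tool). The struct layout, ioctl number and mode values are reproduced from memory of the ext4 code of that era, and the key descriptor and path are made up - verify against your kernel headers before relying on any of this.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/types.h>

/* Definitions copied by hand, as userland headers may not carry them */
struct ext4_encryption_policy {
	__u8 version;
	__u8 contents_encryption_mode;
	__u8 filenames_encryption_mode;
	__u8 flags;
	__u8 master_key_descriptor[8];
};
#define EXT4_IOC_SET_ENCRYPTION_POLICY \
	_IOR('f', 19, struct ext4_encryption_policy)
#define EXT4_ENCRYPTION_MODE_AES_256_XTS 1  /* file contents */
#define EXT4_ENCRYPTION_MODE_AES_256_CTS 4  /* file names    */

int main(void)
{
	struct ext4_encryption_policy policy = {
		.version = 0,
		.contents_encryption_mode = EXT4_ENCRYPTION_MODE_AES_256_XTS,
		.filenames_encryption_mode = EXT4_ENCRYPTION_MODE_AES_256_CTS,
	};
	/* Names the key to look up in the keyring; value is made up */
	memcpy(policy.master_key_descriptor, "descr-00", 8);

	int fd = open("/home/user/private", O_RDONLY);  /* hypothetical dir */
	if (fd < 0 ||
	    ioctl(fd, EXT4_IOC_SET_ENCRYPTION_POLICY, &policy) < 0) {
		perror("set encryption policy");
		return 1;
	}
	return 0;
}
```

Everything created under that directory afterwards is then encrypted with that directory's key, which is what gives you the per-directory, per-key model mentioned above.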
OK, I've got some more things to talk about, but I'll open up to questions, and then, if there's time after the questions, I can go through those. We have ten minutes more? I thought it was 45 minutes, so I'd have 15. OK, then I have ten minutes for questions. Any questions?

Question: the multiqueue functionality which was added recently in the Linux kernel - is it enabled by default in Debian, or when do you plan to enable it? - Which functionality? - The block multiqueue functionality that was added for the block layer and SCSI. - There's a runtime control for it in the SCSI driver - I think it's a module parameter. It's not turned on by default upstream, and it's not turned on by default in Debian. You have to opt in.

Question: to what extent can ext4 encryption replace Linux block-device encryption? - For some users, yes, it can replace that, but it deals with a different threat model. There's an article on LWN that goes into quite a bit of detail about what this is for and what it can and can't do, and I think there's a link in the comments there to the public design document from Google about what they needed from this. - Like, for instance, if I encrypt the whole file system, from the root downwards, can a user then encrypt their own files on top of that? - Yes, yes, of course they could. There's nothing to stop you stacking encryption; you can have as many layers of encryption as you want.

Question: there are lots and lots of drivers upstream. The process now is that lots of drivers go into the staging tree, and then, depending on how the individual developers maintain them and what the quality of the drivers is, they eventually progress to the mainline tree in the same kernel. Does Debian have a policy about the drivers which are in the staging area? - The current policy is that they may be enabled on request, whereas drivers in other areas tend to be enabled as a matter of course if they're for something generic, like a USB device that you could plug into almost any system. If a driver is outside the staging area, we'll probably enable it without thinking; but if it's in the staging area, there's some risk associated with it. Probably none of them could eat your hardware, but staging is considered a bit more risky. Also, some of that code is not very portable, so generally, if it was requested, we'd enable it for x86 only.

Question: in the title, you mentioned missing features. What did you mean? - Missing features in what? You mean right at the beginning? - I guess it was features missing from Debian. - Yeah, at the bottom - this one. It was in the main title, perhaps. Oh yes, "what's missing". So this is about integration and packaging. The features I've talked about so far - it's mostly going to be about applications taking advantage of these things. And then overlayfs: Debian Live needs to be changed, or possibly has been changed - actually, I'm not sure. But any live distribution is either going to have to package AUFS itself, or it's going to have to switch from AUFS to overlayfs. And live patching, as I said, needs a lot of work before we can actually use it. So it's usable, but... I'm not sure whether the feature is actually turned on in the Debian kernels at the moment. It's not something you can use today in Debian, even though the feature is there in Linux. - In the sense that the kernel team doesn't support doing kernel upgrades or stable updates by live patching? - I'm sorry, could you speak a little louder, or bring the microphone closer to your mouth? - In the sense that the kernel team does not support upgrading stable kernels by live patching? - Yeah, that's right. - But I could still use it for my own purposes? - Yes, if you have the expertise and the time to construct these patches. And if you do, please join the kernel team to do that.

More questions? - Something that is not yet in the kernel: grsecurity. Status, progress, is somebody working on that? Do we have any chance of seeing it some time in Debian? - If you look very closely, you'll see that I actually extracted one little feature from grsecurity and put it into Linux 4.1. But are we going to include it wholesale? No. It's a very big patch, you need a fair number of changes to userland as well, and there's a performance cost. And we can't assume that the grsecurity upstream would support it - in fact, he has specifically said no, he doesn't want to deal with combining grsecurity with other distributions' patches. So, sorry. But there's nothing to stop you taking a supported grsecurity branch, applying it on top of the upstream kernel, and running that together with a Debian userland.

Question: with overlayfs, and, as you said, the AUFS integration going away - I mean the packaging in Debian - is the replacement actually good enough for practical use, or are the limitations you were showing showstoppers? - As far as I know, they're not showstoppers.

Last question? No questions. - Then thank you very much for your talk, and see you again. - Thank you.