Next up, Mr. Christian Brauner.

Hey, so I just came back from Paris yesterday evening, and the first thing I saw on Twitter about this conference is that Lennart is going to steal our home directories. It was great news. No, joking aside, you've been talking about this for the last couple of weeks and months already.

Right, I'm Christian. I write code. I work for Canonical right now. I mostly do upstream kernel work. I also maintain LXD and the LXC container runtimes, which you might know about. We do a lot of work in the upstream kernel for new container features and security, crossing a bunch of subsystems, and also maintaining a few bits and pieces. This is work we've been doing over the last couple of kernel releases. I spoke about it in depth for the first time, I think, at Linux Plumbers a couple of weeks ago. It's a concept which we call pidfds. Who has heard of that, by the way? Some people read LWN.

Okay, so in short, what is a pidfd? It's a file descriptor referring to a process. I don't know whether I'm going to go into detail about how other systems had that before and so on; it's not a super novel idea. Specifically, right now it's a file descriptor that refers to a thread group leader. If you have any questions about this, you can just yell right away. Right now it's not possible to make a file descriptor refer to a single thread, but we don't exclude the possibility of doing this in the future. The reason is that at the time we started doing this work there was no real reason to do it. Nobody really wanted it or yelled "we want it", and it would have made the whole code a lot more complicated. Thread group leaders in the kernel are really horrible in general.

So it's a stable, private handle. The PID file descriptor guarantees that you maintain a reference to the same process as long as you hold that FD open.
And it works in a very specific way; I will go into this in more detail in a bit. Pidfds use a pre-existing stable process handle that the kernel already knows about, which is struct pid. It's already used in /proc to pin a process: all the /proc/<pid> directories stash away a reference to a struct pid.

The first question you might ask, if you know a little bit about the kernel, is why we are not using task_struct. Ideas? Takers? If you read the comment in the code, struct pid is the kernel's way of maintaining a stable process handle without having to pin task_struct. Why is that a problem? If you look at task_struct in the kernel, it's this huge chunk of code, and there are lists in there and pointers and probably arrays and all that kind of stuff. So struct pid is a cheap way of getting around the problem of having to pin a lot of memory in the kernel for a long time. There are various code paths in the kernel that take references on struct pid to keep it alive because they need to look at some information, or need to look up who's the thread group leader or who's the session leader, and so on.

Struct pid has a bunch of members. You already see something interesting in there which we added, which is a wait queue for pidfd notification. We will get to this in a bit. The idea is that from a struct pid you can get to all of the interesting task_structs which reference the type of PID that you're interested in. The kernel differentiates between a thread group leader, a process group leader, and a session leader. If you hold a reference to a struct pid, there's an array in there, struct hlist_head tasks[PIDTYPE_MAX], through which you can get to all the task_structs that use this struct pid for a specific PID type. You can get at the process group leader.
You can get a session leader or the thread group leader, depending on what information you need. So that's a pidfd: a pidfd stashes a reference to a struct pid, and that reference doesn't go away.

And why do this in the first place? This is usually the first question. I would have been happy to just write it for my own entertainment, but we actually had a bunch of use cases. The first one is pretty obvious, even though it's always heavily debated whether this is a real issue and whether there are other ways to fix it: PID recycling. So, avoid the pitfalls of PID recycling on high-pressure systems. PID allocation in the kernel, PID number allocation I should say to be precise, works cyclically. That means the kernel keeps ramping up the PID number until it hits the maximum number of PIDs on the system, and then it wraps around and takes the next free PID number. So if you have a lot of processes exiting, you can get into a situation where a process's PID has been recycled while you're still operating under the assumption that it's the prior process you're operating on.

You can basically use this for timing-based attacks, of which we actually had quite a few. There is one which Jann found against Android's getpidcon(): you could trick it into operating on the wrong security context, if I remember correctly. There are actually two CVEs related to this, the first two, so you can have a look at those if you're interested. There are a bunch of PID-based macOS exploits, which I didn't know about; macOS, as far as I know, doesn't have the concept of a process file descriptor. There is a bunch of stuff in Qt that had problems with this. The last ones are actually against the hardware service manager; those are the getpidcon() exploits. I think the first one might actually be for polkit, which you could attack with this as well.
So this really is an issue. One thing you can do to make it less likely to run into this is to bump the maximum number of PIDs to 4 million, which I think systemd started doing at some point. You could also probably get around this problem by using UUIDs and not file descriptors. There were a lot of discussions about how exactly we should solve this problem. We went with pidfds, and I'm going to explain why; I think there's a good reason. So PID recycling was one of the issues we really wanted to get around.

Again, there were a bunch of other reasons. One thing that came up repeatedly was shared libraries: I want to spawn invisible helper processes. What does this mean? Exit notification on Linux works in the following way. Oh, by the way, if anyone knows more than me, please yell as well. You usually get a SIGCHLD signal on process exit, right? That's how you get informed. But say you have a large main loop running with a lot of callbacks, and one of your callbacks is there to reap helper processes. This callback calls waitpid(-1) or waitid(P_ALL), which means "wait for all my children". That's a use case that init systems, specifically, probably want to support. But now you can end up in a situation where any other callback in your main loop could have spawned a helper process and also relies on SIGCHLD and exit signals being received. So now the wrong helper wakes up in your loop, gets a SIGCHLD, calls waitid(P_ALL), and is like, yes, I'm going to reap all of my children now. And then it accidentally reaps someone else's child. Another part of the program now gets confused: where the hell is my child? So this is really not a nice situation.
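The reaping race described above can be reproduced in a few lines. This is a minimal sketch, not code from the talk: the function names `spawn` and `demo_stolen_child` are made up for illustration, and the two "callbacks" are simulated one after the other inside a single function rather than in a real main loop.

```python
import os


def spawn(exit_code):
    """Fork a helper child that exits immediately with exit_code."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)
    return pid


def demo_stolen_child():
    # Two independent "callbacks" each spawn their own helper process.
    helper_a = spawn(1)
    helper_b = spawn(2)

    # Callback A reaps with waitpid(-1), i.e. "wait for any of my children",
    # and therefore also collects the helper that belongs to callback B.
    reaped = set()
    while not {helper_a, helper_b} <= reaped:
        pid, _status = os.waitpid(-1, 0)
        reaped.add(pid)

    # Callback B now asks for its own helper, but it has already been reaped.
    try:
        os.waitpid(helper_b, 0)
        b_lost_child = False
    except ChildProcessError:  # ECHILD: no such child anymore
        b_lost_child = True

    return helper_b in reaped, b_lost_child
```

Calling `demo_stolen_child()` is expected to report that callback A collected B's helper and that B's own wait then fails, which is exactly the confusion a library-private helper process wants to avoid.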
So that's forking off invisible helper processes. With some work you can do it, but then you run into issues with thread safety and signal handlers. There is a long blog post about this, I think, from Thiago, who is one of the Qt project maintainers and wants to make use of this feature. Pidfds, as we'll hopefully see, make it possible to spawn invisible helper processes pretty nicely. They also allow you to get notifications for process exit as a non-parent process in a clean way.

Then there's process management delegation in general: handing off a handle to a non-parent process for a bunch of operations which you cannot safely perform right now. I mean, you can pass a PID. If you're the parent, you usually know that this is your child, and you can be sure that it is still your child. If you're a non-parent, you have more trouble figuring this out: you need to parse through /proc, look at start times, all kinds of hacks. CRIU has had issues with this for a long time, for example. So, hand off a handle to a non-parent process for waiting, signaling, whatever. If you have an FD as a stable, private handle on a process, that problem should go away.

Another reason why we went with FDs is the ubiquity of FDs, which sounds like a trivial argument, but I think it's actually not. There are common patterns everywhere in code bases that make use of FDs. Most people have an epoll loop to listen for events on file descriptors. Most people have parsing logic to parse FD info out of /proc/self/fd/<fd number>, and then fdinfo or something, and have logic for sending around file descriptors via SCM_RIGHTS and so on. So there is not a lot of adapting that you have to do. If you had to build UUIDs on top of everything, it would have been annoying for the kernel to generate them and hand them out to user space. I know Lennart still wants them very much. You're not going to get them.
So FDs seemed to be quite the obvious solution. And also, and here's where we get into this a little bit, there was prior art, or similar art, for this. So the ubiquity of FDs is, I think, a pretty good argument for doing it this way.

And last but not least, does user space really care about this feature? Nowadays, when you try to bring something into the kernel, you usually don't just get to do it. You usually have to say: this is a problem, and people really care about it. So we needed a justification for why we wanted to do this, even though it seems obvious that it makes a pretty big problem go away. Yes, user space really cares about this. Some of these projects talked to me before, some wrote mails after, and some I found out about by pure chance, by people pointing out, hey, they're making use of this feature. And this list keeps growing.

One is D-Bus. D-Bus has an issue open where they want to switch from PID-based authentication to pidfd-based authentication, because they have issues with PID recycling as well, or at least are afraid of running into them. Qt wants to use it for sub-process management, so forking off invisible helper processes, which I mentioned before. For systemd, the only open issue that I currently know about is using it to reliably kill processes per cgroup, but they probably have other use cases for this in the future. CRIU has a function called detect_pid_reuse, which is a hack, correct me if I'm wrong, and we can switch out this function for pidfds to reliably track processes. On Android, the low memory killer daemon is using pidfds as well. They were actually one of the first that got really excited about this. It all derived from a debate.
Part of that: I had an argument, a discussion, with Kees Cook and I think David Howells a while back at Linux Security Summit in Edinburgh or somewhere, where we talked about various things that we could do and how we should do them, which is where the pidfd stuff started. In parallel there was also a discussion that started on the mailing list, where people started to hack around in /proc to make it at least possible to send signals via files and so on. So there was a lot happening at the same time, and this is ultimately the approach that came to fruition. On the Android side, I have to give it to Joel, who works for Google as a kernel engineer, who helped with this work and who also did a lot of the polling work that we'll see in a bit. So the low memory killer daemon is using pidfds to reliably track processes and kill them. This basically runs on Android 10 already. Oh no, not pidfds, sorry; they will be in Android 11. They backported it to all of the LTS kernels, and they're definitely going to use it to get around issues where they have to make sure that the process they are killing is really the process they want to kill. And bpftrace wants to switch to pidfds as well; they have an issue open too. Don't just trust me, click on the links to verify it. Maybe they're all wrong.

And there's prior art. This goes back to an earlier slide: why do this in this specific way? Well, there is precedent in other systems, and I usually think it's not a good idea to deviate too much between different operating system implementations, because it makes it horrible for user space that at least tries to be compatible across operating systems to write working code. I have to admit I did something which is not very smart from the perspective of a kernel guy: I didn't look at other kernels before actually starting this work. I did it later.
And we were kind of lucky that we didn't run into a lot of the issues they originally ran into, but that was just by pure chance, and by having a lot of smart people yell at our work. It also means I was falsely under the impression, for example, that Solaris had pidfds, which is wrong. They don't have them. At least illumos, the open source implementation of Solaris, only has a pure user-space emulation of stable process handles, proc-open, proc-run, proc-close, and proc-free, but they essentially have the same problem. OpenBSD and NetBSD don't have it. FreeBSD is the only system other than Linux that has it. They have a concept called procdesc, process descriptors, with three system calls: pdfork, pdgetpid, and pdkill. They have taken slightly different decisions than we have on Linux, most of which are implementation-based. If you have questions about this, I can go into detail, but there's probably not going to be enough time. pdfork gives you back a PID file descriptor, essentially, a process descriptor. pdgetpid allows you to translate it to a PID, and pdkill is used to send a signal through one of those file descriptors.

On Linux, there were multiple earlier attempts to get this into the kernel. One was forkfd, which I think was originally done by Thiago from Qt as well, or at least was one of his suggestions. And then there was another patch set, which was called CLONE_FD. None of those made it. The patches were fine. They were interesting; they had interesting new concepts. But CLONE_FD, for example, tried to do a lot of things at the same time: it mixed auto-reaping semantics with file descriptors for processes and so on. So there was a lot of contention about how to actually do this correctly.
And I think, ultimately, it didn't go in because it tried to do too many things at once. Maybe there was another reason that I didn't see from the thread, but there was a lot of stuff in that patch set.

So, okay, we started building a new API around process management. And I want to start with this right away, which I also did at LPC: my intention has never been to say that we have pidfds and we have PIDs, they are totally separate worlds, and you either use the one or the other. I think that is the wrong way to think about this. Pidfds get around a very specific problem that you have in multiple, but rather specific, situations. So you probably want a way to cross between PIDs and pidfds and use both at the same time. It's not like we're deprecating the PID API and only going with pidfds in the future. That's probably not what's going to happen. What I do expect is that a lot of new interesting features that people care about can be built upon pidfds, just by virtue of them being a stable process handle, which you couldn't build upon PIDs.

The first thing that we did, in 5.1, was implement a new syscall called pidfd_send_signal, which allows you to send a signal through a pidfd. This was the really obvious piece, because PID recycling is usually a concern when sending signals, right? You're operating on something that is not yours. So it's pretty easy to make the argument: look, this lets us cleanly solve an issue. It's clearly something that user space has run into; a bunch of people have pointed out that this is an issue for them. So we should do this. And it's actually not a lot of code, as you can see here. There's a bit more to it down below, but the whole FD handling part is encapsulated in what you see here at the top.

The first controversy that we had about this was: what exactly is a pidfd going to be?
People had very strong opinions about this, which also derives from the fact that it's an obvious problem, in the sense that it solves something really obvious. "Here's my opinion, here's how we should do it, and I'm not going to back down." So we kept yelling at each other for a long while, which is usually what happens.

The first idea was to use /proc/<pid> directory FDs as pidfds, because they already pin struct pid, they're pretty easy to come by, and it's a nice shortcut. You call open on /proc/<pid> for the process you're interested in, ignoring for now that this also has PID recycling issues. Then you have an FD. It can't be stolen from you, and you stuff it into pidfd_send_signal and send a signal through it. And if the process has exited behind your back and is not around anymore when you send the signal, you get ESRCH, which is kernel-speak for "no such process".

This brings me to another point. We don't pin PIDs; we don't pin PID numbers. Holding a pidfd doesn't mean that your PID is not going to be recycled. Your PID is going to be recycled; we don't care about this. The FD is your stable handle. The PID can be recycled. We're not stopping the kernel from doing PID allocation or anything.

So, right, we use /proc/<pid> FDs as sort of a shortcut that is really handy for user space. But then, and we had already thought about this, we were faced with implementing the part where you return a file descriptor from one of those nice fork or clone functions that we have on Linux. And here is where we ran into really interesting problems. Jann and I started discussing this, because he had good input on it. Some people had the opinion that clone should just return file descriptors from /proc/<pid>. Sounds straightforward.
Returning /proc/<pid> file descriptors from clone only sounds straightforward if you ignore all of the security issues: for example, there is a net directory in /proc/<pid> which allows you to snoop on the traffic of another process. It's also really horrible in terms of how file systems, especially the proc file system, work in the kernel. If you want to return a /proc/<pid> file descriptor, you have to pre-allocate a dentry, so something that the kernel uses internally to refer to a file, and then later on splice it into proc. Believe me, it was really nasty code.

So what we did was write both implementations, which was a lot of work: the one that some people preferred, with /proc/<pid> file descriptors used as pidfds, and our preferred implementation. And if you compared them, it was really like this much code versus this much code. So people were like, yes, let's go with the implementation that you wanted, which was lucky for us. I think it saved us a lot of headache, and I would have been very unhappy otherwise. I actually considered abandoning this if we had gone with /proc/<pid> directories, but we didn't. So we got lucky.

So 5.2 is where we landed support for returning file descriptors from the clone function. We were always under the impression that all of the clone flags were gone. Ha! No, there was one left, which no one knew. I mean, we only saw this because Linus pointed it out: no, there is one flag bit left. I had always assumed we were out, but okay. So we added a new flag called CLONE_PIDFD, which creates pidfds at process creation time, to let you completely get rid of the race where your PID can be recycled. They have a bunch of interesting properties, which I'm going to talk about a little bit.
There are a bunch more places that I had to touch, but overall this is more or less the code that you see in the kernel-internal clone function, or fork function in this case. You can ignore the comment for now, but if the flag is set, you allocate a new file descriptor and return it to user space. It stashes away a reference to a struct pid, which is the kernel-internal notion of a stable reference to a process.

And as you see, these are anon inode file descriptors. If you use a timerfd, an eventfd, a signalfd, an FD for the new mount API, and there are probably a bunch more I'm forgetting, the seccomp notify FD: they're all anonymous-inode based. That is just a single inode in the kernel that gets allocated when the kernel boots up, and you can get a new file from it. So the inode number is the same for all of them. It basically means you don't need to allocate an inode, and that's why it's very cheap: it doesn't waste memory, it doesn't waste allocation time, and so on. So this is more or less ideal. You stash away the operations you want to allow on that file descriptor, which is the pidfd_fops stuff, and pid is the struct pid that I talked about. So it's pretty simple code overall.

One of the things that we also did is make pidfds close-on-exec by default. If you get a pidfd back, you really want to be sure that it doesn't leak into the child process when it execs, for example. And if you think about bringing any new file descriptor type into the kernel, please make it close-on-exec by default. I tried to convince people to do this for the mount API; it didn't fly well. But it really helps user space. It's actually one of the major pain points that you get file descriptors that stay open after you exec. And close-on-exec is really easy to set. Like, this is all it takes. Or O_RDWR | O_CLOEXEC.
And then you get an O_CLOEXEC file descriptor back. It really makes life easier in user space.

Another property is that we added an fdinfo file. The fdinfo file contains the PID of the process as seen from the PID namespace with which your /proc mount was mounted. Any /proc mount, especially in containers, is attached to the container's PID namespace, at least if you remount it. And in a new PID namespace, the PID that the process has will be different from the one it has in one of the ancestor PID namespaces. So we write it in there. If you parse it, you get the PID of that process in your /proc instance, which is helpful, for example, if a pidfd is sent around and was created in another PID namespace: this is how you get the PID.

We also made a change that was Oleg's idea, Oleg Nesterov's idea. Originally, we had implemented it so that when you set CLONE_PIDFD, you got a file descriptor back instead of a PID. That is problematic in multiple ways, because file descriptors start at zero, and zero is obviously used to differentiate between the child and the parent. So you cannot really return zero as a valid file descriptor, and pidfds would have had to start at one, which is not nice. So for legacy clone, we abuse one of the return arguments it has, the one where the TID of the child process is usually placed, to return the pidfd. So if you set CLONE_PIDFD, you get the PID back and you get the pidfd back at the same time. There is no disconnect; you know both at the same time, which I think is pretty nice. It's even a little bit nicer, I think, than FreeBSD's pdfork.

And then 5.3, which I think is really exciting: Joel added polling support.
Polling support is something they really wanted for the low memory killer daemon, so that you can get exit notification without being the parent. In a more complex sense, it allows you to turn off the exit signal. You can already tell the kernel today: I don't want a SIGCHLD signal when the process exits. I'm not going to explicitly ignore it, because then the kernel would auto-reap; I just don't want a signal when the process exits. That was a bit problematic, because how do you then know when the child has exited? What polling essentially does is this: as soon as the process exits, so the thread group leader exits and the thread group is empty, you get an exit notification saying, I'm done, I have exited. So if you hand off one of those pidfds to a non-parent process, it gets reliable exit notification, and you don't need to rely on SIGCHLD signals and so on, which is pretty nifty.

The polling implementation is actually in two different files; it doesn't matter, you can grep for it if you're really interested. The one caveat that we have: poll only notifies when the whole thread group exits. If the thread group leader exits before the other threads in the group, then poll should block, similar to the wait family. That's actually a problem you can run into, and that's why thread group leaders are not really nice. But yeah, polling support is pretty exciting for process management.

In 5.3, we also added another syscall for getting pidfds without CLONE_PIDFD: pidfd_open. The idea is that after a process has been forked, you sometimes still want to create a pidfd for it, especially if you want to watch a bunch of other processes and can't rely on them having been created with CLONE_PIDFD. pidfd_open allows you to do just that: it gets you a new pidfd. You give it the PID of a thread group leader, so you can't get a pidfd for a thread.

Yeah, that's 5.3, and then 5.4. Excellent. That says it was proposed; that's actually no longer true.
Linus pulled that 5.4 piece from me, so we have it in. You can now wait on processes through pidfds. waitid has gained a new ID type, P_PIDFD: you pass it a pidfd instead of a PID, it retrieves the process from it and waits on it, which I think is pretty neat.

And we have a bunch of work planned. There are a lot of ideas that I have; I'm just going to speak about two of them. One of the first ideas that we had was to make something possible that is similar to what FreeBSD has. On FreeBSD, a process file descriptor is kill-on-close by default, which means that if you close the last file descriptor that holds a reference to the corresponding struct file in the kernel, the kernel kills the process, which is pretty neat. I wanted to do it the other way around, so that this is not the default: if you close the last pidfd, things are still fine, but we could add a flag that is set at process creation time. You can't take it away afterwards, so no prctls or that nonsense, and then the process gets killed when the file descriptor is closed.

There is a problem, though, that has been bothering me and that I've been thinking about. On FreeBSD, closing a file descriptor is synchronous. That means that when that FD was the last reference to the struct file inside the kernel, then by the time close returns you are guaranteed that all of the cleanup operations have finished. Linux is smarter in some ways, or let's say more complicated, in the sense that close can return before the release or cleanup method belonging to the struct file of that last FD has been run.
The kernel adds the cleanup to a work queue or a kthread, and at some point, when it decides it has time to run it, it cleans up everything. That means that when you close the last FD and close returns, you are not guaranteed that the process is dead, which is not ideal. But usually the kernel calls the release method for the corresponding file fairly quickly. It should only delay it when there is a lot of memory pressure, for example, in which case you're screwed anyway.

Another thing, which is for the shared library case, is exclusive waiting. Right now anyone can still wait on the process that you forked off, either via PID or via pidfd; there is no differentiation. So you have this connection between the pidfd and the PID API, and there should be a way to say: I'm now going to sever this connection. Exclusive waiting would allow you to do this. So you would have something like CLONE_WAIT_PID, flag name open for discussion, which hides the process from generic wait requests, similar to what FreeBSD has as well. But I would actually like to make it stronger, which goes back to an earlier discussion: you would only be able to wait on a process through a pidfd, for as long as there is a pidfd referring to it, and when the last pidfd is gone, the process gets auto-reaped.

Oh, do you know auto-reaping semantics? Thank you. With auto-reaping semantics, you explicitly ignore SIGCHLD. You don't just say "I don't want SIGCHLD", you explicitly tell the kernel that you're ignoring it, and the kernel says: okay, then I'm going to clean up that process for you. If it exits, then it exits and it's gone, which is really neat.
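The existing Linux auto-reaping behaviour just described (not the proposed CLONE_WAIT_PID flag, which is only an idea at this point) can be observed directly. This is a minimal sketch, not code from the talk; the function name `auto_reap_demo` is made up for illustration, and it temporarily changes the SIGCHLD disposition of the whole process.

```python
import os
import signal


def auto_reap_demo():
    # Explicitly ignoring SIGCHLD tells the kernel to clean up exited children
    # on its own, so they never linger as zombies waiting to be reaped.
    previous = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)
        try:
            # Blocks until the child is gone, then fails because the kernel
            # has already reaped it and no exit status is left to collect.
            os.waitpid(pid, 0)
            auto_reaped = False
        except ChildProcessError:  # ECHILD
            auto_reaped = True
        return auto_reaped
    finally:
        # Restore the earlier disposition (fall back to the default if the
        # previous handler was not installed from Python).
        restore = previous if previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGCHLD, restore)
```

The trade-off is visible in the failed wait: once the kernel auto-reaps, the exit status is no longer available to anyone, which is why tying the reaping to the lifetime of the last pidfd is an attractive middle ground.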
These semantics are different on FreeBSD, for example, which is why they have chosen to implement process descriptors a little bit differently. If you explicitly ignore SIGCHLD on FreeBSD, the process doesn't get automatically cleaned up by the kernel. It gets reparented to PID 1, and PID 1 then gets a signal for that process, which is basically saying: I'm done, you take care of it. Whereas Linux really cleans it up. So it would be really nice if, with CLONE_WAIT_PID, you can wait on the process explicitly as long as you have a pidfd, and closing the last pidfd tells the kernel: I don't care about this process anymore; if it exits, just clean it up. The problem, as usual, is the implementation. It's pretty tricky to get right, I think, but I might put a patch set up soonish. There's a bunch of other ideas, and I could keep talking, but I probably shouldn't.

So this is what we built over the last couple of kernel releases. We're at 5.4. We've obviously also been fixing a lot of bugs along the way, so this wasn't a wholly bug-free process, but overall I think it was pretty good. It was also a pretty good collaborative effort: a lot of people took part in the discussions, brought in really good ideas, gave reviews, and helped with this, so I could probably give a shout-out to a lot of people here. But yeah, that's about it. If you have questions, go at it. Yes?

So what about integration with SCM_CREDENTIALS and other places where you send a PID to someone else?

What do you mean?
Could we have a flag where SCM_CREDENTIALS contains a pidfd instead of a PID?
Couldn't you just... you can just send it as a regular FD. Why do you want a special flag?
Because it gets sent implicitly. You set some flags on the socket and then the kernel does the job for you, and it is possible that by the time the consumer looks at this data the PID is not valid anymore.
So you're saying the PID is sent implicitly, but instead of a PID you now want a pidfd. Off the top of my head I don't necessarily see a problem with this.
I guess the problem is, like, a process sends a log message to journald and dies, so at that time journald wants to look at the process to figure out which cgroup it is running in, which service, if I remember this right.
Trying to think if... yeah, we should probably talk about this. I wouldn't put this off the table. It sounds useful. If there is a really good use case for this, it doesn't complicate the in-kernel code too much, and there is a good cause to justify this, it should be fine.
I have a similar question, which is: we now have pidfds and namespace FDs, but for the namespace FDs we have to go through the file system to get them. Is there a way to derive a namespace FD from a pidfd?
Maybe. I have plans for that.
So it's an official request: can you do that?
One of the things that has always bothered me is, if you do a setns into a namespace, you have to do it iteratively, right?
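To make the iterative attach concrete, here is a rough sketch of what user space has to do today (not code from the talk). The helper names `open_namespace_fds` and `attach_to_namespaces` and the ordering in `NAMESPACE_NAMES` are illustrative assumptions. `setns` is called through `ctypes` so the sketch does not depend on a particular Python version, and actually attaching needs sufficient privileges:

```python
import ctypes
import os

# The seven namespace types, in an assumed attach order (user first, mount last).
NAMESPACE_NAMES = ["user", "cgroup", "ipc", "uts", "net", "pid", "mnt"]


def open_namespace_fds(pid: int) -> dict:
    """Stage one: one open() per namespace of the target process."""
    fds = {}
    for name in NAMESPACE_NAMES:
        path = f"/proc/{pid}/ns/{name}"
        if os.path.exists(path):  # some kernels lack some namespace types
            fds[name] = os.open(path, os.O_RDONLY | os.O_CLOEXEC)
    return fds


def attach_to_namespaces(fds: dict) -> None:
    """Stage two: one setns() per namespace, in insertion order."""
    libc = ctypes.CDLL(None, use_errno=True)
    for name, fd in fds.items():
        # nstype 0 means "accept whatever namespace type this FD refers to".
        if libc.setns(fd, 0) != 0:
            err = ctypes.get_errno()
            raise OSError(err, f"setns({name}): {os.strerror(err)}")
```

A caller would run `attach_to_namespaces(open_namespace_fds(target_pid))` and then close the descriptors. That is up to seven opens followed by up to seven `setns` calls, and a failure halfway through leaves the caller partially attached.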
So it's actually iterative in two stages. You have to call open like seven times nowadays, and then you have to call setns seven times, and often in the correct order. And that's obviously a problem. Well, I see it as a problem; maybe some people don't see it as a problem. I think this is the wrong approach. Ideally we could change this. I played with the idea, I may have mentioned it, that setns would take a pidfd and interpret the type argument that it has right now as a flag argument. So you could specify the namespaces that you want to attach to, and then it derives them from the pidfd and gets you into all of those namespaces, which would make attaching for containers and so on way nicer. Yes, that's definitely something which has been on my mind. Now it becomes a battle of the maintainers, as I would like to call it, because then we need to agree: does Eric think this is a good idea, how exactly should the API look, and so on. But overall, yes, that's something I definitely have thought about.
I also have thought, just recently, about forking into namespaces. Actually I shouldn't claim this completely for me; David Howells suggested this once in a discussion. So ideally at clone time you say not just "create me a set of new namespaces" but "create me a process in this set of namespaces". But there needs to be a strong justification for why we would need this. Like, is there some security issue or something when you create a process and then you do all of the setns stuff on it and so on? Yeah, that's definitely something we can think about.
Hello, a question on the kill-on-close feature. If the process execs a setuid program and becomes more privileged, can the parent then kill it?
Yes. In the naive implementation that I prototyped, yes, because it's an internal signal, so there is no security disconnect. So the answer is yes: the setuid program that you spawned, even if it runs with more privileges. For now I haven't thought about this, I haven't worked on this in a lot more detail. If you have specific concerns about this, we should probably talk.
So one of the things that I really want, and this is important I think: we have a tendency on Linux, and this may be a good thing in some situations, that we create a process with a specific property, and then later on we add a prctl, and the prctl takes that property away from the process, which is horribly annoying. If I as a parent say I am creating a new process with these properties set, then this property needs to stick to this process. It can't be taken away anymore after the fact. That is what I want to do with CLONE_WAIT_PID, for example, or the kill-on-close flags: it's a property that sticks to the process as long as it's alive, and if it's gone it's gone. I don't want to end up in a scenario where suddenly you can change all bits and pieces and flags again on processes. So I want sticky properties, essentially. I would like that, if it's useful at least.
Yeah, so, Christian Brauner, thank you very much.
Okay, so if you create a child process and then you send a pidfd for that process to some other process, does the parent then get a SIGCHLD notification if the other process wants the exit notification that you just added in 5.3? So do both processes get notified when the process exits, one over the pidfd and the parent over normal SIGCHLD?
If you have set SIGCHLD, yes, if you want that. But you can explicitly turn it off.
Yeah, I want it in the parent.
Yes. Anyone? Yeah, so these are just a few remarks from doing this work. Forgot that slide. Okay, thank you very much.