All right, hello, I'm Andrii, and I guess we'll be switching gears a little bit to a more everyday topic. I'm going to talk about trusted unprivileged BPF, and "trusted" is the important part here. I think we as a community made a decision that unqualified unprivileged BPF is not a future for BPF, because we just cannot trust enough that we can prevent any possible misuse. So the only option, I think, is to allow unprivileged use, but only once you have verified in some form, such as through code review or production infrastructure, that the application you run is trusted; it's one that we know about.

But taking a half step back: how is BPF typically used today? Typically you need either root or root-like capabilities. Specifically, you need CAP_BPF combined with CAP_PERFMON, CAP_NET_ADMIN, or both, depending on which kind of programs you are running. Tracing usually needs CAP_PERFMON, networking needs CAP_NET_ADMIN, and if you have a network-related program that is doing tracing, you need all of them. Even in addition to all of that, sometimes you need CAP_SYS_ADMIN, which is basically root, for operations like translating an ID to a file descriptor to get some object information.

It works today, but it's pretty coarse-grained. You don't have much granularity in who can do what: you either grant those permissions or not, and if you grant them, the process can do way more than just use BPF, which is obviously a downside. And as I mentioned before, vanilla unprivileged BPF (I'm making up my terminology here, but I mean just unqualified unprivileged BPF, a process without any of those capabilities) is too dangerous and impractical to enable with lots of modern BPF features, right?
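As a rough illustration of the capability rules just described, here is a minimal Python sketch. It is a simplified model, not kernel code: the `Cap` enum and the `required_caps` and `may_use_bpf` helpers are hypothetical names introduced only for this example, and treating CAP_SYS_ADMIN as covering everything is a simplification of the kernel's fallback checks.

```python
from enum import Flag


class Cap(Flag):
    """Hypothetical stand-ins for the Linux capabilities named in the talk."""
    CAP_BPF = 1
    CAP_PERFMON = 2
    CAP_NET_ADMIN = 4
    CAP_SYS_ADMIN = 8


def required_caps(tracing=False, networking=False, id_to_fd=False):
    """Return the capabilities a BPF use case needs, per the talk's summary."""
    needed = Cap.CAP_BPF                 # always needed as the baseline
    if tracing:
        needed |= Cap.CAP_PERFMON        # tracing programs
    if networking:
        needed |= Cap.CAP_NET_ADMIN      # networking programs
    if id_to_fd:
        needed |= Cap.CAP_SYS_ADMIN      # e.g. translating an ID to an FD
    return needed


def may_use_bpf(held, **usage):
    """True if the held capability set covers the use case.

    Simplification (assumption): CAP_SYS_ADMIN is treated as root-like
    and therefore sufficient for everything.
    """
    if Cap.CAP_SYS_ADMIN in held:
        return True
    needed = required_caps(**usage)
    return (held & needed) == needed
```

The coarse granularity is visible in the model: the only knobs are whole capabilities, and each of them unlocks far more than a single BPF use case.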
Having said that, in production it's still very desirable not to give root permissions to your applications just so they can use useful, well-known, reviewed, and trusted BPF functionality. So how can we change that? The underlying problem is that we check the real capabilities, meaning we check that you have them in the init namespace. I guess I'll jump through half of this slide. Why do we do that? Because BPF, almost by definition, is not something that you can sandbox to a cgroup or a single process or anything like that. Because we have bpf_probe_read_kernel(), bpf_probe_read_user(), and lots of other things, you cannot really constrain a BPF program to look only at kernel objects that belong to a given cgroup, container, or process, or at user-space memory of just process 123. So CAP_BPF, and by definition also the combination of CAP_BPF with CAP_PERFMON and CAP_NET_ADMIN, is inherently not compatible with user namespaces.

I just wanted to answer the question that I often get; in general, this is brought up very frequently: why can't we just namespace CAP_BPF? We cannot, because we cannot guarantee the sandboxing. So instead of pretending and actually letting programs snoop on processes they are not intended to snoop on, we just say no, sorry, in general this is not possible.

We had a few attempts to allow unprivileged BPF use cases. One of them was a few years ago, by Song, code-named /dev/bpf, and I'll give you a brief overview of the big idea. The big idea was that we had a device, /dev/bpf, which would represent the ability to do something with the BPF subsystem, right?
If your process was allowed to open this file and get its file descriptor, you could use a special ioctl to set a persistent bit on the current task, the current thread, which would say that all subsequent bpf() syscalls, until you disabled it, are allowed to do BPF. So we would set a "BPF permitted" Boolean flag on the current task, and then on subsequent bpf() syscalls, no matter what the command, I think, the internal BPF code would check whether this bit was set and, if so, allow the BPF operation regardless of capabilities, root permissions, and so on.

This was rejected by upstream. From my reading of the very long thread from back then, I think the biggest problem was not so much /dev/bpf representing the ability to do something with the BPF subsystem, but rather this persistent bit on the task. It was also brought up that this is fundamentally incompatible with runtimes like Go, where a user-visible thread is not the same as a kernel thread, and as a user you have no control over how it is migrated, so that would cause even more issues.

What happened after that is that we eventually ended up with the current scheme, where we split CAP_SYS_ADMIN out into CAP_BPF, CAP_PERFMON, and CAP_NET_ADMIN, and that's what we have today. But that's not enough because of, as I mentioned before, the fundamental incompatibility with user namespaces and, in general, not enough flexibility and granularity. So recently we made another attempt to rectify that, using an authoritative LSM approach. The high-level idea was that we added a few new LSM hooks in the bpf() syscall, and the semantics of those hooks were that they could reject, as all other LSM hooks do, but they could also grant permission to use BPF functionality like BPF_PROG_LOAD, BPF_MAP_CREATE, and so on. Well, it turns out
that this brings back a 20-year-old debate about restrictive versus authoritative LSMs, and the LSM maintainers were very clear: no, we are not adding authoritative LSMs. So this whole approach had to be scrapped, unfortunately.

But maybe the third time will be the charm. I went back to the original /dev/bpf discussion and tried to take the good parts of it, drop the bad parts, and slightly change it to allow basically the same idea: you open some file, which grants you permission to use some BPF functionality. The FD idea, I think, was good, and from talking with a bunch of security-related folks, it seems to make sense to them as well. The ioctl and the persistent bit on the task struct were definitely bad, and that was one of the reasons why it was rejected, so that's a no-go. I've also been told that using character device files is error-prone and not great, and we can talk later about why, but generally speaking anything that looks like a real file and can be copied and so on is error-prone and can be misused unintentionally in production. So ideally we avoid real files and stick to anonymous inode virtual files. Besides that, we can still use LSM on top, as long as it's restrictive, right?
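The difference between restrictive and authoritative hooks can be sketched as two ways of combining a base permission check with hook verdicts. This is a conceptual Python model only; the function names and the "grant"/"deny" verdict strings are assumptions made for illustration and do not correspond to the kernel's LSM interface.

```python
def restrictive_decision(base_allowed, hooks, request):
    """Restrictive model: hooks can only take permission away.

    Each hook returns True (no objection) or False (deny). The operation
    proceeds only if the base check passed AND no hook objected.
    """
    return base_allowed and all(hook(request) for hook in hooks)


def authoritative_decision(base_allowed, hooks, request):
    """Authoritative model: a hook may also grant what the base check denied.

    Each hook returns "grant", "deny", or None (no opinion). This is the
    semantics the rejected proposal would have introduced.
    """
    verdicts = [hook(request) for hook in hooks]
    if "deny" in verdicts:
        return False
    if "grant" in verdicts:
        return True          # overrides a failed capability check
    return base_allowed
```

In the restrictive model a failed capability check can never be rescued by a hook, which is why a separate granting mechanism is needed and LSM can then only narrow it further.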
So LSM can provide this extra layer of very dynamic and very fine-grained restrictions: whatever mechanism you provide to allow some functionality in BPF, you can still build LSM on top to restrict it to specific conditions and specific use cases.

I call this whole approach BPF token, because that seems to be the terminology used by frameworks like OAuth to generate permissions, grant them, transfer them, prove them, and all that. So "token" seems to make sense as terminology, but if someone has a better idea, I'm open to it.

The big idea is that we add a new bpf() syscall command, BPF_TOKEN_CREATE. If you have the right capabilities and permissions, it will create an anonymous file and give you a file descriptor. For starters, I think we should require CAP_SYS_ADMIN, because the idea of this FD is that it grants you the same permissions CAP_SYS_ADMIN would if you had it. Once you have this token FD, other existing commands, like program load, map create, and all the operations that get a file descriptor by ID, would accept this token FD as an optional attribute. If that FD actually represents a BPF token, and that token allows you to perform that operation, then you as an unprivileged process would be allowed to use BPF functionality that is normally reserved for root. The big idea is that someone with permissions granted you the ability to use the BPF subsystem, so they trust you, and you have this token FD as proof of that.

So how does the flow work? You have some real root process, or at least a process with CAP_BPF and CAP_PERFMON in the init namespace. It creates the token and then transfers it to an unprivileged process that it has verified as trusted, validated, signed, whatever. One way to
do the transfer is the existing Unix domain socket mechanism with SCM_RIGHTS. That's doable, but probably not that convenient in practice; not every framework supports it. But we can use the existing idea of BPF FS pinning: the same way we pin a BPF map, program, BTF, or link, we can start pinning this token file. This allows you to expose it as a normal file within the BPF file system; you can change its owner and access mode and basically control it using normal file permissions. The BPF FS itself you can control using mount namespaces, mounting and unmounting as necessary within the container. Eventually, the idea is that an unprivileged process should be able to open this file using the BPF_OBJ_GET mechanism that we use for any other BPF object, and once that succeeds, it gets the token FD, which it can use later for all BPF operations. And as I said before, LSM still plays a role here if you want it, because BPF LSM can dictate who can open the token. Even if that token is accessible through file permissions, you can still reject the open if you detect that some process is untrustworthy.

From the implementation, BPF-specific side, I think it's pretty good because it reuses the bpf() syscall. We use the extensible mechanism of the bpf_attr union. So while initially I would like to do it as an all-or-nothing thing (if you can create a token, it gives you access to all the operations of the bpf() syscall, just like CAP_SYS_ADMIN), eventually I think we can fine-tune it to specify, for example, what types of programs you can load and which kinds of BPF hooks you can attach to. In addition to that, I remember from last year,
I think John had this request: we could even use this token to fine-tune some of the currently fundamental constants and parameters of the verifier, like the instruction limit. But I didn't want to do it from the very beginning, just to avoid a lot of discussion and bikeshedding about how exactly we do this and what exactly we do. So I hope all-or-nothing will be the first step, and we have all the extensibility of the bpf_attr union to discuss the rest based on real-world experience and feedback.

One of the other aspects that seems good to do from the very beginning would be to allow whoever is creating the token to specify some relatively large buffer that is a black box to the BPF subsystem. It would just be passed through to a kernel structure that is visible to BPF programs, for example BPF LSM programs: some sort of user context to be used as additional config at runtime, describing what you can do with the application that is using this token. In that sense the token becomes a representation of the use case, and whoever is creating the use case can specify what the security policy can or should enforce later on. So in that sense, while the original /dev/bpf idea was mostly a singleton file, with tokens I think it would be more powerful and flexible to allow multiple of them, and potentially have a different token for each container or each use case.

So, one question: how would this work in practice? You would basically have a daemon, and this daemon would, on request, somehow pin those tokens into the BPF file system, and then applications should be able to...

Yes, something with root permissions. For example, at Meta we have Tupperware, which is our container system. It has the Tupperware agent, which spins up new containers and sets them up through systemd interaction. I don't know all the details, but basically there is this privileged daemon that sets everything up, downloads images, and all that. The way I see it, this Tupperware agent would know that this specific use case or service is allowed to use BPF. So when setting up the container, it will mount a BPF FS in a well-known location or something like that and pin this token file, and then all the normal unprivileged processes inside will know where to expect this token; they will open it and pass it to the bpf() syscall. And I think a similar idea can probably be done with systemd, which is a generic open-source solution.

So in the beginning you said you don't want Tupperware to give programs root per se; that's the initial motivation?

Well, both: we don't, and we cannot. I probably messed up that part. CAP_BPF requires real CAP_BPF in the init namespace. It is fundamentally incompatible with user namespaces, and at least Meta is trying to move towards containers using user namespaces for better isolation. And that leaves us with no technical ability to even grant the ability to use BPF inside those containers.

Okay, so there's a user namespace in play. You start your program in that user namespace and you want to give it the ability to do some BPF things, CAP_BPF things.

Yeah.

Okay, and in that instance there's no mechanism in the kernel to start it with the global CAP_BPF, the way I understand it. And this token would still be like global CAP_BPF, right? You could still go and read other processes' memory if you wished, right?
Yes, so there's no semantic difference.

But I think I'm missing a little bit why what you're trying to achieve is not possible with CAP_BPF at the moment. That's probably just because I don't understand well enough how the system works.

So you are confused about why we cannot achieve it with CAP_BPF right now?

Yeah, with the combination of CAP_BPF and a user namespace. I understand that in the kernel that check is the wrong kind of check, so you always check the capability in the root namespace or something like that. Is that where it comes from?

Yes. It's hard-coded in the kernel: when you do capable(CAP_BPF) and you are actually running in a user namespace, it just fails.

Okay, so it never gets to checking whether this process had CAP_BPF granted in the initial namespace. Inside the user namespace it maybe has CAP_BPF, but the check still fails in the kernel because it doesn't have it in the init namespace, and there's no way of giving it CAP_BPF in the root namespace?

Yes.

Okay. And sorry to jump in, to extrapolate on this: initially, when we added CAP_BPF, I thought we would figure it out, and I guess I misunderstood how user namespaces are supposed to work. With user namespaces there is no real root inside the userns. Once you are inside a userns, you can never become CAP_SYS_ADMIN or root in the init namespace; there is no way whatsoever to escape it. So that's an escape clause. So once you start anything in a userns, whatever it is, you cannot use BPF at all, and there's currently no mechanism, none.

Yes. I think the only way is to start the container as privileged, and then you basically lose everything; you have to give it everything. You either give it all the privileges and it's root-equivalent, or nothing. There is no ns_capable(CAP_BPF); it's all capable(). Yes.
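The distinction between capable() and ns_capable() discussed above can be modeled in a few lines of Python. This is a simplified sketch of the namespace walk, written under the assumption that only the user-namespace hierarchy and the effective capability set matter; the `UserNS` and `Cred` classes are hypothetical stand-ins, not kernel structures, and LSM involvement is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class UserNS:
    """Hypothetical model of a user namespace."""
    name: str
    parent: Optional["UserNS"] = None
    owner_uid: int = 0          # uid (in the parent) that created this namespace

    @property
    def level(self):
        return 0 if self.parent is None else self.parent.level + 1


@dataclass
class Cred:
    """Hypothetical model of a task's credentials."""
    user_ns: UserNS
    euid: int
    effective: frozenset        # capability names held in user_ns


INIT_USER_NS = UserNS("init")


def ns_capable(cred, target_ns, cap):
    """Does the task hold `cap` with respect to `target_ns`?"""
    ns = target_ns
    while True:
        if ns is cred.user_ns:
            return cap in cred.effective
        if ns.level <= cred.user_ns.level:
            # Target is not a descendant of the task's namespace.
            return False
        if ns.parent is cred.user_ns and ns.owner_uid == cred.euid:
            # The creator of a namespace holds all capabilities over it.
            return True
        ns = ns.parent


def capable(cred, cap):
    """capable() always checks against the initial user namespace."""
    return ns_capable(cred, INIT_USER_NS, cap)
```

In this model, a process that is "root" inside a child user namespace passes `ns_capable()` against its own namespace but always fails `capable()`, which is the behavior described for CAP_BPF.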
So once you start the container with a userns, that's it; you cannot go back. You either start with a userns or without.

Yeah, so right now we can run BPF in a container only if that container doesn't use a user namespace. But we actually want to migrate everyone.

By design, you can't have a container that has a new user namespace and also has a capability in the host namespace. That's not possible, by design.

Yeah. And to answer the second part, how do we know that we can grant this token: that's basically where the trust is, and I don't define what "trust" is for a program, because it will depend on the production environment. At Meta we can have some setting, some config, saying that this container is trusted to do BPF, and then the agent will mount an extra token inside the container. For others, we know that they are not supposed to use BPF, so there will be no token, and even if they try, they will fail. In some other environment it will be based on signing or whatever. It's separate, orthogonal to the mechanism itself.

Yeah, sounds really interesting. As you said, I like it much better than the initial /dev/bpf proposal with its very fixed semantics. This seems really nice. Since we're talking about user namespaces, what is it going to look like with mount namespaces and this BPF FS? That's currently in a gray zone, I think, in terms of whether a container that's set up gets the host /sys/fs/bpf or a completely new one.

As far as I understand, and someone can correct me, I think each BPF FS instance is independent. So you can have each container having its own BPF FS. You don't have to share anything, okay?
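The trust flow described above, where a privileged agent provisions a token only for containers its configuration marks as trusted, can be sketched as follows. This is a toy Python model under stated assumptions: `Token`, `provision_token`, and `bpf_allowed` are hypothetical names, the `allowed_cmds` field models the possible later fine-tuning rather than anything confirmed, and nothing here calls the real bpf() syscall.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical model of a BPF token.

    allowed_cmds=None models the initial all-or-nothing semantics;
    a set of command names models possible later fine-tuning.
    user_ctx is the opaque buffer that only policy code interprets.
    """
    allowed_cmds: Optional[frozenset] = None
    user_ctx: bytes = b""


def provision_token(container, trusted_containers):
    """Privileged agent: hand out a token only to trusted containers."""
    return Token() if container in trusted_containers else None


def bpf_allowed(cmd, has_init_ns_caps, token=None):
    """Model of the permission check for one bpf() command."""
    if has_init_ns_caps:
        return True                      # today's path: real capabilities
    if token is None:
        return False                     # unprivileged and no token: fail
    return token.allowed_cmds is None or cmd in token.allowed_cmds
```

What counts as "trusted" stays outside the mechanism: in this sketch it is just membership in a set that the agent's configuration supplies.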
So I had one question: would this also cover retrieving something by ID, for example map ID to FD, or potentially token ID? Because then you would break the namespacing again. It's a trusted process in the end, but if you give it the capability in its container, it will also be able to access file descriptors from outside, potentially even tokens.

And it might actually be desirable; maybe they share a map. So yes, that's what I want to start with, as I said, all or nothing. To create the token you need to have CAP_SYS_ADMIN, and if you have CAP_SYS_ADMIN you can do all of that. But eventually we should probably lock it down. And with pinning, you know, we have this kind of a bug with maps, where you can pin a map as read-only and then open it, right?

Right.

So I don't want to go there initially, but eventually we need to fix all that, and then I think we can also dictate that if you have a token that was opened as read-only, then you can do only read-only operations, like querying information, but you cannot do anything destructive with that program or map or whatever. But that's a second step.

Just a clarification. So you give the token to, I guess, a user namespace, for example, and then that user namespace can load any BPF program currently.
Yes. Well, that's where we want to start. I don't want to over-engineer this from the very beginning, before we start applying it in practice and seeing what actually makes sense. We can add filters on what kinds of programs, what map IDs, and so on.

But it feels to me that what you really want is not an all-or-nothing approach, because, I don't know, it currently makes it pretty useless, I would say. What you really want is to allow this per program, ideally per load, basically at load and attach.

Christian, you can probably combine this with something like signing. As Andrii mentioned, you can have the FD combined with an LSM-led policy which says: if you have an FD, you can load signed programs, something that is trusted, that is signed by a key trusted by the container manager or by some level of provenance in the container stack. This could be the container manager in the other kernel or some build server somewhere else. Used in conjunction, it just gives you easy loading semantics from within the container, so that you don't have to go through some routine to have your loading done. You can have the FD, have the token there, and then load just signed programs. You can build a more complex policy on top of this as well.

Hello, KP.

Hello, Christian.

Yeah, so that's something that I anticipated, and that's why I have this user context. That user context could be an arbitrary user-defined set of filters or permissions that will be enforced by BPF LSM on top, and BPF LSM will have direct access to this array of bytes.

I have a slightly tangential question, Andrii. Had this discussion about authoritative LSMs gone the way you wanted, would you still have preferred the semantics of the token?
Right, I don't want uninformed opinions to dictate technical decisions. So had the discussion concluded that, hey, we can have authoritative LSMs, with LSM being on the allow path and not just on the restrictive path, would you prefer that approach, or would you still prefer this one?

It feels to me that the BPF token approach is actually more flexible and more convenient in practice. So yeah, I would probably still prefer the token. I just didn't think about this initially.

I agree with this. I think what it does is...

And I'll get to the next slide and explain why.

Okay, okay, then we shut up and wait for your next slide.

So what I was wondering: the token thing is only about access, right? It's not about anything else? Because usually, what I've realized is that when this model shows up, what you actually want is a session. Let's say you have a container that has access to certain BPF functionality, and then you shut down the container. You probably want to destroy a good chunk of that. Isn't the model you're really looking for a BPF session?

But that is in fact what it is. If you have a sysfs, sorry, BPF FS instance mounted just for the container, then when that container is shut down, that BPF FS will be unmounted and the token will be gone.

Okay, but then the question is: why do you need a token? Isn't the instance enough, so that you bind everything back to the instance of the BPF FS?

I'm not sure I understand the question, Mike.

Okay, I remember wrong on that. I mean, you would have to have another way of saying: please ignore, please,
please don't do capable(CAP_BPF). Because you can't just have a check saying, oh, do you have this mount in your mount tree, because there are many issues you can have with that. For instance, for containers, I don't know whether currently, in Docker or whatever, we use the host's bpf mount from /sys, because we actually copy the host one and make it read-only, except for the parts that are special, because most of sysfs is not namespaced. So you can't just mount a new sysfs; you need to copy the host one, for instance. So a check like that wouldn't work.

I mean, I would assume we have had file systems like this before, like devtmpfs, or, what's it called, devpts. Basically every container gets a new instance mounted by the container manager, which gives it control over its own stuff. This sounds to me as if the container manager, if it wants to allow access to BPF, should just mount its own instance of BPF FS into the container. That is a privileged operation, and it can open that up via whatever it wants, like access modes and things like that. And then when you go away, you destroy it. Your token is a file descriptor anyway; it could just be the reference to this BPF FS. You don't even have to mount it; you can just leave it as a superblock in memory.

Maybe I'm misunderstanding, but that's exactly what we are proposing here: you have the BPF FS instance and then you pin the token, because...

But it doesn't even have to be; that's my question. Why is it a different thing? Why don't you just take the file descriptor to the BPF FS as the thing that gives you access? Because if you have access to the BPF FS, you have access to the BPF FS. The only thing that I'm not getting is: why do you need two things?
Shouldn't this be the same thing? If you're allowed to create something, that also implies a life cycle. That's basically all I want, because the BPF FS gives you a life cycle, and this token thing, as far as I understood, does not. That's all I'm saying.

Joe?

So if a single container ever needed two tokens, then the life cycle of having separate tokens and being able to mount, hey, this is token one, token two, whatever, and then somehow, whether it has to do with the cgroup hierarchy and there are two containers in the same... you know. If it's like that, then having the identity of individual tokens under that file system would be helpful, versus what I think you're proposing, which is basically that the file system itself acts as the token.

Yeah, so this one is just more flexible. With this setup you can do what you propose, where it's one token file per file system. You can also do away with BPF FS altogether and just transfer the file descriptor through a Unix domain socket, for example. Or you can have multiple tokens per container and then restrict them based on LSM or file permissions or both. So it's just more generic.
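The Unix domain socket option mentioned above relies on SCM_RIGHTS file descriptor passing. A minimal Python sketch of that mechanism follows, assuming Python 3.9 or later on a Unix system (for `socket.send_fds` and `socket.recv_fds`). An ordinary temporary file stands in for the token FD, since creating a real BPF token needs kernel support and privileges; the helper names are made up for this example.

```python
import os
import socket
import tempfile


def pass_fd_over_unix_socket(fd):
    """Send `fd` across a Unix socket pair using SCM_RIGHTS.

    Returns the new descriptor as seen by the receiving side. In a real
    setup the two ends would live in different processes.
    """
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # send_fds() attaches the descriptor as SCM_RIGHTS ancillary data.
        socket.send_fds(sender, [b"token"], [fd])
        _msg, fds, _flags, _addr = socket.recv_fds(receiver, 16, 1)
        return fds[0]
    finally:
        sender.close()
        receiver.close()


def same_open_file(fd_a, fd_b):
    """Check that two descriptors refer to the same underlying file."""
    a, b = os.fstat(fd_a), os.fstat(fd_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


if __name__ == "__main__":
    with tempfile.TemporaryFile() as stand_in:   # stand-in for a token FD
        received = pass_fd_over_unix_socket(stand_in.fileno())
        try:
            print("same file:", same_open_file(stand_in.fileno(), received))
        finally:
            os.close(received)
```

The receiver ends up with a different descriptor number that refers to the same open file, so possession of the FD is what carries the permission, with no file system path involved.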
I guess maybe I'm misunderstanding what you're actually proposing.

But, I mean, you can have as many BPF FS instances as you want, right? And nowadays you can just allocate a superblock via the new mount APIs, and then you have a file descriptor for that. All I'm suggesting is to make the API such that somebody allocates this BPF FS superblock, something more privileged than you allocates the superblock for you and just hands it down to your container or whoever shall have it, and that is everything they need to get access. Which basically means you merge the two concepts: having a BPF FS that is capable of pinning things and of enumerating things, and the access, because it's just one file descriptor at that point. Because if you have a file descriptor to the BPF FS, it basically means...

Okay, so you're suggesting coupling the token and the BPF FS. Is that right?

Yeah. Well, coupling... I would just make them one.

Yeah, but there are use cases where you want BPF FS but you don't want a token, for example. And then, for example, how would you grant different permissions for BPF to different applications inside that container?
Then you would just give it one... Oh. So I don't know how BPF FS currently works. I think what you're assuming is that each BPF FS mount is a separate superblock.

It is.

Very well, then it would probably work. And then of course the next thing would be to set a mask of things that you can actually do with the BPF FS on this thing. What you're describing is basically the concept of real capabilities, not Linux-style capabilities: you have an object, you pin it, you have permissions on it, and then the object goes away once you give up all the file descriptors. Which is actually a model that you have implemented in most of the BPF concepts already. So all I'm saying is, I think you don't need the token. Just use the BPF FS, maybe pluralize it if you wish, and use that as the primary way, because that's your context object. Everything you already have is bound to that thing, because of pinning and things like that. Make it the proper context object.

I don't know if I understand it correctly. Back to the slides where Andrii showed that there are other kinds of permissions you want to give, for example controlling which helper functions you can access or which memory you can access: that cannot be tied to the file system interface, but you can use a token for it.

Are you suggesting that mount options would be used to associate arbitrary metadata with that instance of BPF FS? Because I think the thing is that in the LSM you want to know which instance is being opened. And also, you don't want every instance of BPF FS to have this capability token. There are definitely use cases where we want to mount BPF FS and expose maps and don't want the user to have BPF access.

Maybe I'm misremembering, but is there not a mount on the host that everyone can access, or is it... okay.

Okay, I guess that answers the other question.
I was going to have, which is about being able to grab the... Because you say that, because of the way the pinning works, you have to make a call to get the pin. But the implication of getting the pin as an unprivileged user is that an unprivileged user needs to be able to do that, and the question is, how is that going to be... Yeah, sorry, this one. So when an unprivileged process does BPF_OBJ_GET, what permission check is being done there? That is my question.

I guess file permissions, I think, right now.

Okay.

So I guess I agree that there is equivalence in some sense: if you are okay with ten different instances of BPF FS per container, then yes, you could implement it that way. But it still seems less flexible and generally a messier solution, because BPF FS is also used to share maps, let's say within the container or within the host. You can pin a map, and then multiple different applications will expect this map to be pinned in one specific location; they open it and share it. It's intentional that you want one instance of BPF FS for sharing. So coupling that with this token model, yes, it might work, but it seems less flexible.

I want to voice that I disagree with the devpts example. devpts is not the same thing; it's completely different. I don't think there's any other case where you would have this much permission access due to having a weird mount point thing, but that's just my...

Sorry, a follow-up question: is there a benefit to making BPF FS the same as the token? As in, with your approach BPF FS is the token, versus having that be a separate thing. Is it a meaningful thing, where this is...
Okay.

Not to speak for you, but I assume your point is that you can always add an extra layer of complexity, but there should be a reason for it.

Yeah. I mean, I have a suspicion that when you delegate stuff... We have delegation of all kinds of resources, like cgroups, whatever else. Binding that to a file system, if you have to do this anyway, is kind of nice in a way. To me, it's simplicity, because you're ultimately pinning the same object there; you just have different ways to get there. In one way you focus on the security aspect, the authentication aspect, and in the other way you focus on enumeration and things like that. This would be a vastly simpler model of it, which is just the same thing, because usually if you have access to something, this implies pinning, and if you have some object pinned, then you should probably also have some controlled access on it. I don't know. I can totally understand if it's hard to follow what I'm actually trying to say here. We should probably just take it offline, into the hallway discussion, and try to get to the details.

The only problem that I really see with this... By the way, I applaud your persistence on solving this issue, because I think it's a legitimate request, and the LSM thingy I found kind of weird as well. In general, I don't see anything specifically wrong with this. The only thing that bothers me is that it's an orthogonal delegation mechanism to all of the namespace hierarchy. Which is fine, but the problem usually is, and that's an advantage of doing it your way, that you tie into the namespace delegation hierarchy, because you have the file system you mounted, it's bound to the mount namespace, which is owned by the user namespace, and it's tied to the life cycle of the user namespace. Whereas if you build a separate model, all of the delegation is
completely orthogonal, which makes it much more risky in the long run for having security issues. Which is my only concern, which is vague, arguably.

I'd have to remember this, but I'm pretty sure you can't just open /sys/fs/bpf as an unprivileged user. You can open a handle to it. You can't do anything with it, you can't look underneath it. But if you open it with O_PATH, it'll let you, I'm pretty sure.

But how would that... I mean, O_PATH does slightly more than that. I wouldn't say it does nothing. There's lots of really awful stuff you can do with O_PATH. Yeah, I'm a master of doing awful shit with O_PATH.

Still, of course, if you pass this file descriptor to the main file system object to the BPF system call, you check that it's not an O_PATH file descriptor but a real one. That basically means you actually had the right to open this, right?

I'd have to go and double check, but I don't know if you need... Do you know, if you try to open a directory that you don't have read access to, can you open a handle to it? You can with O_PATH.

Sorry, we'll have to discuss this later. Sorry. Yeah, keep going.

Let me give a little bit of background on how we do things in user space these days, in systemd, for service management. Usually, when we start a daemon, the model that we try to follow is that we give them lots of their own private file systems, right?
They get their own private subdirectory in /run, which is just a tmpfs instance. With this model, a service might get a devpts, it might get a couple of other things, private file systems that belong to them, that only exist in their namespaces, and things like that. For me it would appear very natural, it makes a lot of sense, that the model for the BPF thing is the same, right? We could tell systemd: hey, give this service its own instance of BPF FS, maybe with some security attributes that would have to be configured, and that's all that happens. And that's then handed in via a file descriptor, or even mounted into the namespace if that's being used. This is just to illustrate what the background is that we have there. Because what you're basically saying with a token is that we would have to do two things, right? We would first of all mount the file system instance in there, but then secondly also allocate a bunch of tokens that we have to pass in through some other mechanism, right? And then you just wonder: why did I just do this?

So I actually see that as a benefit, that they are orthogonal, BPF FS and the tokens, because you can actually recreate the token if you updated some global configuration, without restarting the container. You can change this user context and start allowing or disallowing some operation, right? We don't have that in this initial design, we don't have a tunable for what can be done using this token, right? But eventually we'll have more, right? We'll have how many instructions, the maximum number of instructions that is allowed, or which program types should be loadable under this token, right?
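[Editor's sketch] The tunables mentioned here (a maximum instruction count and a set of loadable program types) and the idea of recreating a token after a configuration change can be modelled in a few lines of Python. This is a user-space illustration only, not kernel code; `TokenPolicy`, its fields, and `allows_load` are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TokenPolicy:
    """Hypothetical per-token limits; not an actual kernel structure."""
    allowed_prog_types: frozenset = frozenset()
    max_insns: int = 0

    def allows_load(self, prog_type: str, insn_count: int) -> bool:
        # A load is permitted only if the program type is whitelisted
        # and the program does not exceed the instruction budget.
        return prog_type in self.allowed_prog_types and insn_count <= self.max_insns


# Initial policy handed to a container.
policy = TokenPolicy(allowed_prog_types=frozenset({"kprobe"}), max_insns=4096)

# "Recreating the token" after a global configuration change: a new,
# wider policy object is issued; the container itself is not restarted.
wider_policy = replace(
    policy, allowed_prog_types=policy.allowed_prog_types | {"xdp"}
)
```

The frozen dataclass mirrors the point made in the discussion: a policy is not mutated in place, a new token is issued instead.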
And having the token, you can actually...

Yeah, but that's the thing, if you have the new mount API, for example...

Say, why is it not part of BPF FS, like a mount option? If you have some container, some application that has already pinned maps, right, they want these maps to be preserved. So when you remount, wouldn't those pinned maps be unpinned, basically?

Yeah, okay, the superblock survives. All right.

I mean, also, if you don't like mount options, totally fine. Make it some attribute files or something weird like this if you want to. But honestly, I think mount options would just work fine.

I have a question. Even now we can have BPF FS in containers, right? How would I differentiate when this is acting as a privilege token, when it represents this CAP_BPF privilege, and when it is not? How would I differentiate that as a user of a container? Whereas this token-based approach, where it appears as a file descriptor and I can do a BPF operation on it to get access to a privileged context, is very, very clear. This scenario just ends up being a confusing case where I have to look at some attribute of that file system.

So I guess, to rephrase KP: we already have BPF FS in containers. So if you were to add this feature, that this now allows you to get extra permissions, then it would be a mount option. It needs to be an opt-in, right? It would be a mount option, which is like "delegate" or "allow".

Yeah, yeah, but what's the problem?
You can't magically upgrade your running containers to use unprivileged BPF anyway. The idea would be: if you don't specify a mount option, you get the same behavior as you always did. If you specify this mount option: hey, this is a BPF FS with magic delegation capabilities, and suddenly you can do a little bit more with it. And then you can add as many mount options as you want. But it doesn't really have to be mount options, you just somehow want to attach it to the...

As a user that is running within a container, how do I know whether this BPF FS that I've gotten in my container is something that is privileged or something that is not?

My recommendation would always be: just execute and see what happens. But if BPF wants to have an API to query this, it's not a problem. You can make it an ioctl, you can just read it from the mount options. There are like a million ways you can communicate the information. You can make it a new BPF call: you pass into the system call the file descriptor to the file system, and then it responds to you. I mean, figure something out. And if you extend it, for example, via mount options, systemd could mount it for you, could prepare it. The container manager would need to be privileged in this case. But then you could, for example, check: for this mount option you need CAP_SYS_ADMIN, for this mount option you need this and that.

I mean, they are kind of equivalent, right?

Yeah, except the token is a little bit more flexible, because you can have multiple tokens per file system, and multiple file systems, some with tokens, some not. So basically they are equivalent, yes. You can probably achieve the same with slightly different manipulations. But ultimately whoever is deciding has to be privileged and say allow or not allow, right? And then within the container, the application will be able to query it based on file existence, or some extra command, or some query, and stuff like this,
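[Editor's sketch] One of the query mechanisms suggested above is to "just read it from the mount options". A minimal Python sketch of that idea follows, assuming the standard `/proc/self/mountinfo` line layout. The option name `delegate` is hypothetical (it is only floated in the discussion, not an existing BPF FS mount option), and the sample text is made up for illustration.

```python
def bpffs_mounts_with_option(mountinfo_text, option="delegate"):
    """Return mount points of bpf file systems whose options include `option`.

    `option` defaults to a hypothetical name used for illustration only.
    """
    matches = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue  # not a well-formed mountinfo line
        sep = fields.index("-")  # separator between optional fields and fs info
        if sep < 6 or len(fields) < sep + 3:
            continue
        if fields[sep + 1] != "bpf":  # file system type follows the separator
            continue
        mount_opts = fields[5].split(",")  # per-mount options
        super_opts = fields[sep + 3].split(",") if len(fields) > sep + 3 else []
        names = {opt.split("=", 1)[0] for opt in mount_opts + super_opts}
        if option in names:
            matches.append(fields[4])  # mount point
    return matches


# Made-up sample in mountinfo format; the second bpf mount carries the
# hypothetical opt-in option.
SAMPLE = """\
25 30 0:23 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime shared:11 - bpf bpf rw,mode=700
40 30 0:30 / /run rw,nosuid - tmpfs tmpfs rw,mode=755
90 30 0:45 / /run/app/bpf rw,relatime - bpf bpf rw,delegate
"""
```

In a real container the text would come from reading `/proc/self/mountinfo`; whether such an option would ever be visible there depends on a design that was still under discussion in this session.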
right?

So, last comment, okay? Hopefully it's a good one.

If BPF FS becomes the token, then that would also give us a nice and easy way to migrate existing applications into...

So that's my next slide, I would like to get to it. So, the last slide I had, assuming this is fine and all that stuff. The mechanism itself doesn't anticipate a specific use case, and that's what I actually like about it. It's orthogonal to file systems: it doesn't have to be pinned, it can just be transferred as a file descriptor, right? But eventually, to make it easier for existing applications to take advantage of that, we would have to agree on some sort of predefined, standard location, which doesn't have to be fixed, it could be overridden. But by default we can teach libbpf, any other BPF loader library, bpftrace, anyone who has deeper BPF usage, BPF syscall usage, to just expect the file at a fixed location. In this case I propose /sys/fs/bpf/.token, but just as an illustration. For example, from the libbpf standpoint, when it creates the BPF object and tries to load all the programs, it can check whether this token exists and try to open it. If it succeeds, then it will provide this token FD automatically, and transparently to the user, with every load operation.

But that means recompiling your application for this to work.

Yes.

So if we had the BPF FS approach, we could have the kernel say: oh, this is in a namespace that has the mount, and then...

I don't understand how the existence of BPF FS in the container allows an application that doesn't know about or doesn't use BPF FS to suddenly get unprivileged BPF. This is like an additional thing. Imagine the situation: you have a container.
It has BPF FS mounted by systemd or something, and I have an application that doesn't care about BPF FS.

Yeah.

I never pinned anything, I never loaded anything from BPF FS.

Yes.

Would the existence of this BPF FS in the container somehow magically allow me to do unprivileged BPF?

I think the kernel would just check that there is some mount. It checks the current mount namespace or whatever.

That seems very magical, I guess.

I mean, it's all kind of magical, so I don't know.

I would caution against doing it that way. There isn't a specific issue that I can think of, but... Now that the mic is on, I can say my opinion.

Yeah.

I think the idea of using the file system is that if you hold a handle to the file system, you basically replace the token idea with just that file system handle. That was the proposal, which makes some sense. I don't fully agree with the justification, but whatever. The point is, if you were to then say: no, no, no, if you have any such mount whatsoever in your mount namespace, that gives you extra permissions, that, in my opinion, would be a bad idea. You wouldn't want to do it that way. First of all, because you could suddenly have programs that have more privileges than they used to, without having any way of knowing it. And the other thing is that the way we talk about what mounts are in a mount namespace and everything else, that's a whole other kettle of fish. Technically, what is a container? You can get very philosophical with this discussion in about three seconds. So I would say: don't touch any of that. You have to hold a thing that you are then dealing with, would be my suggestion.

Yeah, and old programs can deal with it by being restarted with a new library.

So you're basically saying that this makes sense to you?

I mean, I could go either way, with either proposal. I don't know. Yeah,
that's... yeah, it seems all right to me.

Yeah. So basically, just two words: by having some sort of ecosystem-wide agreement that, in a typical situation, if this special .token file exists, then libraries and tools will take advantage of it to bypass root permission requirements, basically. And then it will be up to systemd and whatever other solutions to either grant this token, pin it, and expose it, or not. That's all I had, thank you.

I think one more thing you can mention from security...

No, no, no, we have to cut off, I'm sorry. We have to take this to the hallway, or we have to do another office hours.

Yes, KP, come to the hallway.

Okay, I'll see you. I'll take a flight and be there in five minutes.

All right, thank you. KP, just message offline, we can talk about that. Thank you.

You can say it now, while Jerry is preparing his laptop. It's also fine if it's just one comment.

It's just one comment. I think one good aspect of the token approach is that you can create multiple contexts. If something needs a privileged operation just for reading, or for a limited context, like an introspection tool for BPF management, it can use a token that is created with a smaller context in mind than something that needs a larger context. So you can create multiple contexts rather than exposing everything via BPF FS. So I still prefer the context-based, token-based approach.

All right, thank you. Okay, are you good?
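[Editor's sketch] The loader-library convention proposed in this session (look for a token at an agreed location, try to open it, and attach the resulting FD to every load if that works) can be sketched as below. This is an illustration in Python, not libbpf code. The default path is the illustrative location named in the talk; a plain `open()` stands in for whatever mechanism would really hand out the token, and `load_attrs` with its `token_fd` key is a hypothetical helper.

```python
import os

# Illustrative location from the talk; explicitly meant to be overridable.
DEFAULT_TOKEN_PATH = "/sys/fs/bpf/.token"


def open_token_fd(path=DEFAULT_TOKEN_PATH):
    """Return an FD for the token file, or None if it is absent or not openable."""
    try:
        return os.open(path, os.O_RDONLY | os.O_CLOEXEC)
    except OSError:
        return None  # no token: fall back to ordinary capability checks


def load_attrs(base_attrs, token_path=DEFAULT_TOKEN_PATH):
    """Build load attributes, transparently adding the token FD when one exists.

    The caller owns the returned FD and is responsible for closing it.
    """
    attrs = dict(base_attrs)
    fd = open_token_fd(token_path)
    if fd is not None:
        attrs["token_fd"] = fd  # hypothetical attribute name
    return attrs
```

An application that never mentions tokens would call `load_attrs({...})` exactly as before; only the library changes, which matches the point made in the discussion that existing programs pick this up by being restarted against a new library.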