It's actually not me, it's more Chuck. If you want to, yes, sure, it might be helpful, because after all it's basically your code.
So, the problem statement: we are trying to implement TLS for network storage. Chuck is doing the NFS bit, I'm doing the NVMe bit, and there are proposals and specifications for how you should run TLS on either NVMe or NFS. The problem is that we can't use the time-honored approach of doing the initial TLS handshake in user space and then passing things down to the kernel to handle the actual encrypted traffic. This is the model the current in-kernel TLS implementation has been designed around: you do the initial handshake, then pass the initialization vectors into the kernel, and the kernel continues with the encrypted traffic. Sadly this model doesn't work for us, because in our case the initial socket has to be established in the kernel, as some initial unencrypted handshake needs to be done on an existing socket. So we have the reverse problem: we need to pass an existing socket, that is, a file descriptor, up to user space. And we've searched high and low and found that there actually is no way to do that from the kernel.
The other option is that we could do the handshake, and everything else, in the kernel. As it so happens, there is a company that did that. There's a company called Tempesta who did exactly that and, surprise, surprise, it works, but, surprise, surprise, it's a lot of code, and it's a lot of security-relevant code. So it's not only that you have to code it, you also have to do an audit to see whether what you coded is actually sensible and doesn't have any security implications.
Write it in Rust.
Yeah, exactly, write it in Rust, yay, of course. We didn't think of that.
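
For readers unfamiliar with the existing kTLS model described above, the hand-off from user space to the kernel can be sketched roughly as follows. This is a minimal illustration, not code from either prototype: the numeric constants are copied from the Linux UAPI headers because Python's `socket` module does not export them, the helper names `pack_crypto_info` and `hand_over_to_ktls` are made up for this sketch, and the key material is assumed to come from a TLS library after a completed handshake.

```python
import socket
import struct

# Values from <linux/tcp.h>, <linux/socket.h> and <linux/tls.h>.
TCP_ULP = 31
SOL_TLS = 282
TLS_TX = 1
TLS_RX = 2
TLS_1_3_VERSION = 0x0304
TLS_CIPHER_AES_GCM_128 = 51


def pack_crypto_info(key: bytes, salt: bytes, iv: bytes, rec_seq: bytes) -> bytes:
    """Pack a struct tls12_crypto_info_aes_gcm_128 as setsockopt() expects it."""
    if (len(key), len(salt), len(iv), len(rec_seq)) != (16, 4, 8, 8):
        raise ValueError("need 16-byte key, 4-byte salt, 8-byte IV, 8-byte sequence")
    # Field order: version, cipher_type (native-endian u16), iv, key, salt, rec_seq.
    return struct.pack("=HH8s16s4s8s", TLS_1_3_VERSION, TLS_CIPHER_AES_GCM_128,
                       iv, key, salt, rec_seq)


def hand_over_to_ktls(sock: socket.socket, tx_info: bytes, rx_info: bytes) -> None:
    """After a user space handshake, let the kernel encrypt and decrypt records."""
    # Attach the "tls" upper-layer protocol to the connected TCP socket.
    sock.setsockopt(socket.IPPROTO_TCP, TCP_ULP, b"tls")
    # Install the negotiated key material for each direction.
    sock.setsockopt(SOL_TLS, TLS_TX, tx_info)
    sock.setsockopt(SOL_TLS, TLS_RX, rx_info)
```

Note the direction: user space owns the socket first and hands the keys down. The storage consumers discussed here start from the opposite situation, with the socket already living in the kernel.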
No, and once you've done that, you have to fight the kernel community to get it in, because that's arguably something which really shouldn't be in the kernel, and half of the developers would say, oh gosh, no, that's a userland thing, that's a policy issue and doesn't belong in the kernel.
So you decided to try and solve this by passing a file descriptor up. What about just using the kernel as the man in the middle, and just passing the packets up and down?
That we looked into, but if you do that you lose the main benefit of file descriptor passing, which is that you can use existing libraries. If you pass up the file descriptor you can easily just hook in OpenSSL or GnuTLS; that's precisely what the current design of these libraries expects. You just need a very, very simple stub, basically telling OpenSSL, here's your file descriptor, go, and it does. If you're passing up packets, you would need to teach GnuTLS or OpenSSL to, well, read the packets.
Well, not if you put them into a PTY and then attach the PTY to OpenSSL.
Yeah, and just do a cat, right, of course. I mean, no.
What I was thinking about at a very high level is that even if you didn't upstream that code, there's value in having it. I'm assuming, for argument's sake, that the TLS code in user space that you'd be linking to works perfectly.
Yeah, sure.
Because it's well tested and it's been around for years and all that.
So really this is not a direction we actively pursued, because it's such a thing.
But for testing, it's your guinea pig, it's your reference, because you do your whatever at the NVMe level or at the NFS level, and you convince yourself, yep, it works. All of this stuff works. Whatever minor protocol tweaks you have to do with your stuff work. I've been thinking about this because SMB over QUIC is the same thing.
It's like: find an example where everything is upcalled to user space, convince yourself you have a reference platform, stuff it off to the side, and then begin the work on this. But the problem that I just banged my head against the wall on is that I could find zero good samples of anything socket-like doing an upcall. Whether you're talking about TLS or QUIC, I want to send a packet over the network, whether it's QUIC or TLS over TCP, whatever, and I want to send it to a user space library. I could find no mapping example, no good upcall example.
Yeah, exactly, neither did we. You're right, there is none, because there's no mechanism for it. So we have been pondering how to do it. I had been looking briefly into netlink, whether one could use netlink to do this file descriptor passing, on the grounds that there already is this POSIX-compliant file descriptor passing which goes via the Unix socket. But looking at the code, it is pretty much centered around the Unix socket itself, so there was no easy way to copy that over to the netlink socket. It essentially would mean that I would need to invent a completely new mechanism for file descriptor passing via netlink. And what with me not being a network person, I didn't really feel comfortable with that.
So you can look at NBD; NBD does this. NBD passes the file descriptor for the socket through netlink. We did this for NBD because we wanted to be able to let you, oh yeah.
Didn't they, in the end, fall back to call_usermodehelper?
What? No, there's the old ioctl interface, but there is a new netlink interface for configuring NBD, and we pass the file descriptors through netlink to NBD to connect it up.
I thought the mechanism was that user space creates the endpoint and passes it down to the kernel, not the other way around, which is what we need.
Oh, okay, I see.
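
The existing descriptor passing over a Unix socket mentioned above works between two user space processes via `SCM_RIGHTS` ancillary data. A minimal sketch of that mechanism is shown here for reference; the helper names `send_fd` and `recv_fd` are invented for this illustration, and nothing in it addresses the kernel-to-user direction that the discussion is actually about.

```python
import array
import socket


def send_fd(chan: socket.socket, fd: int) -> None:
    """Send one open descriptor across a connected AF_UNIX socket."""
    # At least one byte of ordinary payload must accompany the ancillary data.
    chan.sendmsg([b"F"],
                 [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))])


def recv_fd(chan: socket.socket) -> int:
    """Receive one descriptor; the kernel installs it in the receiver's table."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = chan.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing partial integer before decoding.
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no descriptor received")
```

The receiver ends up with a new descriptor number that refers to the same open file as the sender's, which is the property a handshake agent would need for a socket handed to it.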
I thought you were saying that you couldn't get it through netlink from user space, which you can, but I guess the other way is, yeah.
And the problem with the netlink socket is that you have quite a potential security issue there, because you need to format the netlink message and install the file descriptor in the file descriptor table of the receiving process, in two steps. If you do it when creating the message, then the process receiving the message has no way of saying no. It can't reject the message, because the file descriptor is already installed in its table. And if you do it the other way around, and try to install the descriptor when the receiving process reads the message, you have the problem of a potential security violation, because it means that one task needs to reach into something else in the kernel. Give me that, trust me, that'll be fine.
You can just uninstall the file descriptor. You don't have to keep it there once you install it. So if you get a failure of the message, you can take it back again.
Yes, but the point is that it's an easy way to overflow the file descriptor table of a process. The process has no way of saying, no, I can't do it, because I need the file descriptors I have open for other purposes, so please don't do it.
And you shouldn't be closing that file descriptor, because between you installing it and you getting a failure, user space may have closed it and opened something else on the same file descriptor number.
Yeah. And you end up with the same file descriptor number referring to something else, which is, awesome, not a good thing.
So anyway, here comes Chuck.
Before that, I'd like to interject one thing: with, I think it's TLS 1.3, I think the crypto layer actually has all the bits you need. You just need to call them. All the KDF bits, I think they're all there.
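
The descriptor-reuse hazard raised a few sentences back (closing a descriptor that user space has meanwhile closed and reopened as something else) is easy to reproduce. The sketch below is only an illustration from user space, assuming a single-threaded POSIX process; `demonstrate_fd_reuse` is a made-up name and `/dev/null` stands in for whatever the process would really open.

```python
import os


def demonstrate_fd_reuse():
    """Return (number that was installed, number handed out after the close)."""
    # Stand-in for a descriptor that was installed on the process's behalf.
    installed = os.open(os.devnull, os.O_RDONLY)
    # User space closes it before the installer learns that delivery failed...
    os.close(installed)
    # ...and opens something unrelated. POSIX hands out the lowest free number,
    # so in a single-threaded process this reuses the number just released.
    unrelated = os.open(os.devnull, os.O_WRONLY)
    try:
        return installed, unrelated
    finally:
        os.close(unrelated)


if __name__ == "__main__":
    old, new = demonstrate_fd_reuse()
    print("same number reused:", old == new)
```

Anyone who remembered only the number and later tried to take the descriptor back would close the unrelated file instead.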
So the crypto layer has the bits for parsing and handling control messages once the encrypted connection is established. You still have the old mechanism where you have to pass in the initialization vector and the socket, and then you have encrypted communication running across this. Granted, with the TLS 1.3 protocol and all the mechanisms there, yeah, fine. But the initial handshake still is in user space, or is expected to be in user space.
Well, it could be moved into the kernel.
Yeah, because of $REASONS, as we just discussed.
So in chat, Enzo said that FreeBSD does NFS over TLS. Do you know how they do it there?
They pass an open socket endpoint up to a user agent, and it does a handshake with a library implementation, probably OpenSSL. Most of these in-kernel consumers are passing the socket up to user space and using an existing library implementation, and the security community has been encouraging us to take the same approach. Another reason we need to pass an open connection up from the kernel is that the server side is doing the accepts in the kernel. The kernel has a listener, it does an accept there, and that's a connected endpoint that we have to pass up to user space to do a server hello. So there kind of isn't any way around passing an open connected endpoint up to user space.
So what I did was use a second listener. I created a user agent that listens on a special address family, and when a kernel consumer needs a handshake, it queues its connected endpoint on this listener, and the user agent does a poll and an accept. That materializes the connected endpoint in the user agent's file descriptor table immediately, right there. Then we just pass this to GnuTLS. It does its magic, it does the setsockopt calls to initialize the IV in the socket, and then it closes it, and the close basically tells the kernel, okay, the user agent's done with the endpoint and you can use it.
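
The agent's poll, accept, handshake, close cycle described above might look roughly like this from user space. This is a hedged sketch, not the prototype's code: the special address family is not something Python can open, so any ordinary listening socket stands in for it, and `serve_one` and the `do_handshake` callback are hypothetical names. In the real design the accepted descriptor would be the kernel consumer's own connected endpoint, and `do_handshake` would be where a TLS library runs and installs the keys.

```python
import selectors
import socket
from typing import Callable


def serve_one(listener: socket.socket,
              do_handshake: Callable[[socket.socket], None],
              timeout: float = 5.0) -> bool:
    """Wait for one queued endpoint, run the handshake on it, then close it.

    Returns False if nothing was queued before the timeout expired.
    """
    sel = selectors.DefaultSelector()
    sel.register(listener, selectors.EVENT_READ)
    try:
        # Poll: the listener becomes readable when an endpoint is queued.
        if not sel.select(timeout):
            return False
    finally:
        sel.close()
    # Accept: the endpoint appears in this process's descriptor table.
    endpoint, _peer = listener.accept()
    try:
        do_handshake(endpoint)
    finally:
        # Close: signals that the agent is done with the endpoint.
        endpoint.close()
    return True
```

The appeal of this shape is that the agent needs nothing beyond the ordinary socket calls; whether the handshake succeeded is judged afterwards by the side that queued the endpoint.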
So at that point, the kernel checks to see whether the IVs actually were implanted. That means it was a successful handshake. If not, it was an unsuccessful handshake and the socket is thrown away. And then away we go to the races, using the existing kTLS infrastructure in the kernel. We both have implemented prototypes. NVMe on TCP can use this. I've got one; RPC with TLS can do this. We're hoping we can build infrastructure that QUIC can reuse, because QUIC v1 uses the TLS 1.3 handshake protocol as it establishes connections. But we are getting some pushback from various people in the networking community.
I guess to flesh out the objection to an in-kernel handshake: it was that doing it in the kernel will expand the attack surface, because the handshake code is, I don't know, probably 15,000 to 20,000 lines of code. Even so, only the handshake itself would be new logic. The existing cipher code, the crypto code and the X.509 code are already in the kernel, so we're not adding any footprint there.
I'm actually not sure what we're coming to this august body to ask. We want to ask if anyone else is interested in this.
That's one thing, yeah.
Microphone.
So how would people feel?
Microphone.
And the overarching question is whether we should keep on going with having things in userland, meaning doing the TLS negotiation in userland, or whether it's worthwhile, or mandatory, or required to look into having everything in the kernel. That's actually the main thing here.
Speaking as somebody who's been bitten by this a fair bit, I am extremely allergic to adding anything to the kernel. In fact, I would like to delete most of it and put it all in user space. That being said, we already have a lot of this code in the kernel, right? We already have all of the crypto code. So it doesn't seem like it's a big deal.
No, no, we don't have it all.
TLS has a huge amount of handshaking code that is missing from the kernel. That's precisely what they're doing in user space.
Well, we have all the crypto primitives, but we don't have...
Oh, we have the primitives, okay, sorry.
It's a little less these days. The 1.3 handshake is probably half the protocol that 1.2 was, and we don't need 1.2. Both of us want 1.3 only, 1.3 and above. So that reduces the size of the code we're talking about. And I'm happy to do a prototype with the upcall. It's just, what are we going to do in the long term, is the question.
But have you seen... So I had the unfortunate job of having to look through OpenSSL bug reports. Have you seen how many they have on the 1.3 handshake? How difficult it was for them to get it right? It may be less to do than 1.2, but it's damn complicated to do.
We have implementation examples. So how hard could it be?
The other thing to note here is that even though we might have certain things in the kernel, like, for example, certificate processing, it's a very, very small subset of all of the strange and wondrous things that you can put in a cert, which some customers will want. It really is better to do all of this in user space, please.
We don't need to parse the entire certificate. We just need a bit out of it, and then we can keep a copy of the certificate to send on to the other side, if necessary.
I think you have two different slogs in front of you. One slog is putting all of the handshake stuff in the kernel, and the other slog is setting up this communication system so that you can do it in user space. The user space one is going to be much more usable in the long term, and it's going to save you a lot of pain as the handshaking things change or we have security things that we need to patch very quickly. So I would do the user space slog.
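
If the handshake stays in a user space library, the "1.3 only" requirement mentioned above becomes a configuration setting rather than a reason to write less code. As a small illustration using Python's standard `ssl` module (an assumption made purely for this example; the agents discussed in this session use C libraries), the helper below refuses anything older than TLS 1.3. The name `tls13_only_context` is made up here.

```python
import ssl


def tls13_only_context(server_side: bool) -> ssl.SSLContext:
    """Build a context that will not negotiate anything below TLS 1.3."""
    protocol = ssl.PROTOCOL_TLS_SERVER if server_side else ssl.PROTOCOL_TLS_CLIENT
    ctx = ssl.SSLContext(protocol)
    # Refuse TLS 1.2 and older instead of letting the library negotiate down.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx
```

Certificates, trust anchors and pre-shared keys would still have to be loaded into the context before it could complete a real handshake.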
The one issue that we don't feel very comfortable with in the user space approach is how we're going to deal with the root file system or root block device, because as soon as you hit direct reclaim under memory pressure, we're going to need to do an upcall to reestablish the TLS session, and the user agent's not going to be there. So we have to do something to make sure it's mlocked, or made special in some way, so that the kernel can rely on it. That's one issue. The other issue is a problem that I don't think anybody has solved yet, and that is: how does the kernel know it can trust who it's talking to on the other end? I mean, it's upcalling to something. Is it trustworthy? Can the kernel have some kind of attestation that the thing is what it expects it to be? We haven't solved that problem. Certainly listen, poll, accept, close does not solve that problem. But we don't feel like any of the other security-related user helpers, like gssd, have solved that problem either. So we're open to suggestions there.
Yeah, I mean, this is the previously unsolved problem of assuming that request_firmware and request_module are in fact what you want them to be. The assertion is that the sbin request_module helper is sane, right? And I think we rely on that. If we need to have another sbin request-TLS helper that has to be there for the root file system, it's as good as what we have for everything else where we do kernel upcalls. We can certainly put all of this mechanism into the initramfs, and that kind of solves the problem.
But with request_module, we do actually have signatures on the module it loads. So we may not be able to trust what request_module does, but it gives us an image we can check.
Yeah, I think these kinds of problems are out of scope of what they're asking, right? Do we do this in user space? Do we do it in kernel space? I'm clearly biased against kernel space.
But practically speaking, doing it in user space does allow us a little bit of flexibility, and adding things like QUIC and other shit later on.
Right. And I think for the prototype that makes all the sense. Until we have our legs under us and we understand what all of the issues are and what we need to do, user space is absolutely the way to go.
So the hopes for doing this in the kernel are, of course, dwindling every time I give a talk in front of people like this.
Well, you say that, but let's back up a little. This isn't the networking summit. If, for argument's sake...
No, it'd be worse if it was.
Okay, I'll believe that, but at the same time, if, for argument's sake, we can boot systems and use this somehow, and we've figured out NFS, and SMB already supports QUIC on other platforms. If we supported QUIC, if we supported NFS and NVMe, think about the networking guys. They might actually take this on in their summit and decide if it's a good idea to do additional pieces in the kernel. I guess what I'm saying is, once you have a consumer, your consumer being NFS, then maybe the networking guys think about what's the best solution, because it's really not a file system topic in some sense, right?
No, it's really not. Actually you're completely right. It's more of a networking topic, because, well, this is more or less pure networking stuff.
But the beauty is that by turning on a user space upcall, and maybe later for SMB, maybe later for other things, you're creating a customer demand for them to do the right thing.
Also, I presume it's not likely that the handshaking will ever be offloaded to a network card. I presume it's always going to have to be done by the operating system or by user space. It's never going to be done by the NIC.
Yeah, I think it's a huge lump of policy, so it's always going to have to be done by the host.
Yeah, plus it's not only a huge bunch of policy but also a huge pile of legacy, because I found that quite a few libraries always try to figure out which version to use. Essentially they always start off with 1.2, and only if they figure out, oh, you can actually do 1.3, do they finally go up to 1.3. I actually hadn't been able to establish a connection by just claiming 1.3. So there's tons of legacy stuff involved.
And the NIC would require a load of certificates as well, I presume, to verify what it gets.
As far as having the networking people do it, we've got some pushback on that, because they don't have any consumer for this. They say, you know, tell us how it's going to be used and why we should do this. So that was the pushback we got, at least within our world. They didn't have any reason to do it. So we have to come up with it, because we're the only consumers.
Do you mean a reason to use TLS as opposed to a different encryption technology?
Yes, yes. Because we're the only consumers.
Well, TLS gives us two services that are very important. The first one is that it enables mutual peer authentication using the X.509 certificates provided on both ends. A server can authenticate the client to know that, okay, this is a client I recognize, I will trust the users on this client to do what they need to do. The clients can authenticate the server, so they know they're talking to the right data store. The second facility it provides is in-transit encryption, and not only encryption in transit, but also encryption that can be offloaded to specialized hardware so that the host doesn't have to do it. So we think these two things are great value-adds for storage protocols. They're not directly provided by things like WireGuard or IPsec.
We kind of have to do little twists and big-toe stands, handstands and whatnot to make it actually work with those. But TLS is a broadly deployed and widely understood internet building block that we have a lot of confidence in building on. That's basically why we chose it.
Well, the other specific thing, if you're going to do mTLS, is the fact that you need a private key somewhere. And most people like private keys to be stored in things like TPMs or YubiKeys or something else. We don't want all of that pulled into the kernel.
So that's an implementation detail, but it's going to be something that we'll have to consider, because right now the certificate is coming from a file on the client or server. It's specified on the mount command line, or it's specified when you start rpc.nfsd, or however you want to do it. And basically what we're doing is putting those certs in a kernel keyring so that they're visible to the handshake agent. So if we have a special tool that says, instead of looking in the keyring, go look in the TPM, great, we can do that. Are there key utilities, kernel keyring utilities, for getting stuff out of the TPM now?
This is an asymmetric key. We just removed all of that from the kernel. That was for TPM 1.2. This is why user space is good.
So it may be a bit early here in terms of where this is at, since we're still trying to figure out where it should belong. But I see how, in the prototype that you've done, using the accept/close lifecycle is really nice and clean in terms of getting it to work. But you're not getting any additional data along with the file descriptor when it's passed up. You've already got two in-kernel users. How does the user space portion know, or does it need to know, which one it is servicing? What if you have different configuration needs for what has to be done for NFS versus NVMe?
So the way we've addressed this is that the special address family has a set of socket options so that consumers can parameterize the handshake. You can provide the certificate or a PSK, a pre-shared key, and you can ask for various types of handshake. For example, you could ask for a client hello, a server hello, a session rekey, or a closure alert. Basically that information is an enum that's passed up in a set of socket options. Some people find that terribly distasteful and ugly. I'm not going to argue the aesthetics, but that's basically how we can parameterize the handshake requests.
Does the TLS handshake daemon know which protocol is asking for it?
Not really.
No, but as long as it knows what its requirements are.
It's all basically in terms of TLS properties: what TLS would need to be done, which handshake parameters need to be passed. So it isn't aware which protocol will in the end be running across the TLS line. It will just put in parameters for the TLS session and that's it. So it can be repurposed for literally any need, or at least that's the hope.
So since you're having so much trouble with the networking guys, I would really recommend a netlink-based solution here, because you can do things like multicast from your thing and say, hey, I need a socket, and you can parameterize all these things in netlink, and then you can just have a random thing listening on this multicast socket that says, oh, okay, I need to do this handshake now, and does the thing.
Well, multicast wouldn't work. If we were going the netlink route, multicast wouldn't work, because we need to pass the file descriptor to a specific task, not to all tasks. Otherwise we would have a multiplexed file descriptor or something like that, and no way. It needs to be a single task, meaning we can only do unicast and we have to reject multicast for netlink.
All right, yeah, that's fair.
Okay, any other questions?
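
The handshake parameterization described earlier in this exchange (a request type plus a certificate or pre-shared key, with no mention of the storage protocol on top) can be modelled as a small value type. This is a sketch under assumptions: the numeric values, the field names, and the rule that a server hello must carry a credential are invented for illustration and are not taken from the prototype's socket options.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class HandshakeType(IntEnum):
    # The four request types named in the discussion; values are made up.
    CLIENT_HELLO = 1
    SERVER_HELLO = 2
    SESSION_REKEY = 3
    CLOSURE_ALERT = 4


@dataclass(frozen=True)
class HandshakeRequest:
    """What the agent is told: TLS properties only, not NFS or NVMe."""
    kind: HandshakeType
    certificate_serial: Optional[int] = None  # hypothetical keyring serial
    psk_serial: Optional[int] = None          # hypothetical keyring serial

    def __post_init__(self) -> None:
        # Assumption for this sketch: a server must present some credential.
        missing = self.certificate_serial is None and self.psk_serial is None
        if self.kind == HandshakeType.SERVER_HELLO and missing:
            raise ValueError("a server hello needs a certificate or a PSK")
```

Because nothing in the request names the consumer, the same agent could serve any in-kernel user that can express its needs in these terms.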
So I know on the networking side, Jakub Kicinski is doing a bunch with kTLS right now.
He was the most vocal about how ugly the setsockopt handshake parameterization was. He's certainly vocal, but easy to talk to about better ways to do it.
So that would be my advice.
Okay.
Yeah, and on a related topic, I know there have been a lot of people mentioning QUIC at various conferences over the last year or two, and obviously other operating systems have QUIC in the kernel, and it depends on some of what you're talking about. So I am curious if people have ideas about who to follow up with on that. There have been like four or five user space drivers talked about, but QUIC is sort of like TCP, right? It fixes all these things with TCP. It's more of a network protocol that depends on TLS. So it's, I guess, easier to talk about in the kernel in some sense, because it's more like TCP, right? It solves problems TCP has, but it depends on TLS as well. But this is an area where we really need a lot of help from the networking side. And the guys I've been talking to have all been user space maintainers, not kernel maintainers. Stephen Hemminger, I think, is the only one I've talked to on the kernel maintainer side. So have you guys had kernel discussions with people on the networking side other than what we've talked about?
Well, we had quite a bit of discussion last week with Jakub, because he finally woke up to the fact of what we're doing once Chuck posted his prototype. And we did have quite some discussion about the correct interface. That is why we have such intense discussion now about netlink, because that's precisely what he suggested: why can't you use netlink? Yes, we could, but then this is far more work, first designing the whole thing and then implementing, blah, blah, blah, the whole gear. Whereas the accept one is just a logical thing, because that literally falls out of the use model of accept.
So that was easy to implement.
All right, let's wrap this up. We've got a break.