Okay. Hello, everybody. I would like to introduce Fridolin Pokorny from Red Hat, who will talk about a Linux kernel TLS module. Let's welcome Fridolin.

Thank you. Hello, everyone. Welcome to my presentation about AF_KTLS, address family kernel transport layer security. I will introduce a Linux kernel module that does TLS and DTLS or, to be more precise, parts of TLS and DTLS, right inside the Linux kernel.

Before we start, let's look at TLS and DTLS in a nutshell. TLS stands for Transport Layer Security, whereas DTLS stands for Datagram Transport Layer Security. You are probably familiar with the abbreviation SSL; that is the historical name. The proper name now is TLS, and the version currently in use is 1.2. There is a draft that describes version 1.3 and introduces a lot of cool stuff such as zero round-trip time, so you basically send your data within the negotiation, and it also restricts some ciphers and so on. It's still a draft, so it's not used yet. GnuTLS and OpenSSL are the most widely used implementations. You can also find, for example, LibreSSL or other libraries, but these are mostly forks of OpenSSL, with some vulnerabilities.

If we look at the TLS and DTLS protocols, we can distinguish two layers. One is the control layer, which basically handles all the overhead that TLS and DTLS need. There you find messages like: I want to negotiate keys, I want to shut down the communication, or I want to do re-keying because I think my connection got compromised. The record layer, on the other hand, is used for sending encrypted data. So if you send data using TLS or DTLS, it is sent in the record layer.

The main difference between TLS and DTLS is that TLS requires the underlying protocol to be reliable, so you have to use it with protocols such as TCP. DTLS, on the other hand, does not have such a requirement, so you can use it with UDP.
But this also means that implementing DTLS is not as straightforward; it's harder in some ways because you have to keep some additional information. This information is basically a sequence number, and you maintain a window in your implementation.

You can find many usages of TLS and DTLS, for example HTTPS, that is, HTTP secured so that no one sees your communication. It's also used for email. Probably less known usages are, for example, HAProxy, the high availability proxy, which is a load balancer for HTTPS communication. You can also find usages in SSL-based VPNs. The term SSL-based VPN is still widely used because of the historical naming; you can also say TLS-based VPN. Such VPNs are OpenConnect and Cisco AnyConnect. Cisco AnyConnect is a closed-source implementation, but you can use OpenConnect, which is compatible with Cisco AnyConnect. Let's focus on a scenario with OpenConnect.

Let's say we have a client that communicates with some device located on a local area network, and let's say the communication is already established, so there is no handshake; we are just sending data. What needs to be done on the client side? The client first needs to encrypt the data to be sent. So it encrypts, and it also adds information related to the TLS protocol itself: a header and a tag for integrity. Once we have the TLS or DTLS record, we write it into the kernel, and the kernel then transparently sends the record to the remote server over whatever medium is used for the communication.

On the server side, the kernel receives the record. There's an OpenConnect server that listens on some particular port and has a socket, and once the TLS or DTLS record is ready, it reads it from the kernel and decrypts it; basically, it does the reverse of what the client did.
Once the record is decrypted, it removes the TLS- or DTLS-specific information from the record, so it removes the header and the tag. It also checks the integrity, so that no one changed anything on the wire while the record was being sent. Once this is done, the OpenConnect server writes the raw data into the kernel, which routes it to the desired device.

If we look at this particular scenario, we can see that the decryption is done in user space. If we move the decryption into the kernel, we can optimize. What is the optimization? Basically, we save two context switches, because we no longer need the read and write syscalls just to read data from the kernel and write it back to send it, and we also save two copies: we no longer have to copy data from the kernel to user space and from user space to the memory allocated in the kernel for the socket.

So this is how AF_KTLS was born. AF_KTLS is a kernel module, so it sits in the kernel, and it does TLS and DTLS communication for you. It introduces a new socket type called AF_KTLS, address family kernel transport layer security, and it implements the record layer of the TLS and DTLS protocols. That means that when you want to use AF_KTLS, you use your OpenSSL or GnuTLS library to set up the connection. You do the handshake with the remote, and once you have a proper session, you pass all the key material into the kernel and AF_KTLS does everything else for you. You then use the AF_KTLS socket to write and read raw data.

It currently supports only AES-GCM, but it could be extended with, for example, ChaCha20-Poly1305. It implements most of the socket operations, such as socket for creating the socket, bind, send, receive, write and so on. You can also use advanced syscalls like sendfile or splice. The only syscall that is not implemented is connect. When I was designing the API, it didn't make much sense to implement it.
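To make the record layer's job a little more concrete, here is a minimal Python sketch of the framing that surrounds each encrypted record, assuming TLS 1.2 with AES-GCM as specified in RFC 5246 and RFC 5288. It is not AF_KTLS code: the function names are made up for illustration, the explicit nonce is assumed to be the sequence number (a common choice, not a requirement), and the encryption itself is left out because the Python standard library has no AES-GCM.

```python
import struct

TLS_APPLICATION_DATA = 23       # content type of application data records
TLS_1_2 = (3, 3)                # TLS 1.2 version bytes on the wire
HEADER_LEN = 5                  # type (1) + version (2) + length (2)
GCM_EXPLICIT_NONCE_LEN = 8      # sent in front of the ciphertext
GCM_TAG_LEN = 16                # integrity tag appended to the ciphertext


def record_header(content_type, version, length):
    """Build the 5-byte TLS record header (illustrative helper)."""
    return struct.pack("!BBBH", content_type, version[0], version[1], length)


def gcm_nonce(salt, seq):
    """12-byte AES-GCM nonce: 4-byte salt from the handshake key material
    plus an 8-byte explicit part (assumed here to be the sequence number)."""
    if len(salt) != 4:
        raise ValueError("salt must be 4 bytes")
    return salt + struct.pack("!Q", seq)


def gcm_aad(seq, content_type, version, plaintext_len):
    """13 bytes of additional authenticated data covered by the tag."""
    return struct.pack("!QBBBH", seq, content_type,
                       version[0], version[1], plaintext_len)


def wire_length(plaintext_len):
    """Total bytes on the wire for one record carrying plaintext_len bytes."""
    return HEADER_LEN + GCM_EXPLICIT_NONCE_LEN + plaintext_len + GCM_TAG_LEN
```

For a 100-byte payload, the length field in the header covers the explicit nonce, the ciphertext and the tag, so `record_header(TLS_APPLICATION_DATA, TLS_1_2, 8 + 100 + 16)` describes a record that occupies `wire_length(100)` bytes in total. The salt and the keys are the kind of key material that the handshake library hands over once the session is established.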
If we look at the optimization, we wanted to save two context switches, but this is not possible because there's no syscall in the Linux kernel that would say: hey, everything received on this source socket, send to that destination socket. We can simulate something like this using sendfile, which basically operates on two file descriptors. So if you want to serve a file using AF_KTLS, or any static content that sits on your disk, you can do it. If we look at the implementation of sendfile, it is implemented on top of splice, another syscall, which operates on a file descriptor or socket and uses an intermediate structure that basically carries information about memory allocated inside the kernel. So the memory is in kernel space, but you refer to it from user space. This structure is called a pipe. So we didn't save the two context switches, but we did save the two copies: we are no longer required to copy data from user space to kernel space and vice versa just to do the encryption.

When I was implementing AF_KTLS, there were issues with padding, for example, which also affects the optimization. If we look at the TLS record, you can see that there is a header, and then there are three bytes that are not used, so everything that follows is basically shifted by three bytes. This has a negative impact because you cannot access aligned memory in the kernel. If we compare it to IPsec, for example, IPsec has eight bytes followed by eight bytes that are just reserved. If we look at the DTLS record, you can see that the sequence number is carried within the DTLS record, but we still have the three bytes.

If we look at the optimization results, we have benchmarks of user send, which means you use the GnuTLS send or the OpenSSL send. So you do the encryption in user space and you pass the encrypted records to the kernel.
In this particular scenario, the sendfile operation was faster by 10% on average. We also benchmarked it using mmap: we mapped the whole file into memory in user space, encrypted it and sent it using user send.

If we look at the usages of AF_KTLS, there are several. One is OpenConnect, as discussed in this presentation. The main reason this implementation was not used there is that the tun/tap device used by OpenConnect does not support the splice operation. You can use AF_KTLS with KCM, the Kernel Connection Multiplexor. AF_KTLS also has advantages because you can access the raw data right inside the kernel. You can do, for example, Linux Socket Filtering or Berkeley Packet Filtering if you want. You can also use eBPF, and another cool feature is BCC, where you can attach your function right before or after some function that is called in the kernel, and you basically write Python code. The eBPF verifier then checks your code, and the code is executed in the kernel.

Another very nice usage is CPU offloading, where you don't do encryption and decryption on your CPU but have specialized hardware that deals with it. There are possible improvements in this area. For example, you pass multiple TLS records and multiple IVs, and you do the decryption in a kind of burst mode. The crypto API currently does not support this, but there's an opportunity to add it and to use AF_KTLS for such use cases.

TLS in the kernel is not novel. There are existing implementations. For example, Solaris has KSSL. It's not only the record layer; it also carries an implementation of the control layer, so it implements the whole of TLS and DTLS right inside the kernel. This implementation was not successful because you have to maintain more than just IP addresses and ports; you have to deal with certificates and such. It works like a proxy between two ports.
So, for example, you can send encrypted communication to port 443 and read it raw from port 80. Netflix, for example, experimented with something similar to AF_KTLS. The implementation is not publicly available. They modified the BSD kernel and adapted OpenSSL to use the sendfile operation, and according to the articles they published, they use it for serving static content. We at Red Hat experimented with AF_KTLS. Facebook joined us; they sent us cool patches, and they are experimenting with it as well.

You can find AF_KTLS on GitHub. Feel free to use it; it's open. To plug it into your kernel, you have to recompile the kernel and patch the AES-GCM implementation so that it can be used with AF_KTLS. You also have to add the socket type, because you cannot just plug a module that implements a new socket type into the kernel; you have to manually add it to the kernel's list of supported sockets. So, thank you. If you have any questions, feel free to ask.

Any questions? We have plenty of time.

Thank you for the presentation. Do you have any idea when this will be included in the mainline Linux kernel? Obvious question, maybe.

We still have open issues; for example, the DTLS window has some bugs right now, and we haven't proposed the merge yet. But hopefully we'll get there someday. Facebook is also experimenting with having only one socket, so that you don't need separate sockets for TCP and AF_KTLS; they want to merge them in some way. Hopefully one day.

I guess your client software, which does the handshake, also has to be patched, right? No?

There's an API available. If you are using, for example, GnuTLS, you just ask for all the key material after the handshake. Then you basically pass the keys and IVs for sending and receiving to the kernel, and that's it.

And how does this communication work between user space and kernel?

You have to maintain this in your own code.
There's no support for AF_KTLS in GnuTLS. When you are using, for example, GnuTLS, you do the negotiation and then you ask for the key material. You instantiate AF_KTLS, you pass all the key material to it, and it transparently does everything for you.

Thanks.

But there are a lot of hard questions that TLS clients have to answer when they want to establish a connection. So I suppose there needs to be some way of communicating with the kernel's TLS implementation. What's the strategy there? Do you use ioctls to accept requests or to pass information back to user space so that user space can make decisions?

I'm not sure I follow the question. Can you rephrase it?

When you establish a TLS connection, you may want, as a client, to react to the behavior of the server. You may want to be informed of certain things the server does, and in that case you need to pass some information to the TLS library, which in this case seems to be in the kernel. So you need some way to communicate about the channel with the user space that establishes the connection. What's the strategy there?

Once you have properly set up the session, you have all the keys and you pass the key material to AF_KTLS, you can use the record layer just to send and receive data, and you get unencrypted data in user space. The only issue is when, for example, your peer wants to negotiate new keys. Then the kernel has to have some way to tell user space: hey, I have to negotiate new keys. So we return a value that signals an error state, and then you have to ask the kernel for the key material in order to proceed with, for example, the negotiation of new keys. For that you again have to use OpenSSL or GnuTLS. And if you want to use AF_KTLS again afterwards, you have to get the information from GnuTLS or OpenSSL.
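The division of labour described in this answer can be pictured with a small Python sketch. It is a toy model and not AF_KTLS code: the function names are hypothetical, and only the content-type numbers and the 5-byte header layout come from the TLS 1.2 specification. Application data records stay with the record layer; everything else (handshake messages such as a request for new keys, alerts, change cipher spec) belongs to the control layer, that is, to GnuTLS or OpenSSL in user space.

```python
import struct

# Content types defined by TLS 1.2 (RFC 5246).
CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}


def parse_header(header):
    """Split a 5-byte TLS record header into (type, version, length)."""
    if len(header) != 5:
        raise ValueError("a TLS record header is exactly 5 bytes")
    content_type, major, minor, length = struct.unpack("!BBBH", header)
    return content_type, (major, minor), length


def handled_by(header):
    """Return which layer deals with the record (hypothetical helper).

    Only application data is pure record-layer work; any other known
    content type has to go back to the TLS library in user space.
    """
    content_type, _version, _length = parse_header(header)
    if content_type not in CONTENT_TYPES:
        raise ValueError("unknown content type: %d" % content_type)
    if content_type == 23:
        return "record layer"
    return "control layer"
```

With this model, `handled_by(bytes([23, 3, 3, 0, 32]))` yields `"record layer"`, while a handshake record such as `bytes([22, 3, 3, 0, 32])` yields `"control layer"`, which corresponds to the case where the kernel reports an error state and user space takes over.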
Okay, but how would I, for example, validate the X.509 certificate if one was sent? I probably want to do that in user space nonetheless, right? Because the application wants to have tight control over what it's connecting to. So in that case, you probably need to switch back. You need to inform me, the client in user space, about the X.509 identity in this case, and then I need to tell the kernel whether I want to continue with the connection or not.

X.509 is done in user space; user space handles this. Once you have established the connection, you already know that the peer is legitimate. So we only do symmetric encryption in the kernel, in AF_KTLS.

Ah, okay. Right, so the handshaking is not done, but...

No, no, no.

Ah, okay, sorry. Okay, good.

Any other questions?

Okay, so my question is: you said you have roughly a 10% performance benefit, right? Did you look into what the major issue is, why it's not more than that? Because the encryption itself already takes so many resources. I mean, it's a trade-off regarding complexity, right?

Basically, AF_KTLS was my master's thesis. You can find some benchmarks here. We benchmarked it on Intel architecture. The first idea was: okay, maybe the context switch is quite expensive. But we found out that the context switch is not that expensive, and most of the time is spent on encryption and decryption itself. So you would probably want to use, for example, CPU offloading: you have a dedicated device just for encryption and decryption, and your CPU is free.

Okay, I had a question. I was just curious: when you use memory maps, don't you still have to copy the data inside the kernel to some other buffer? Because otherwise you can still write into it from user space while the kernel works on it.

With mmap, you do copy. So that was basically a way to compare it with sendfile.
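The two data paths compared in this answer can be sketched in Python with plain sockets. This only illustrates the sendfile and mmap paths themselves: there is no TLS and no AF_KTLS involved, the helper names are made up for the example, and a local socket pair stands in for a real connection. With `socket.sendfile` the file contents are handed to the socket by the kernel where the platform supports it, while the mmap variant maps the file into user space and `sendall` copies from that mapping into the socket buffer.

```python
import mmap
import os
import socket
import tempfile


def send_with_sendfile(path, sock):
    """Send a file via socket.sendfile (uses os.sendfile when available)."""
    with open(path, "rb") as f:
        return sock.sendfile(f)


def send_with_mmap(path, sock):
    """Map the file into user space, then copy it into the socket."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            sock.sendall(mapped)
            return len(mapped)


def demo(payload=b"x" * 4096):
    """Send the same small file both ways over a local socket pair."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    results = {}
    try:
        for name, sender in (("sendfile", send_with_sendfile),
                             ("mmap", send_with_mmap)):
            left, right = socket.socketpair()
            with left, right:
                sent = sender(path, left)
                left.shutdown(socket.SHUT_WR)   # signal end of stream
                received = b""
                while chunk := right.recv(65536):
                    received += chunk
                results[name] = (sent, received)
    finally:
        os.unlink(path)
    return results
```

Calling `demo()` returns, for each variant, the number of bytes sent and the bytes received on the other end. The payload is kept small on purpose so that it fits into the socket buffer and no second thread is needed to drain it; timing the two functions on larger files would need a reader running in parallel.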
It seems that the objective is to reduce context switches and copies. That seems to be valuable only in certain use cases, like a front end that forwards its packets to other servers. In that case, how is the forwarding managed so that it stays in the kernel without copying? Is it part of this module, or is it some other mechanism that I don't know?

If you do something like forwarding between two sockets, you can use the splice optimization: for example, you receive encrypted communication on one socket, and you don't copy the content to user space but issue two splice calls, so you never have the raw data in user space. That was the main optimization for OpenConnect, and it's also applicable to HAProxy. But the forwarding logic itself you have to keep in your application.

Okay, let's thank Fridolin again.