A big round of applause for our technical support guys here. Sorry we're getting started a few minutes late, but that's OK, people are coming back from lunch anyway. This is Linux cryptographic acceleration on an i.MX6; if you are not here for that session, you're in the wrong room. My name is Sean Hudson. I'm an embedded Linux architect at Mentor Graphics. My vanity slide here: I've been working with embedded stuff for a while now. Of particular note is the open source side. I've been involved with the Yocto Project for about seven years, and I've been involved with OpenEmbedded as a member of the board for, how many years now? Four, something like that. So one thing I have to make sure and do is thank somebody who is not here, a co-worker of mine named Wade Farnsworth. He actually did a lot of the grunt work related to this, the data gathering and that kind of stuff, but since he wasn't able to be here, I'm the one presenting it. With that in mind, my outline is a little bit messed up; I had planned to redo it and ran out of time, so I'm going to blow through this real fast. Verbally: basically, I'm going to go through real quickly some basics of cryptography, then talk about some of the ways that we get access to hardware acceleration through the kernel, and specifically what we did with the i.MX6 to make that happen. So some of these things are still applicable, but they're a little out of order. I know everybody's sleepy after lunch. I tend to thrive best when I have people who are engaged in the presentation, so I will pause frequently to make sure everybody's still awake, and I will try and crack jokes, but they're engineer jokes, so take that for what it's worth. So this presentation is geared towards some pretty specific things that I want to call out. It's about using cryptographic hardware acceleration from user space.
And the reason why that's important will hopefully become evident, but it has to do with the fact that I'm not going to spend a whole lot of time on kernel internals, and I'm not going to spend a whole lot of time on drivers. I'm going to talk about putting together the whole stack so that from user space you can actually make use of this hardware. This focus really came from some pilot work we were doing to figure out whether we could accelerate some cryptographic operations on the i.MX6; that's the genesis of this talk. A lot of this is generally applicable, because we're trying to go from user space all the way down to hardware, so I was asked to cover it in a fairly general way, but it does come primarily from the i.MX6. I'm going to try and walk that line. So first off, I don't know how much experience there is with cryptography in general here, so I wanted to make sure we're clear on some basics. Cryptography is pretty simple, at least in principle, and this is my own wording here: you're trying to take some piece of information, a message, send it from one point to another, and make sure that a guy in the middle can't understand what that message is, or certainly can't understand it in a reasonable amount of time. You define what reasonable is, so there's a lot of fluidity in that definition. But that's it in simplistic terms. Moving beyond that, most cryptographic solutions, in particular most cryptographic algorithms, rely on an asymmetry in terms of complexity. This is critical, both in terms of the way that we ensure that "reasonable time" never arrives for the attacker, and in terms of what the impact is for us. What this essentially means is that we have brute force attacks, where you're basically trying everything in the key space in order to decrypt the message that was sent.
So it shouldn't be feasible to take a brute force approach on any algorithm; that, again, is what keeps an attacker from finding the solution in a reasonable amount of time. Now, reasonable also moves over time, because the capacity of our hardware to calculate keys and try things out keeps growing. And, I talk about it in terms of cheapness here: encryption and decryption should be relatively cheap in terms of computation, in terms of time. Right, so when you pass this through a transform, you want the information to be very difficult to rip apart using brute force, but you still want it to be reasonable to produce that ciphertext. If it takes you 10 years to convert a document into ciphertext, then it's useless to you, because the applicability of the data is gone. And this goes back to what you were saying, I think: as this model continues to expand, the computational needs for both encryption and decryption also go up. Yeah, we want to make it very difficult on the attack side, but in order to make it reasonably difficult there, we still have to put forth some effort in terms of the encryption and decryption. This is all pretty basic stuff; I just want to cover it real quick at a high level, because it's going to inform what we do later. A little bit more about this. So the basics of cryptography, as I said, are pretty simple, but what we begin to do with basic encryption is pretty cool, at least in my opinion. You've got two major pieces that most basic encryption requires: a strong algorithm, something like AES, and a strong key that comes out of a pretty large key space; that goes back to the whole brute force attack thing. Once we have those in place, basic encryption enables us to do things like tamper detection. It is very basic: we're looking to see if somebody modified a message.
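To make that asymmetry concrete, here is a back-of-the-envelope sketch of the brute-force side. The guess rate is a made-up number (a trillion guesses per second is generous even for dedicated hardware); only the shape of the result matters: small key spaces fall in under a second, while 128-bit keys take geological time.

```python
# Rough brute-force cost model: how long to search the expected half of the
# key space at a given guess rate. The rate below is an invented figure.

SECONDS_PER_YEAR = 365 * 24 * 3600

def years_to_brute_force(key_bits, guesses_per_second):
    expected_guesses = 2 ** (key_bits - 1)  # on average, half the space
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR

rate = 1e12  # hypothetical: one trillion guesses per second
print(years_to_brute_force(40, rate))   # WEP-era effective key sizes: trivial
print(years_to_brute_force(128, rate))  # AES-128: astronomically long
```

This is also why "reasonable" drifts: as `guesses_per_second` grows with hardware, yesterday's safe key size becomes today's weekend project.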
We use this more for hashing than anything else, but it's also used in boot processes and things like that. Secure storage, of course; that's where you're just taking the information and keeping it secure. This one's a fun one, too: now we can actually exchange keys. Initially, ciphers were just passed along with the key, "here's the key," and that limits their applicability; but once you have some of these algorithms in place, you can do key exchange in a secure way, knowing that somebody in the middle can't intercept them. So all of these systems start being built up on top of just the basic core encryption. Secure identification and authentication: this becomes really important for IoT. If you've got thousands of these little nodes floating around in your home, you need to know that the actual endpoint you're talking to, the gateway you're talking to, is the one you're supposed to be talking to. If you're receiving an update, and this goes back to tamper detection and some of the secure storage, you want to know for sure that you're talking to the right person, the right end node; I say person, I mean the right computing device. Like many engineers, I tend to conflate people and computers. Also, secure execution: this goes back to what I was saying about having a secure boot process, where you know you're running an image that is what you think it is. That ties back to tamper detection as well. So those are the basics. This next one is something I have to say over and over. It surprises me how many people don't get this; certainly lay people, but even a fair number of engineers. You can take an absolutely strong algorithm, apply it in the wrong way, and have zero security. Just because you have a strong cryptographic algorithm, which we tend to conflate with a strong solution, does not mean you have strong security.
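The tamper-detection idea mentioned above boils down to comparing fingerprints. A minimal sketch using a stock hash (the "firmware image" strings are placeholders; in a real system you would use a keyed MAC or a signature, since an attacker who can modify the blob can also recompute a bare digest):

```python
import hashlib

def digest(data: bytes) -> str:
    """Fingerprint a blob; any change to the blob changes the fingerprint."""
    return hashlib.sha256(data).hexdigest()

original = b"firmware image contents"
tampered = b"firmware image Contents"  # a single byte flipped

# The two digests differ completely, so a stored "known good" digest
# detects the modification.
assert digest(original) != digest(tampered)
print(digest(original))
print(digest(tampered))
```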
So don't think that just because you applied a strong algorithm, you're safe. We all know how "secure" Wi-Fi is, and some of the initial standards around that. WEP was a victim not of a weak algorithm; it actually had a pretty decent, strong algorithm. It was just misapplied, and the key space was very poorly defined. So, moving on. We were talking about cryptographic acceleration; I said that computational complexity is driving us to look for hardware acceleration. There are a couple of different classifications. If you go out there and look, you're going to find different types of hardware for cryptographic acceleration. This first one is pretty easy; I use the example of smart cards: a standalone device that takes some sort of information in, does a transformation inside, and then what it spits back out is plain text again. It's usually used in the smart card world, where you're identifying a particular device or a particular person, in this case the one carrying it. These also get used buried in the middle of other systems. It's not as interesting for us, but it's out there. One of the more interesting ones that has been around for a while is instruction set extensions. You take an existing CPU and add some extensions to its instruction set, and now, on top of that CPU, you suddenly have cryptographic operations that can happen very fast. Because these are built into the CPU, it changes the way we access them; I'll talk about that a little more in a minute. The primary area of interest for this presentation, though, is separate co-processors. There are a lot of different ways these things can connect. You can have a completely separate processor on a different die or in a different package, connected via an external bus.
Some examples of this could also be something like a PCI card sitting in the bus of a server, just offloading some of these cryptographic operations. An example of this type of module is the Trusted Platform Module. The TPM is actually a standard, not an actual implementation. So when we talk about a TPM, there are different implementations of it, but it's a standard that defines a separate processor that does all of these types of things. It goes well beyond just cryptographic acceleration, so I'm going to gloss over that; but the TCG, the Trusted Computing Group, manages that standard. This is primarily found on x86 platforms. I have heard that there are some TPM modules available for ARM, but I haven't used one. Yeah, and that's the general feedback I've seen. I'm sorry? No, I'm not going to talk about TrustZone. The question was whether I'm going to cover ARM TrustZone. That's a slightly different animal. It does work well with this, and the i.MX6 does actually have TrustZone, but it's beyond the scope of what I'm covering here. And this is the example I was talking about as far as offload. This one down here at the bottom is the most interesting for me for this presentation, because the i.MX6 has a crypto block built into the SoC. So it's on the die, inside the package. Yes, it's technically got a little bus in there, but for all practical purposes, we consider it to be a single device. So let's take a look at that. The NXP i.MX6, it's a mouthful here, the SoC has a Cryptographic Acceleration and Assurance Module, the CAAM, and it provides a few things. It gives you hardware implementations of cryptographic functions; that's the one we're most interested in here. It includes several ciphers; I'm not going to list them all, you can go look at the technical reference manual if you're curious. It also provides secure memory.
Again, there are other functions, including a secure key module and cryptographic authentication, that I'm not really going to touch on much. They're important, they're certainly part of your security system, but they're not what I was focused on for this presentation. I'd encourage you to go and figure out what they are if you don't know. One last thing on that: the random number generator, whoops. The random number generator is kind of interesting, because that also affects how you can generate keys. I just found out about something really cool, which I'm going to throw in here, which is an external device. Do you remember what it was called? The ChaosKey. So it's basically an external device that generates entropy that you can then use to generate random numbers. And this becomes really important when you're trying to generate truly random keys, especially for transactional exchange of other asymmetric keys. If you don't have a good source of randomness, which was the downfall of WEP, then you're liable to be brute-forced, because you've cut the key space down so small that it can be attacked effectively. And if I'm babbling, we'll just move on; it's not that important to this presentation. This next part is more important. There's a basic enablement that you need to do. This slide is pretty much just show-and-tell: it's what you need to do to enable the CAAM in your kernel. There are a lot of different options here; some of these you can kind of pick out, the random number generator, hashing functions and things like that. Essentially, for our purposes, this is just what you need to enable; build your kernel, and then you've got CAAM support. Is this upstream? Yes. And this was as of the 4.1 kernel, so that's already, what, nine kernels back? I can't say with any certainty what has improved or regressed since then. Does this work? Yeah.
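The slide itself isn't captured in the transcript; for reference, the CAAM-related kernel options being described look roughly like this on a 4.1-era kernel (option names are from that era and may have shifted in later kernels):

```
CONFIG_CRYPTO_HW=y
CONFIG_CRYPTO_DEV_FSL_CAAM=y
CONFIG_CRYPTO_DEV_FSL_CAAM_JR=y
CONFIG_CRYPTO_DEV_FSL_CAAM_CRYPTO_API=y
CONFIG_CRYPTO_DEV_FSL_CAAM_AHASH_API=y
CONFIG_CRYPTO_DEV_FSL_CAAM_RNG_API=y
CONFIG_HW_RANDOM=y
```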
Well, I mean, basically, enabling this is the building block: we use this kernel, and we use the CAAM functionality to accelerate. We didn't have to implement this ourselves; this was done by Freescale. We may have helped them, because Mentor has a pretty deep relationship with Freescale, but to my knowledge, this was largely done internally at Freescale. So again, the whole point of this presentation is: how do we get cryptographic acceleration in user space? We now have the hardware at that bottom level, but how do we get to it? Let's talk first about some of the different implementations. There's a great presentation by Marek from a couple of years ago that covers something very similar, so I'm not going to spend a lot of time on these, but this is how you can get cryptography, or cryptographic operations, not really acceleration, into your user space. The first one is just a pure software implementation. You literally just execute CPU instructions like you would in any other program. There are some really good things about this. It's very portable, especially if you write it in C. It supports an arbitrary set of algorithms; you can tweak it however you want, it's very modifiable: the usual stuff that comes with a software implementation. Unfortunately, it does have a cost. It's going to contend for your CPU, so it's going to take a lot of cycles. And I didn't put this up here, but one side benefit is that it all stays inside the CPU, so you don't have a lot of bus traffic. Yes, you're going to have memory accesses and things like that, but you're not going out across an external bus, for instance. Another drawback is that a lot of the time the CPU's general purpose instruction set is not really optimized for this, so it may not do it as efficiently as we would like. So how about CPU instruction extensions?
So it's a little bit of a different story when you have a CPU that actually does some of this in hardware for you. The thing about this, though, is that while it makes use of hardware acceleration, and it doesn't involve the kernel, which is important, you're limited to the set of algorithms that are supported. So in the case of, say, the Intel extensions, AES-NI. Right, I think it's only a handful of instructions, and primarily it's AES. So it's great for that, but if you don't want to use AES, you're kind of back to software or something else. Yeah, and I don't even touch on that kind of thing, because that blows all sorts of things out of the water. One thing, just historically, about AES: when you have that kind of computing capability... well, AES was actually designed from the get-go to be hardware efficient. It's a block cipher where, from the start, that was one of the requirements, and that's the reason you see it being so popular as far as hardware implementations go. So these extensions are great if you happen to have a CPU with them and you happen to need that particular algorithm. Software, as we said, has some great benefits, but also drawbacks. But what do you do the rest of the time? Maybe you have a hardware accelerator, like the CAAM module on the i.MX6, and you want to make use of it, but you also don't want to have to write everything yourself. That's where the kernel APIs from user space come in, and I'm going to come back to that. No, I'm not; I mean, that is something that is very important. The question was whether I was going to be talking about key protection techniques, particularly in a software context, is that right? In a software implementation, there's so much to security. I was being very facetious when I said it was very simple.
Yeah, it's very simple in principle; when you apply all these layers, it becomes very complex. Key protection is incredibly important, also from a human aspect. So, I mean, there's a lot of this that I'm not going to touch on. It's a secure storage problem, so yes, it is a problem. Yeah, and how do you make sure keys are secure in transit? There are all sorts of different principles to this. I'm not a security expert, and I don't pretend to be; very few people can really claim that well. He's probably one of them. My focus was really: how do we actually make this work, again, from user space, for a general program that wants to start accelerating some of these operations? So, let's dig in a little bit to the crypto APIs in the kernel. A little bit of history: in 2.5.45, according to what I found, the kernel gained a cryptographic framework internally. This was brought in to do things like accelerate IPsec and to do the calculations in a uniform way. That's great, but that's all inside the kernel; we want to get there from user space. So this is where we start looking at user space APIs. There are two: there's cryptodev, which I'll talk more about, each of these individually, and then there's AF_ALG. Okay, you're right, cryptodev is not internal to the kernel; I'll talk about that as well. So calling it a kernel-provided user space interface is technically not accurate; I sure did, all right. As with all user space APIs, these are going to provide hardware abstraction, and they're going to allow your program to run independently of whatever the actual underlying hardware is. So let's look at cryptodev, which I misspelled there too. No? All right, good. This is basically an implementation, in Linux, that is compatible with the OpenBSD Cryptographic Framework, OCF. It is not binary compatible, it's not the same, so it is a separate implementation, but it does make the effort to be compatible.
It creates this /dev/crypto device, and it's managed just like you would any other device: you use standard ioctls to interface with it. And this is still pretty basic stuff, right? It's pretty low level; you're not getting a whole lot of high level functionality here. So I'm going to walk through enabling this. No, no, it's fine. It's a good point, and I have a diagram that I'm going to show at the end that kind of demonstrates this: the comment was that cryptodev really provides another abstraction, but does not provide an actual implementation. In the cryptodev layer, you don't have a set of algorithms that are implemented. It is an interface layer that gets you either to the kernel implementation or to hardware. Which is a good point, and it comes out a little more clearly in a diagram I've got a little later. Enabling this: as he said, this is an out-of-kernel module; I think that should say out-of-tree kernel module. So it has to be compiled separately. Now, for those of you that paid attention to my little bio up there, I am a Yocto Project guy, I am an OpenEmbedded guy. So I'm going to do a little hand waving here, because for us, in Poky, this is as simple as adding this little designator to your build. Now, if you want to go and build this on your own, that's going to involve assembling a cross-compile toolchain, downloading the module, and everything else beyond the scope of what I'm going to talk about here. You need the recipe for that? Yeah, you're bragging in my presentation. So Dennis did that implementation of the recipe. Anyhow, this enables it; once you've done that, you've got the module, and we'll move on from there. AF_ALG, let's talk a little bit about that one. It uses a socket interface to connect with the kernel; it's literally a socket type. This is in mainline Linux.
So it doesn't require that external module compile, but it does require additional kernel configuration. So what's the kernel configuration? These options right here enable it. You can actually enable both cryptodev and AF_ALG; I haven't had any problem doing that, at least not on this kernel. So if you want to play with both, you can. So let's recap. At this point, we're still building up the stack, right? We've got the hardware at the bottom, we've got the kernel, and we've now got a way to talk through the kernel to that hardware, but we're still doing a lot of the heavy lifting ourselves at this point. You can implement the algorithm directly, or talk directly to the hardware to have it do some of the calculations, but you're having to manage a lot of things. Instead, we chose to use OpenSSL. This has a lot of benefit, especially when you want to abstract yourself from the different situations: in some cases, you have AES-NI or similar instructions available to do the work directly; in other cases, you have hardware. Now you can write an application that doesn't really care; it just wants to get to some form of hardware acceleration, and OpenSSL will fail over to software, which becomes important when we do our comparison. As always, there are a few steps, and I'll cover them in a minute. Here's the pretty picture. By the way, I lifted this from somebody's presentation on SlideShare; I thought it was exactly what I needed, so I'm giving them attribution. This is essentially the pathways. It's a little blown up, so it may not be completely clear, but this is where your application lives, and that's where we want to spend most of our time. So we're using OpenSSL, either via cryptodev, through cryptodev-linux, down into the core and on either to the hardware or out to the kernel implementation, or we go through AF_ALG and do the same thing.
So we end up in pretty much the same place either way. OpenSSL abstracts that for us in a nice way and allows us to really focus on our application. And this goes to your point earlier, Dennis: cryptodev here is not a software algorithm, it is just an interface, and AF_ALG is also not a set of algorithms; it's just an interface to get the data into the kernel in a clean way and get access to the hardware. Have I put everybody to sleep yet? Still doing all right? All right. So, since we decided we were going to use OpenSSL, what does it look like when we add cryptodev? Well, in order to add cryptodev support, you have to enable a couple of options in OpenSSL. This is what we were talking about last night, Dennis: you have to add this right here. This is 1.0.1, I think, or I'd have to look at the version number. You also have to add in this header. Once you've done that, you're good. And then once again, and I think this is your work, when you enable this recipe, it's going to pull these pieces in for you, and it'll be magic, it'll just work. That's what I was saying, I have to go and take a look; I don't know if I actually pulled it out in here. I think it may be in one of the layers, so I've got to go look and see which one, and where you'd actually select it. You know, since it's a module, it's very important that you actually have it inserted first, or else it's not going to do anything. I think this is somewhat obvious, but you should also be able to check that it's there, because you'll see something like this: when you do an "openssl engine," it says cryptodev. Yes, the question. The question is whether or not cryptodev supports I2C devices, for instance key management devices. I don't know. I don't see any reason why it couldn't be added. The fact is, cryptodev is just that interface piece, right? So if you have a device driver down at the bottom level, then you might have to write some glue in the kernel to get to it. But, I mean, do you happen to know, Mike? No? Okay.
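The OpenSSL options being pointed at aren't in the transcript; for 1.0.x-era OpenSSL, the enablement amounted to build-time defines plus making the cryptodev header visible. Roughly, as a sketch from memory rather than the exact slide:

```
# OpenSSL 1.0.x build configuration: cryptodev engine support is compiled
# in when these defines are present at Configure time.
./Configure linux-armv4 -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS

# Plus: crypto/cryptodev.h from the cryptodev-linux project must be on the
# compiler's include path.
```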
So my answer is: it probably can be done; I don't know if it has been. Okay. Yeah, I would say... oh, you would have seen that. Right. Well, that's exactly what OpenSSL and AF_ALG and cryptodev are doing. What time am I done? How much time have I got? Okay, so I should probably pick it up; I don't have too many slides left. Thank you for the question. I don't think it's there yet, but I don't think it should be a heavy lift to add that support in. So let's take a look at what would be required to add AF_ALG. Here's the answer: OpenSSL 1.0.2 did not support it natively, okay? However, there are plugins available that provide the support; OpenSSL is an extensible framework. So we actually used a plugin that we found, here. It has been advertised that native support for AF_ALG will be available with OpenSSL 1.1.0; that was the number that stuck in my head. So hopefully that lands pretty fast. We always have a leading edge problem where we're chasing that, and I don't know how far off we are from 1.1.0, to tell you the truth. So I'm going to do a massive hand wave at this: you can build the plugin, basically follow the directions in there; there's no point in me telling you exactly what to do, standard rules apply on that kind of thing. You do want to place the resulting AF_ALG plugin library into /usr/lib/engines. You also need to add to your OpenSSL configuration. These are the magic lines that do it, in particular af_alg, the af_alg engine, and then you give some of the cipher choices here that AF_ALG will support, including digests. Most of this, when you lay it out up here, makes some intuitive sense, I hope. Any questions on that? Vanilla OpenSSL: the question is, what does OpenSSL do if it doesn't have these? By default, it uses a software implementation. And that, like I said, will become something we look at when we look at performance. Yes, sir? It's a good question. I kind of avoided making a recommendation.
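For flavor, the "magic lines" being described are an openssl.cnf engine section along the lines of the following. The section names, algorithm lists, and library path here follow the af_alg plugin's documentation as I recall it, so treat every name as a placeholder to check against your plugin's README:

```
openssl_conf = openssl_def

[openssl_def]
engines = engine_section

[engine_section]
af_alg = af_alg_engine

[af_alg_engine]
default_algorithms = ALL
CIPHERS = aes-128-cbc aes-256-cbc
DIGESTS = md5 sha1
dynamic_path = /usr/lib/engines/libaf_alg.so
```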
If you look at Marek's presentation, he comes down very clearly on AF_ALG being the way of the future. I'll tell you, when we get to my data, you'll see why I hedge a little bit here, because I suspect that your particular performance is going to depend on your hardware, the specific kernel, and things like that, and we'll see why. So let's take a look at actually comparing performance. We've got our stack now: we've got our OpenSSL app at the top, I say app, we've got OpenSSL at the top, and we've got the different layers below it. We're going to use the "openssl speed" command to help us measure, but of course you could use your own application sitting on top of OpenSSL just as cleanly. We used the -elapsed argument so that we measure against wall clock time rather than CPU time; that was just a choice we made. Now, this goes to your question: what happens if you don't have the cryptodev module inserted? What happens if you don't have AF_ALG enabled? It defaults to a software implementation. So this is the command we ran. I couldn't fit everything in here, so I cut a whole chunk out of the middle; it's a lot of data to try and parse. But what you see here is that we ran AES-128-CBC, which, thank you, the joke was you could always just say "blockchain." Anyhow, we ran it for three seconds at each block size, and these are the numbers of blocks of that size it actually managed to encrypt using that cipher. You can see there's a whole string of these, and then it reports kilobytes per second. It's a lot of data; I have a little table at the end, which hopefully will surprise you. So this is just the software implementation. Let's try it again with cryptodev. Again, this is using OpenSSL; in this case, we passed the cryptodev engine on the command line, so: openssl speed -engine cryptodev. This is our nice indicator that cryptodev is running.
Once again, we see these nice lines showing the different numbers of blocks, and I'll summarize these at the end. We decided we really wanted to confirm that the hardware was being used, and looking at the interrupts gives you an indication, because you start seeing CAAM interrupts there. That's just a sanity check: make sure you're doing what you think you're doing. So once again, for the AF_ALG case, we just look at, engine, oh darn it, that's a cut-and-paste error, sorry. It should say engine af_alg, which is what this command actually was, so ignore that part. Two typos; I guess I'm doing all right. And you get the confirmation here, and then the same kind of output. There's a whole bunch of garbage in here, well, it's not garbage; there's a whole bunch of information in there that includes things like your toolchain and some of the configuration options, stuff that wasn't really necessary for us. So I'm going to let this sit for a second. Anybody notice anything? When we got this, we were really puzzled. We're like, okay, we were searching for hardware acceleration, and we got hardware acceleration, and it was slower. And it's not slower in one case; it's slower in all cases. There's only one that I highlighted here that was sort of anomalous, because these two were very close together, but as it turns out it's pretty consistent: regardless of the block size, you end up with software being faster. Anyone want to hazard a guess why? The answer from the crowd was kernel interface overhead. And, I'm sorry? Bus speed is another one. What we found was that the number of interrupts and context switches occurring skyrockets when you start using cryptodev or AF_ALG, so: kernel interface overhead. Basically, you see so many context switches that the performance actually goes down. Now, this is a quad core, and we did not do extensive profiling on this; I didn't actually put the disclaimer in here, but this was not intended to be exhaustive.
I have it at the end: this was not an exhaustive search, this is not a scientific study. And so when we did, yes sir, I can't hear you, I'm sorry. So, this goes back to what I was saying: we did not do exhaustive profiling to see what was traversing buses or things like that. This was just the block size that we fed in, and there was a question up front from Dennis about larger block sizes. I don't think we collected all the data, but it remained consistent up until, I think, 32 KB, and I think it's right around that point that there's an inflection. I kind of like this one because it was nice and tidy. Yeah, it's going to depend on your hardware. The point here is that especially when you have smaller block sizes, the transfer is very, very expensive. When you start getting up higher, you notice that the delta is dropping, and I didn't really extend this table out; we didn't do an exhaustive block-size search. And that's what I recall, and I should have captured it, but I didn't. As you get larger and larger, you get more efficiency in terms of transfers, so the overhead in the kernel matters less. So at some point, you usually do see a flip-over, where the hardware acceleration now does make enough of a difference. One of the other things, and it's a conclusion we want to get to, and then we'll continue with the heckling from up front: we were very puzzled by this, but one of the other things we found, which was nice, and which you can't really see in that data, was that the CPU utilization did go down. So the offload processing was occurring; there was still, however, a lot of overhead involved in doing so. It points to places for optimization: if you can reduce the amount of overhead, you're going to get better CPU utilization, and you're also going to get faster operations. But we didn't quantify that, and we should.
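The flip-over behavior described above falls out of a very simple cost model: each offloaded request pays a fixed setup cost (syscall, context switches, DMA setup) and then streams at the engine's rate. All the numbers in this sketch are invented, chosen only so the crossover lands near the ~32 KB mentioned in the talk; the shape, not the values, is the point.

```python
# Toy model: per-request time = fixed setup cost + block_size / streaming rate.
# Software has negligible setup but a slower stream; offload is the reverse.

def time_us(block_bytes, setup_us, bytes_per_us):
    return setup_us + block_bytes / bytes_per_us

SW = dict(setup_us=0.5, bytes_per_us=60)     # invented software parameters
HW = dict(setup_us=450.0, bytes_per_us=400)  # invented offload parameters

for block in (64, 1024, 16 * 1024, 64 * 1024):
    sw, hw = time_us(block, **SW), time_us(block, **HW)
    winner = "software wins" if sw < hw else "hardware wins"
    print(f"{block:>6} bytes: sw {sw:8.1f} us, hw {hw:8.1f} us -> {winner}")
```

Small blocks are dominated by the setup term, so software wins; once the block is large enough, the faster streaming rate amortizes the setup and the hardware pulls ahead. Shrinking the per-request overhead moves that crossover toward smaller blocks, which is exactly the optimization opportunity being pointed at.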
Like I said, we just expected to see a set of results that went the other way. When we discovered this, we kind of fell on the floor and scratched our heads for a while, and that was part of the reason I decided it was worth presenting. For our purposes, this was good enough. But, and this is sort of my blanket disclaimer, this was not intended as an exhaustive analysis. It was really intended to promote the idea that it's not a slam dunk. Just because you have hardware down there, you have enough things going on in the middle that you need to profile it and understand your particular use case. So make sure you run your own tests. If you can pass very large block sizes, you may get enough benefit as-is not to have to worry about it, but there are certainly opportunities for optimization. Also, because you are definitely freeing the CPU a little bit by offloading, you may want to do this even though you know it's a little slower, because you're regaining some of your CPU budget for other operations. So that was pretty much the sum of it; let's take questions.

That's a good question; I don't know. Yeah, I think that's what he was referring to; that would make a lot of sense. The question was whether there's a way to configure OpenSSL so that it will use a different implementation depending on the block size. To my knowledge, that doesn't exist, but I don't think it would be difficult. Yeah, it's hard to disentangle the specific hardware effects of multi-threading from the actual performance change. And cache misses are another factor; everything you read on this says make sure you have page-aligned memory, which is another big one. All of these types of things are going to factor in.
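In the spirit of "run your own tests", here is one hypothetical way to put numbers on the kernel-interface overhead on your target. It assumes GNU time is installed at /usr/bin/time (busybox and the shell built-in `time` lack `-v`), and the blob path and size are arbitrary choices:

```shell
# Hash a fixed 64 MiB blob and report elapsed time and context
# switches for the run; repeat with "-engine afalg" (or cryptodev)
# configured and compare the counts to see the overhead directly.
dd if=/dev/zero bs=1M count=64 2>/dev/null > /tmp/blob
/usr/bin/time -v openssl dgst -sha256 /tmp/blob 2>&1 \
  | grep -Ei "wall clock|context switches|sha256" \
  || echo "GNU time (-v) not available here"
```

A run whose context-switch count jumps by orders of magnitude when the engine is enabled is showing exactly the effect described above.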
So it is a Freescale kernel. (Question from the audience about the i.MX6UL seeing about a four-times speedup.) I'm sorry, an i.MX6UL? No, this was not on a UL; this was just on the stock i.MX6, actually a SABRE Lite clone. So yeah, I know there have been a lot of improvements, and like I said, this was a 4.1 kernel with a specific version of OpenSSL; things may have moved on quite significantly. I know they continue to scrutinize it. Okay, that's good to know; I didn't know that. It's real sensitive to quite a few different variables, which is why I keep emphasizing: make sure you know your particular use case and try it out. You may find that it performs just fine for what you need.

Any other questions? Yes, sir? Well, I was going to say it depends; there are actually two different ones. There are implementations of some algorithms inside the kernel itself, inside the crypto framework, and there's also a pure-software implementation inside OpenSSL. If you're using OpenSSL, I don't think there's any pathological case where it detects that it doesn't have cryptodev or AF_ALG and tries to pass work through the kernel anyway; I don't think there's any way for it to do that. But there are separate implementations in those two spaces. The reason the OpenSSL software path seems to do so well, I think, is that it doesn't have all that kernel overhead; it's all CPU instructions, with those nice properties I highlighted before about a software-only implementation.

Right. The comment from the front here is that if you do run this on a single core, your core is gone; you'll probably consume the whole thing. I didn't actually show the capture of CPU utilization percentages, but it will try to rail it. So it will consume however many cores you want to throw at it, in general.
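The two layers of implementations mentioned here are visible on any Linux box: the kernel publishes its registered ciphers, with the backing driver and a priority for each, in /proc/crypto, entirely separately from OpenSSL's userspace code. A quick way to look, as a sketch:

```shell
# Print every registered kernel AES implementation as a record.
# On an i.MX6 with the CAAM driver loaded, the hardware-backed
# entries carry a higher "priority" than aes-generic, so the
# kernel crypto API prefers them when both are present.
awk -v RS= '/name +: .*aes/ {print; print "----"}' /proc/crypto
```

Seeing both a generic software driver and a hardware driver listed for the same algorithm name is the concrete form of the "two different ones" answer above.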
Yeah, so the observation was that any time you're computationally bound, you want to offload as much as possible, and yeah, those are best practices. Yes? (Comment from the audience: not as good on throughput, but very reliable.) Right, so it comes down to what you need. If you need something that's on a hard real-time deadline and raw speed isn't the constraint, then maybe the hardware makes more sense. The observation was jitter versus throughput, and those two don't always line up. I think we are done. Thank you very much.