Hello, I'm Greg Melissa, Principal Engineer at TimeSys Corporation. This may not be Jensen's kitchen, but welcome to my office. Thank you for joining me as we discuss using OP-TEE as a cryptography engine in your embedded application. Today, we're going to give you a little bit of background on OP-TEE. I'm assuming that most of you are familiar with it anyway, but we'll have a quick refresher on what it is and why. Then we're going to talk about how we might add a hardware crypto provider to OP-TEE in order to provide hardware random number generation and hardware crypto acceleration. Then we're going to talk about how to actually use all of those crypto routines from Linux so that we can leverage OP-TEE's capabilities. So as you may have heard, the Internet of Things is here. I guess it's here every year, but it's definitely here now. Everything around us is connected: all kinds of new gadgets in our homes have internet connectivity, and our industrial systems, which really have always been networked, are now available remotely. You can just pull up Telnet and log into your local power plant, maybe. Our medical systems are connected, our cars are connected. There have even been high-profile examples of a car getting hijacked through its 4G connection because there was no proper isolation between the infotainment bus and the car's actual drive-control bus. So in all of these cases, we need to store secret or immutable data on these platforms, and we need to secure that data so that these devices can't be taken over, can't be impersonated, and can't be commanded to behave incorrectly. And we might want to try to do that by adding some security modules to our system.
So the easiest way to store things securely is to add a hardware element: we might have an HSM that we integrate with our device, we might have a TPM, or we might have something even more cost-effective like a little secure element such as an ATECC608 from Microchip. Just put it on the PCB. You can offload your key storage, you can offload your crypto routines, and everything works great. However, in our case, we would really rather focus on software solutions, or on things that are dedicated inside the SoC, in order to try to minimize system cost. So what kind of features does a typical SoC have that would enable us to perform these security operations without needing an external secure element? Well, a lot of modern SoCs feature some sort of tamper detection and tamper resistance. And as part of that, they often incorporate secure one-time-programmable memory. Now, there's usually not a lot of this memory, but it's definitely enough to store an RSA key, or a couple of keys at minimum. Some of them have more advanced features such as transparent DRAM encryption. Basically what that means is that memory pages out to DRAM encrypted automatically, and when it gets brought on-die and moved into SRAM for local access, that's when the memory is decrypted. Typically the hardware handles this transparently, so you do a little bit of configuration and setup at the beginning, but you often don't have a massive amount of extra work to make it actually work for every application in your system. And then, of course, the hardware accelerators, which previously were only accessible off-die, are now often available as part of the SoC as well, which makes it make a lot more sense to try to just keep everything in-house. Then, apart from SoC features, we of course need to have architectural support.
And there, since we're using Arm processors, we're in luck, because Arm TrustZone has been part of the processor design for a very long time. The key feature of TrustZone that we want to take advantage of is the ability to separate processor execution into a secure state and a non-secure state, and then, depending on which state we're in, isolate different regions of memory or different peripherals so that they're only available in the secure state, for example. So what is OP-TEE and how does it fit into that picture? Well, OP-TEE is an open-source implementation of the TEE standard, which stands for Trusted Execution Environment. In particular, OP-TEE slides in underneath the Linux kernel to provide us a way to manage our secure hardware without relying on Linux. Obviously, we're all familiar with the idea of user space applications being untrustworthy. I mean, we're going to write our own application and hopefully we're going to do a good job, but we might have bugs. We might use third-party applications that have bugs. And ultimately, these are probably going to turn into security vulnerabilities eventually. So we're pretty used to saying, OK, we don't trust user space; we have to have secure things that are managed by the kernel and restricted from access. However, the next step of this, obviously, when you look at the number of CVEs that come out against the Linux kernel every year, is: what if we don't trust the Linux kernel either? Maybe the core kernel is good, but we have some random sketchy driver, or we have a module that we found on Google that was last updated in 2014, but it's the only thing that supports our hardware, so we really need it for our system. Well, we can't trust these things, obviously. And if somehow that module gets approved and actually incorporated into your system, then your Linux kernel could get compromised by an attacker pretty easily as well.
So what we want to do then is move all of our secrets to yet another location that is even harder to compromise. In particular, we're going to leverage the TrustZone hardware features to enforce isolation between the kernel and OP-TEE and therefore keep our secret data secret. So why do we want to use a TEE for this instead of maybe rolling it ourselves? We could obviously imagine writing a small microkernel and building this ourselves. Well, a TEE is great for this because there is an industry-accepted specification, the GlobalPlatform TEE specification. So if you write a trusted application that runs on this TEE and uses the TEE API, then it'll run on any TEE. So we're able to move to another platform or whatever down the line as things change. Currently, Linux supports OP-TEE and AMD-TEE as TEE providers. However, AMD-TEE, of course, is only available on x86 platforms, so we're not going to be looking at that. And since we're on Arm platforms, we're going to be focusing on OP-TEE itself. The main features that OP-TEE provides are encrypted persistent storage, cryptography routines, and isolation between trusted applications. So it's going to handle pretty much all of the very basic security aspects for us. And then we're going to combine those in order to build a richer secure application on top of it, and then provide that functionality to our untrusted user space so that it can request secure actions on our behalf. So now that we're sufficiently motivated and we believe OP-TEE is a pretty good idea, let's talk about how we can actually start building out pieces of it and adding functionality to it. If we're going to try to add OP-TEE to a new platform, there are really four steps. The first is the basic platform bring-up. If you're using a board that already supports it, then you just go into Yocto and configure it.
If you're trying to port it to a new board, then you have a rather extensive porting task, but I'm going to suggest that that porting is outside the scope of our discussion. Instead, we want to focus on the middle two steps of the process: we're going to talk about bringing up the hardware RNG, and we're going to talk about enabling hardware accelerators. We're going to assume that your platform already works, and we're going to assume that once you're done here, you're going to go back and enable secure peripherals that might be specific to your application, because that's also beyond our scope. Just to give you an example, a secure peripheral you might want is, say, a smart card reader, because you might want to exchange data with it that you don't trust the Linux kernel to see. Or a secure keypad reader: a user enters a PIN and you wish to do some PIN validation, but you can't trust the kernel with seeing what that PIN is, because if an attacker compromises the kernel, they're able to extract the user's PIN. So we want to keep that isolated, but both of those are beyond our scope; we want to focus on just the hardware accelerator enablement. And as we do this, I'm going to assume that you're familiar with the core components of making changes to OP-TEE. When we're configuring a new platform, we really have two main pieces of configuration, plus an aside about how the build process works. So I'm going to assume that you understand how conf.mk works, which is the platform-specific configuration file. It stores build configurations and options, and enables or disables features. I'm going to assume that you understand how to change those features, that you understand what the force option is, and all these different things.
And then I'm also going to assume that you understand how platform_config.h works, which is the C header file that typically defines the memory layout for a new platform, tells us what addresses peripherals are located at, and helps OP-TEE understand how to set up the initial memory mapping when it boots, so that it can divide the world into secure and non-secure peripherals. So our basic plan for approaching this hardware random number generator driver is: first we want to understand OP-TEE's crypto RNG API, then we want to implement a driver that satisfies this API, and finally we just integrate it into our build. Hopefully a pretty simple three-step process. So let's see how that pans out. The first thing we see when we look at the crypto RNG API is that it's only three functions. There's an initialization function, crypto_rng_init(), and that function is responsible for initializing our state vectors, or, in the case of a hardware provider, configuring our hardware and so forth; just basic configuration. It receives one argument, which is the seed that should be used. This is really only relevant for the pseudorandom, software-driven generators, which are seeded. The hardware generator that we're going to be using, of course, is not seeded, so we can ignore this argument safely. Next is crypto_rng_add_event(). This is again a feature that's specific to the software pseudo-RNG: it's used for adding entropy to the RNG. For a hardware device, this is typically not relevant, so we don't actually need to implement this function. Finally, we have the one function we care about, crypto_rng_read(), and that allows us to read a specified amount of random data from hardware into a user-supplied buffer. Now, to make this simple API even easier to work with, OP-TEE provides weakly linked defaults for all three functions.
In the case of crypto_rng_read(), all the default does is verify the parameters: it makes sure the buffer is not a null pointer, makes sure that the size makes sense, that it's not bigger than you can store in memory, that kind of thing. And then it just calls hw_get_random_byte() repeatedly in order to actually load the random data from hardware into memory. So we can present a very simple driver implementation immediately. All we have to do is implement the RNG init, and during initialization, what we're going to do is try to obtain the virtual address for our physical hardware. It's important to note that because our hardware is registered as a secure peripheral during system init, we need to specify, hey, we're looking for a secure peripheral mapping. And then OP-TEE's MMU configuration looks it up in the MMU tables and gives us the corresponding virtual address for our hardware random number generator. And then, of course, like I said, we have to implement hw_get_random_byte(), and in that case, we're simply going to read one byte of data from the peripheral at a time. Now, if your peripheral, say, returns four bytes at once or more, you may wish to reimplement the entire crypto_rng_read() function yourself so that you can manage the hardware state more effectively. If you were to make one call to read four bytes but only use one byte each time, that becomes inefficient, and depending on the speed of your hardware random number generator, it could become a problem for the application. So you may wish to do some of that management yourself, but the simplest implementation looks pretty much like this. So with the implementation done, what else do we need to do to configure our build to use it? Well, obviously, we've got to add our hardware RNG addresses to platform_config.h. And then the big step is that we need to disable the software random number generator and switch it over to hardware.
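Picking up that "simplest implementation" in code: the sketch below shows the shape of such a driver. The register address is a placeholder, and phys_to_virt() and io_read8(), which in-tree come from OP-TEE's core headers, are stubbed out here so the sketch stands alone and can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register address; your SoC's RNG will differ. */
#define RNG_BASE 0x40001000UL

typedef uintptr_t vaddr_t;
typedef uintptr_t paddr_t;

/* Stand-ins for OP-TEE's phys_to_virt() (<mm/core_memprot.h>) and
 * io_read8() (<io.h>). Here they point at a fake data register so the
 * sketch is self-contained; the real ones consult the MMU tables. */
enum mem_area { MEM_AREA_IO_SEC };

static uint8_t fake_data_reg = 0xA5;

static void *phys_to_virt(paddr_t pa, enum mem_area area, size_t len)
{
	(void)pa; (void)area; (void)len;
	return &fake_data_reg;	/* real code returns the secure mapping */
}

static uint8_t io_read8(vaddr_t addr)
{
	return *(volatile uint8_t *)addr;
}

static vaddr_t rng_base;

/* Init: look up the secure virtual mapping for our physical RNG. */
static int my_rng_init(void)
{
	rng_base = (vaddr_t)phys_to_virt(RNG_BASE, MEM_AREA_IO_SEC, 0x1000);
	return rng_base ? 0 : -1;
}

/* The one hook the weak default crypto_rng_read() calls in a loop.
 * A real driver would read the data register at its proper offset,
 * and poll a status register first if the hardware requires it. */
static uint8_t hw_get_random_byte(void)
{
	return io_read8(rng_base);
}
```

With only hw_get_random_byte() implemented, the weakly linked crypto_rng_read() default handles the parameter checking and looping for you.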
If we turn off the CFG_WITH_SOFTWARE_PRNG option, then the build system will automatically switch over to the hardware RNG framework. And then all we have to do is enable our driver that implements those three functions we mentioned before. It'll get linked in correctly and everybody will be very happy. And of course, just for reference, this is how you might register your hardware RNG's physical address with the MMU as a secure mapping rather than a non-secure mapping or a shared mapping or anything like that. So having implemented the hardware random number generator, we're now feeling pretty confident and ready to dive into implementing a full crypto accelerator, and we're probably going to approach this pretty much the same way, right? So we start off by diving into crypto.c and looking at what the crypto API is. And when we dig into it, we find that symmetric ciphers, hashes, and MACs all work one way, and then the asymmetric keys all work uniquely. So if you want to implement an RSA accelerator, it has its own API and its own flow. If you want to implement a DSA accelerator or an ECC accelerator, each of those has its own unique flow. Because of the great amount of variability in the asymmetric operations, we're going to focus just on the standard ones that have a single API, and unfortunately, the asymmetric ones are left as an exercise for you. My experience with implementing an RSA accelerator is that it would basically be a presentation the same size as this one just to talk about the RSA subsystem, which is rather unfortunate. There's a lot of conceptual crossover, so once you understand this, you'll be good to dive in, but it is, unfortunately, a lot of extra detail. So given that I've said that all three of these work the same way, how do they work?
Well, the first thing that we want to do, for example when we're going to hash some data, is allocate a hashing context. Then we want to initialize that context, which in the case of some hashes that involve keys would involve setting a key; or if we were doing a symmetric key operation, that's where we'd set the symmetric key, set an initialization vector, those kinds of things. Then we load data into our algorithm in the update step, and we can call that as many times as we want for as much data as we have. Then we use the final method in order to get our hash result or our MAC result, or, in the case of a cipher, to pad out our data and obtain our final data block if necessary. And once we're done with everything, we call the appropriate crypto_hash_free_ctx() in order to release the memory associated with the hashing context, to make sure that we don't have a memory leak and don't run out of RAM on our system. So how do these functions work internally? When we call crypto_hash_init() on a hash context, what happens is that it looks up the crypto_hash_ops struct and then calls the init function pointer on that ops struct. When we call crypto_hash_update(), it does the same thing: looks up the ops struct, looks up the update pointer, calls the update function. So it seems like in order to implement a hardware accelerator, all we need to do is fill out one of these crypto_hash_ops structs with function pointers pointing to our functions that actually perform those operations using the hardware, and then get the crypto system to use that ops struct in order to integrate the hardware accelerator. Seems pretty easy. However, how many of you noticed that there was no alloc function in the crypto_hash_ops struct?
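Hold that question for a moment; first, the alloc/init/update/final/free lifecycle just described, sketched end to end. The function shapes mirror OP-TEE's internal hash API, but the bodies here are trivial stand-ins (a toy byte-sum "digest") so the example is self-contained; a real build dispatches through the provider's ops struct instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in context; the real one is opaque to callers. */
struct hash_ctx {
	uint8_t sum;	/* toy state: a running byte sum, not a real hash */
};

/* Shapes mirror crypto_hash_alloc_ctx()/init/update/final/free_ctx. */
static int crypto_hash_alloc_ctx(struct hash_ctx **ctx)
{
	*ctx = calloc(1, sizeof(**ctx));
	return *ctx ? 0 : -1;
}

static void crypto_hash_init(struct hash_ctx *ctx)
{
	ctx->sum = 0;
}

/* May be called any number of times with successive chunks of data. */
static void crypto_hash_update(struct hash_ctx *ctx,
			       const uint8_t *data, size_t len)
{
	for (size_t i = 0; i < len; i++)
		ctx->sum += data[i];
}

static void crypto_hash_final(struct hash_ctx *ctx, uint8_t *digest)
{
	*digest = ctx->sum;
}

/* Always free the context, or the TEE will slowly leak memory. */
static void crypto_hash_free_ctx(struct hash_ctx *ctx)
{
	free(ctx);
}
```

Note that the allocation above is a free-standing function rather than a member of the ops struct, which is exactly the wrinkle in question.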
So this allocate step has to happen somehow separately, and somehow the crypto API has to be informed that it should use a different allocator in order to prepare a hash context for our hardware. So it looks like we're going to have to actually look at the code to figure out exactly how that works, because it doesn't seem to be clear from the API. This is a snippet of crypto_hash_alloc_ctx(), which is in the OP-TEE kernel; I've reformatted it a little bit to try to make it fit better on the slide. What we have here is that the first thing it does is call this drvcrypt_hash_alloc_ctx() function, and then, depending on the return value from that function, it either succeeds or falls through and, per algorithm, tries to allocate an algorithm-specific context. So what does drvcrypt mean? It turns out that drvcrypt is the specific subsystem, or mechanism, that OP-TEE already contains for integrating a hardware accelerator. It allows you to register one accelerator per family of operations. So you'll have one hash accelerator that's supposed to handle every type of hash operation, one MAC accelerator that's supposed to handle every type of MAC operation, one symmetric key accelerator. And then once you get to the asymmetric keys, you have one for RSA, one for ECC, and so forth; and again, each of those is unique. In order to add a new drvcrypt accelerator, luckily, we don't need to change the crypto core at all. All we need to do is let drvcrypt know that it should use us to handle that operation. And even better, if our hardware doesn't support everything, say we have a hardware accelerator that only does SHA-1 and SHA-2 and someone needs MD5 for some reason, drvcrypt will implement a software fallback and we won't need to handle that ourselves in hardware, which is great because it simplifies the driver implementation quite a lot to not need to worry about the fallback.
So what does the overall flow look like? Well, we implement our hardware driver, and then during our hardware initialization routine, in one of the init calls in OP-TEE where we want to configure everything, we're going to call drvcrypt's register function in order to inform it to use us to handle hashing operations. Later, when a user is using the crypto API inside the kernel, they call alloc, or init, or update, and that will flow through drvcrypt and reach our hardware driver at the end of the day. So all we have to do is implement those functions just like we thought, call the drvcrypt registration function, and that will hook us up to the entire subsystem. As far as registration goes, we have a couple of different scenarios again. For hashes and MACs, the registration is very easy: you simply provide an allocator function that returns a crypto hash context or a crypto MAC context. If we're doing a symmetric cipher, it actually has to return a struct drvcrypt_cipher, which is a little bit different. It turns out that drvcrypt modifies the parameters to symmetric ciphers in order to make them easier to work with in hardware; it sort of has an intermediate interface in the middle. So rather than just directly registering the allocator, we actually register another ops struct that stands in between the two. And of course, the asymmetric ones are fully unique again, so unfortunately we can't deal with those here. So the initialization and registration is pretty simple; I just wanted to pop this up here. You're going to have your hardware initialization that you do manually, in whatever normal way you might want to do it; I'm not going to give any simplified examples, you're just going to have all your regular code. Then once everything is ready, you just call drvcrypt_register_hash() with your hardware hash allocator function pointer.
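A stripped-down sketch of that registration wiring: drvcrypt keeps one allocator pointer per operation family and tries it first when a context is allocated. Everything below is a simplified stand-in for the real drvcrypt plumbing, not its exact signatures.

```c
#include <assert.h>
#include <stddef.h>

/* One registered hash allocator for the whole system: drvcrypt allows
 * exactly one accelerator per family of operations. */
typedef int (*hash_alloc_fn)(void **ctx, unsigned int algo);
static hash_alloc_fn registered_hash_alloc;

/* Simplified stand-in for drvcrypt_register_hash(): remember who
 * handles hash allocations from now on. */
static int drvcrypt_register_hash(hash_alloc_fn alloc)
{
	if (registered_hash_alloc)
		return -1;	/* someone already registered */
	registered_hash_alloc = alloc;
	return 0;
}

/* Our driver's allocator (a stub; real code allocates a context wired
 * to the hardware, or reports "not supported" to trigger fallback). */
static int my_hw_hash_alloc(void **ctx, unsigned int algo)
{
	static int dummy_ctx;

	(void)algo;
	*ctx = &dummy_ctx;
	return 0;
}

/* Called from our driver-init hook, after the hardware is set up. */
static int my_hw_init(void)
{
	/* ... program clocks, reset the accelerator, map registers ... */
	return drvcrypt_register_hash(my_hw_hash_alloc);
}
```

After my_hw_init() runs, any kernel-side hash allocation flows through the registered function pointer and reaches the driver.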
And I wanted to mention how the initialization works. OP-TEE has multiple init levels: there's an early init and early-late init, a service init, service-late init, driver init, driver-late init, and so forth. However, all of those init calls essentially happen in sequence. So init level one runs, then init level two, then three and four, and so forth. All of that happens after the core board configuration. So if you try to initialize your hardware here, hashes will only be available after this point. If you need hashes earlier in the boot cycle, one good example being if you're loading a trusted application before all of the drivers are ready, you may need to manually insert your call to your hardware initialization function earlier in the boot flow. That's a little bit beyond the scope of the presentation, a little bit of an advanced topic, but just be aware that this is probably good enough; if you have a special case, you may need to move this even earlier in OP-TEE's boot flow. So what does our hash allocator do? Well, the first thing it needs to do is make sure that the hardware supports the algorithm. Like I said, we have automatic software fallback, so all you really need to do is notify drvcrypt that your hardware does not implement this algorithm and it should do the software fallback; it'll take care of everything else. Beyond that, all you do is allocate the hash context and configure the ops struct, just like we thought. I also do want to highlight, though, how the hardware context might look here. The crypto hash context struct is fixed, of course. However, the crypto hash update and final functions do not take any additional information. In particular, they don't include the algorithm ID for what you're doing, and they don't carry any extra data.
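One common way drivers handle that, sketched below with hypothetical field and function names, is the container-of pattern: embed the generic context in a driver-private struct that carries the missing state, and recover the outer struct from the generic pointer the core hands back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for OP-TEE's struct crypto_hash_ctx, which
 * holds the ops pointer the crypto core calls through. */
struct crypto_hash_ctx {
	const void *ops;
};

/* Driver-private context: embed the generic one, then carry the extra
 * state that update()/final() won't otherwise see. */
struct my_hash_ctx {
	struct crypto_hash_ctx base;	/* handed to the crypto core */
	uint32_t algo;			/* which algorithm this ctx is for */
	uintptr_t regs;			/* base address of the instance in use */
};

/* Recover our struct from the generic pointer the core gives back. */
#define ctx_to_my(c) \
	((struct my_hash_ctx *)((char *)(c) - offsetof(struct my_hash_ctx, base)))

static void my_hash_update(struct crypto_hash_ctx *ctx,
			   const uint8_t *data, size_t len)
{
	struct my_hash_ctx *hc = ctx_to_my(ctx);

	(void)data; (void)len;
	/* real code: feed 'data' to the accelerator at hc->regs, using
	 * the register layout that matches hc->algo */
	(void)hc;
}
```

This is the same idiom the Linux kernel uses everywhere as container_of(); the outer struct is where you would keep the algorithm ID and the per-instance register base discussed next.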
So what you might wish to do is embed the crypto hash context struct into your own struct and then store some extra information, such as which algorithm you're implementing. You might have a hardware accelerator that expects MD5 data in one register and SHA data in another register; I've seen some platforms like that. Especially once you're getting into symmetric key operations, you might have AES data in one register and DES data in another register, for example. And of course, if your platform supports, say, multiple instances of an accelerator, then you also need to keep track of which one you're actually using. So you might want to store the virtual address that is the base address for your particular accelerator instance as part of your hash context. And then, like I mentioned, you just have to fill out the hash ops struct, make sure that it gets passed to the crypto API, and you're good to go. So once we've finished our driver and we're ready to integrate it with the build system, we need to make sure to go in and enable the drvcrypt feature, which is the CFG_CRYPTO_DRIVER option. And then each subsystem has its own support layer that you have to manually add, so you'll need to add the driver hash, driver MAC, or driver cipher option. And even more, if you want to do asymmetric ciphers, you're going to need to enable the specific asymmetric cipher that you're supporting hardware acceleration for: RSA has its own option, ECC has its own option, and so forth. I've omitted those because there are too many options and too many different possibilities, and it also changes rapidly, so you really need to just look at the repository and see what's available in order to find out what to enable for your asymmetric cipher. So now that our hardware accelerator is ready, we've got it fully integrated, and we have to figure out how we plan on using this thing from user space.
So let's take a look at how we might do that from Linux. Our first option is going to be to implement a fully custom library that just directly talks to OP-TEE and implements our layer. This is the most obvious approach and the one we're going to spend most of our time on. Then we're going to talk about how we might integrate it with the Linux kernel, how we might integrate it with OpenSSL, or how we might implement a PKCS#11-compliant interface so we can use it with other applications. And I really want to highlight that in this section there are going to be a number of code examples; however, those code examples are stripped down just to highlight a specific aspect of the API or a specific feature that I want to talk about. They don't have complete error checking and they don't have complete parameter validation, both of which are very important for building a successful system. So if you're trying to build your own, you've got to make sure you add those things or you'll run into a lot of problems down the line. So let's start with a high-level picture of how our application is going to look when we're using our OP-TEE crypto routines. We've got our non-secure user space where we have implemented our application; you know, it's our IoT device that measures temperature and reports it across the internet, for example. We have a wrapper library that we have to implement ourselves, which is going to provide an API for accessing our hardware crypto elements. And then all of that's going to get transported into OP-TEE using the TEE client, which is a standard part of the OP-TEE software, so we don't have to implement that part ourselves, luckily. Now, once we're on the OP-TEE side, the kernel will handle everything up to the point where it calls TA_InvokeCommandEntryPoint() within the trusted application that we write.
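To sketch what the wrapper library's side of that looks like: the types below are minimal stand-ins for the GlobalPlatform TEE Client API definitions from tee_client_api.h, and TEEC_InvokeCommand() is stubbed to simply echo its input, so the packing of the four parameter slots can be shown and exercised on its own. The command number is hypothetical; real code links against libteec.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Minimal stand-ins for GlobalPlatform TEE Client API definitions. */
#define TEEC_SUCCESS		0
#define TEEC_NONE		0x0
#define TEEC_MEMREF_TEMP_INPUT	0x5
#define TEEC_MEMREF_TEMP_OUTPUT	0x6
#define TEEC_PARAM_TYPES(t0, t1, t2, t3) \
	((t0) | ((t1) << 4) | ((t2) << 8) | ((t3) << 12))

typedef struct { void *buffer; size_t size; } TEEC_TempMemoryReference;
typedef union { TEEC_TempMemoryReference tmpref; } TEEC_Parameter;
typedef struct {
	uint32_t paramTypes;
	TEEC_Parameter params[4];
} TEEC_Operation;
typedef struct { int dummy; } TEEC_Session;

#define CMD_AES_ENCRYPT 0	/* hypothetical command number */

/* Stub: the real TEEC_InvokeCommand() traps into OP-TEE; this one
 * just copies input to output so the flow is testable here. */
static uint32_t TEEC_InvokeCommand(TEEC_Session *sess, uint32_t cmd,
				   TEEC_Operation *op, uint32_t *origin)
{
	(void)sess; (void)cmd; (void)origin;
	memcpy(op->params[1].tmpref.buffer, op->params[0].tmpref.buffer,
	       op->params[0].tmpref.size);
	return TEEC_SUCCESS;
}

/* Wrapper-library call: pack buffers into the four parameter slots
 * and invoke the TA by command number. */
static uint32_t my_encrypt(TEEC_Session *sess, const void *in, size_t len,
			   void *out)
{
	TEEC_Operation op;
	uint32_t origin = 0;

	memset(&op, 0, sizeof(op));
	op.paramTypes = TEEC_PARAM_TYPES(TEEC_MEMREF_TEMP_INPUT,
					 TEEC_MEMREF_TEMP_OUTPUT,
					 TEEC_NONE, TEEC_NONE);
	op.params[0].tmpref.buffer = (void *)in;
	op.params[0].tmpref.size = len;
	op.params[1].tmpref.buffer = out;
	op.params[1].tmpref.size = len;
	return TEEC_InvokeCommand(sess, CMD_AES_ENCRYPT, &op, &origin);
}
```

On the secure side, OP-TEE unpacks those parameters and hands them to TA_InvokeCommandEntryPoint() in our trusted application.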
Now, this function is basically where we receive our commands from the user, and we're going to have to figure out how to handle them. In particular, we might end up eventually calling TEE_CipherUpdate(), for instance, and then that will go back into the OP-TEE kernel and actually do the cryptographic operation using the hardware accelerator that we defined earlier, through the crypto API, then the drvcrypt API, and finally our hardware accelerator. So you may ask: well, what is a trusted application? We haven't talked about that yet. A TA is basically a user space application from the perspective of OP-TEE. It's still a secure application: you still verify that the application is exactly what you expect it to be, that it was written by you, that you trust everything in it, and it runs within the secure world and has access to secure resources. Trusted applications only have access to the GlobalPlatform TEE Internal Core API. They don't have access to, say, even libc, and they don't have access to POSIX system calls. They really only have a limited feature set that was specifically designed for implementing these secure applications. And typically the secure applications are used to handle tasks on behalf of Linux, which is done using basically a remote-procedure-call kind of interface, where the Linux user space will configure a bunch of data in shared memory and then invoke OP-TEE, which will look at shared memory, parse the data there, and then call the trusted application with the parameters that were given to it. Once it's done, it returns a result in shared memory and Linux is able to look at it and see what happened. So as we're doing this, we're going to need to do cryptographic operations. So what is a TEE operation? Well, everything is actually stored within the OP-TEE kernel; within the trusted applications, user space only has access to an opaque handle.
So previously, where we would call crypto_hash_alloc_ctx() and that would give us a pointer directly to the hash context, now we're going to receive an opaque handle that points to memory allocated kernel-side rather than in user space. Otherwise, the operations are basically the same thing, except that they need to be associated with a key that is stored in a TEE object rather than a plain buffer pointer. So the next obvious question is: well, what's a TEE object? A TEE object, just like an operation, is an opaque reference to data that's stored kernel-side, and TEE objects come in two varieties: there are persistent objects and there are transient objects. Transient objects are ones that exist only in RAM and only for the lifetime of the trusted application. So if there's ever an error and the trusted application gets killed or panics, the transient objects are immediately wiped. This is useful because it prevents a kind of attack where you might want to crash the application and then leave some data sitting in memory, even though it hasn't been reclaimed yet; you're just like, oh, I'm going to crash it, and then I'm going to try to read some physical memory and pull out a key. That can't happen with OP-TEE, luckily, because it will wipe the keys as soon as the transient object is deallocated, either intentionally or because the application crashed. Transient objects, therefore, should be used for cryptographic keys, because that adds another layer of security and fallback where, as soon as your application ends, the keys get wiped. However, sometimes we do need to store things long-term, and to do that, we would use a persistent object. The persistent object basically allows us to encrypt information and then write it out to long-term storage like an eMMC.
The persistent objects provide both encryption to secure the data and integrity so that you can verify that the data you read back is exactly what you wrote; basically, they detect data corruption on the encrypted data. And these are typically stored on the eMMC, or really through whatever file system Linux provides for you, which is great. The persistent objects are also automatically restricted to a single trusted application by the OP-TEE kernel, and data by default cannot be shared. However, if you really wanted to, there is a mechanism you could use to share persistent objects between different trusted applications. So how does the interface between Linux and the TA work, in a little bit more detail? I mentioned earlier that it was an RPC-style interface, and basically what that amounts to is that each remote procedure we can call is identified by a command number. It accepts up to four arguments, which is an OP-TEE implementation detail rather than part of the standard. Each of those arguments can be an integer or a buffer with a well-defined length; it can be an input or an output, or it can be data that's originally used as an input and then modified by OP-TEE in place, that kind of thing. So how does a minimal implementation of a TA interface actually look? Well, it comes down to one function that we register with the kernel as part of the trusted application structure, and it's typically called TA_InvokeCommandEntryPoint(). An example might be one that has two commands: one to do AES encryption and one to do AES decryption. So the first step is we figure out which command we've received and forward to the appropriate function. Then that function does a little bit of parameter validation. Like I mentioned, OP-TEE typically provides four parameters for a function, so in this case we can say: we expect an initialization vector.
We expect an input buffer and we expect an output buffer. And if we don't see those things in that order, I mean, we can't verify the semantics, but we can at least verify that these are input buffers and those are output buffers. If that succeeds, we can take our AES key, which is already stored secretly inside the application, pass it to our AES encryption routine and say, hey, please operate on this data. And our AES encryption routine might look something like this: it allocates a new operation, associates our key with the operation, initializes it with our IV, performs the operation and returns the result to the user. And again, I keep bringing this up, but we're not doing any error checking or parameter validation here. So what happens if there's a problem with one of the parameters? Well, OP-TEE is designed to try to save you from making a mistake. Obviously, you still need to do this yourself, and we'll talk about why. But if you forget to do everything and you pass in an input buffer that's seven bytes long, for example, or an IV that's seven bytes long, which would be pretty invalid for AES-128, the TEE API is not going to do something inappropriate. Instead, it will say, hey, this buffer is seven bytes long, that's an error, and it will panic immediately, which kills your TA. It deletes your transient objects and everything and falls back to user space, and the user has to start over completely. This is often not what you want, but it is a failsafe at the end of the line. So typically you want to do the error checking yourself and prevent those situations by reporting errors to the user, and then they can say, oh, whoops, my IV was too short, I'll give you a better one next time, that kind of thing. OP-TEE will try to save you if you forget, but please don't rely on that.
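The shape check described above, an IV, an input buffer and an output buffer, in that order, typically looks like this. The `TEE_PARAM_*` names and values are reproduced here from the GlobalPlatform TEE Internal Core API so the snippet stands alone; in a real TA they come from `tee_internal_api.h`, and `check_aes_param_types` is an illustrative helper name:

```c
#include <stdint.h>

/* Reproduced from the GlobalPlatform TEE Internal Core API so this
 * snippet compiles standalone; a real TA includes tee_internal_api.h. */
#define TEE_PARAM_TYPE_NONE          0
#define TEE_PARAM_TYPE_MEMREF_INPUT  5
#define TEE_PARAM_TYPE_MEMREF_OUTPUT 6
#define TEE_PARAM_TYPES(t0, t1, t2, t3) \
    ((t0) | ((t1) << 4) | ((t2) << 8) | ((t3) << 12))

#define TEE_SUCCESS              0x00000000
#define TEE_ERROR_BAD_PARAMETERS 0xFFFF0006

/* First thing an AES-encrypt handler should do: confirm the caller
 * passed IV, plaintext and output buffers in exactly that order.
 * We can't check the semantics, only the shapes. */
static uint32_t check_aes_param_types(uint32_t param_types)
{
    const uint32_t expected =
        TEE_PARAM_TYPES(TEE_PARAM_TYPE_MEMREF_INPUT,   /* IV */
                        TEE_PARAM_TYPE_MEMREF_INPUT,   /* plaintext */
                        TEE_PARAM_TYPE_MEMREF_OUTPUT,  /* ciphertext */
                        TEE_PARAM_TYPE_NONE);
    return param_types == expected ? TEE_SUCCESS : TEE_ERROR_BAD_PARAMETERS;
}
```

Returning `TEE_ERROR_BAD_PARAMETERS` to the caller here is exactly the "report the error instead of panicking" behavior described above.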
So now that we've seen how the trusted applications are arranged, how we call a command, and how that eventually ends up in the TA's crypto API, which of course calls into the OP-TEE kernel and uses our hardware, let's try to combine all of this to build a secure storage system. We mentioned at the start of the presentation that our goal is to build a software-only secure element that provides the same kinds of features a hardware secure element does, in order to lower our system costs. To do that, we're basically going to organize our data into slots, where each slot stores its data persistently. And we have multiple types of data: we might have an AES key in one slot, a public key certificate in another, an RSA key in another, all those things. Each type has operation support that's specific to that kind of slot. At a high level, this is what the application flow might look like. We'll have a user space library, which is untrusted code, providing a basic range of functions for accessing our key storage system: basic operations like opening a slot or closing a slot, and using a slot to encrypt or decrypt data. And finally, for actually adding data to the system, we'll also need endpoints to inject a new key or to generate a key within a slot. For instance, if we're building session keys for TLS or something, we'll need functions for those kinds of things as well. In any case, all of those functions get routed into that TA_InvokeCommandEntryPoint function, which then has to fan back out based on the command that was given. So if we receive an open command, it goes to the slot definition, pulls up the slot operations struct that we have, finds the open function pointer, and if it's valid, calls it.
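A minimal sketch of that fan-out might look like the following. The command IDs, struct and function names are illustrative, and the error codes mirror the GlobalPlatform TEE values; nothing here is lifted from a real TA:

```c
#include <stdint.h>
#include <stddef.h>

/* Error codes mirror the GlobalPlatform TEE values; command IDs and
 * names are illustrative only. */
#define ERR_OK             0x00000000
#define ERR_BAD_PARAMETERS 0xFFFF0006
#define ERR_NOT_SUPPORTED  0xFFFF000A

enum { CMD_SLOT_OPEN = 0, CMD_SLOT_ENCRYPT = 1 };

/* One ops struct per slot type; a NULL entry means "this slot type
 * doesn't support that operation". */
struct slot_ops {
    uint32_t (*open)(void);
    uint32_t (*encrypt)(void);
};

static uint32_t aes_open(void)    { return ERR_OK; }
static uint32_t aes_encrypt(void) { /* would drive TEE cipher ops here */ return ERR_OK; }
static uint32_t cert_open(void)   { return ERR_OK; }

static const struct slot_ops aes_slot_ops  = { aes_open, aes_encrypt };
static const struct slot_ops cert_slot_ops = { cert_open, NULL }; /* no encrypt */

/* Shape of TA_InvokeCommandEntryPoint: switch on the command number,
 * then dispatch through the slot's function pointers, rejecting
 * operations the slot type doesn't implement. */
static uint32_t invoke_command(const struct slot_ops *ops, uint32_t cmd_id)
{
    switch (cmd_id) {
    case CMD_SLOT_OPEN:
        return ops->open ? ops->open() : ERR_NOT_SUPPORTED;
    case CMD_SLOT_ENCRYPT:
        return ops->encrypt ? ops->encrypt() : ERR_NOT_SUPPORTED;
    default:
        return ERR_BAD_PARAMETERS; /* unknown command */
    }
}
```

The NULL checks are the important part: asking a certificate slot to encrypt returns an error instead of jumping through a null function pointer.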
You do the same thing for any other operation. Even more important, if you try to call encrypt on, say, a certificate slot, then your implementation of invoke, when it's delegating to the encrypt function pointer, should detect: hey, there's no encrypt function here, that's an error. Our slot type doesn't support encryption, so we can't do that. And finally, the reason we want this array of slot ops is that each type of slot can store different data and therefore support different operations. If we imagine something that stores an AES key, that might be a transient object: we just load our AES key directly into a transient object, use it for operations, then delete it. If we have something that stores a certificate, we might actually not want that in an object at all, because there are no certificate operations in the OP-TEE API. Instead, we'll have to do some of the certificate interface work ourselves, so we might want just a regular buffer pointer that we copy our certificate into. Or if we're working with the payment industry, we might have features that are not yet part of the OP-TEE standard. In particular, if we're doing something like a derived-unique-key-per-transaction scheme, then when we load our data from persistent storage we have to initialize our transaction counter, calculate the correct key serial number, and make sure our future key registers are populated correctly. So we typically have different responsibilities for each type of open operation, based on what the slot is actually going to be used for and what kind of data it stores. I'm going to give you a couple of examples here. I mean, we could go straight into encryption and decryption; however, we've seen that workflow a lot, right?
Like when you want to encrypt something, you call encrypt, and that calls encrypt, and that calls encrypt again and again until eventually you reach a hardware accelerator that actually does the encryption. So we're not going to talk about that. I will, however, talk about some of the details of opening a slot and closing a slot, because those are very important. Important enough that I want to mention that those operations are actually separate from your regular slot operations. When we go to open a slot, first we obviously allocate resources and so on, and then we read our persistent data from the file system. Not every slot has persistent data, but for all the ones that do, we then allow them to initialize themselves based on that persistent data. If you think about it in terms of the examples from the previous slide, a transient slot might have a new function that allocates a transient object; the persistent object doesn't exist, so we'll see an item-not-found error, and that lets us skip the initialization function. And this is really what I want to highlight: TEE_ERROR_ITEM_NOT_FOUND is not necessarily an error in OP-TEE. You might simply have something that doesn't have persistent backing. However, you do need to think carefully about what it means. It could also mean you're bringing up a device for the first time, before you've done your manufacturing configuration, and you actually need to load data into those persistent slots in the first place. So typically, to determine whether a slot is populated or not, you can't rely on item-not-found by itself. You also need to store something separately, for example in secure one-time programmable memory, that indicates whether manufacturing provisioning has been completed for this device. If provisioning has been completed, item-not-found could be an error.
If provisioning has not been completed, then item-not-found is probably not an error, because we're about to do our initial key load into the slot. You could of course implement a custom routine that skips this step, but it's nice to have things follow a standard flow. The other important thing to look at is how to write a slot correctly. Like I mentioned before, if we're storing a new certificate, or storing a new key we've generated, or we've updated our future key registers and want to prepare for the next transaction, it's very important that we don't corrupt the data on the device. A naive solution would be: delete the previous object, then write a new persistent object in its place. However, if your system loses power in the middle, suddenly your slot is cleared, and you're back to the previous slide, trying to figure out what it means when the persistent object is not found and what kind of error that is. Luckily, OP-TEE provides a way to do an atomic object replace. If you supply TEE_DATA_FLAG_OVERWRITE in the flags when you create a persistent object, OP-TEE will atomically swap the two. What it does in practice is write the second object next to the first one and then swap the pointer. This guarantees that if your system loses power, you will read either the previous object or the new object, but never something in between. You'll never have half of each, and you'll never get an item-not-found there. You'll get one or the other, and that's all. I just want to highlight that, because it's important for making sure you don't accidentally break yourself over something benign that might happen during operation, like a power loss. That's not even worrying about an actual attack. So having talked about how we might organize things and how our API looks, what is left in order to actually build out a trusted application?
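As an aside before we continue: the guarantee TEE_DATA_FLAG_OVERWRITE gives you is the classic write-then-atomic-rename pattern, which OP-TEE's secure storage implements internally. A user space analogue, with a hypothetical `slot_write_atomic` helper, looks like this (production code would also fsync the file and its directory before renaming):

```c
#include <stdio.h>

/* Write the new object beside the old one, then atomically swap with
 * rename(), so a power loss leaves either the old contents or the new
 * contents on disk, never a mix of the two. */
static int slot_write_atomic(const char *path, const void *data, size_t len)
{
    char tmp[256];
    snprintf(tmp, sizeof(tmp), "%s.new", path);

    FILE *f = fopen(tmp, "wb");
    if (!f)
        return -1;
    if (fwrite(data, 1, len, f) != len) {
        fclose(f);
        remove(tmp);
        return -1;
    }
    if (fflush(f) != 0 || fclose(f) != 0) {
        remove(tmp);
        return -1;
    }

    /* rename() is atomic on POSIX file systems: a concurrent reader
     * (or a reboot) sees either the old file or the new one. */
    return rename(tmp, path);
}
```

The delete-then-recreate approach fails exactly where this succeeds: between the delete and the create there is a window where the object simply does not exist.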
Really, it's a bunch of routine tasks. I mentioned that there's an ops struct for each slot, and it supports a bunch of different operations; you have to go through and implement all of those yourself. Obviously, like I said, there's encryption and decryption. It calls encrypt; it's not that interesting. It's not nearly as interesting as opening or writing the slots, which have some very specific pitfalls associated with them. You'll also need to fill out the invoke function, which is just a large switch statement: check for each command number, forward to the appropriate handler, check for the next command ID, forward to the appropriate handler, and so forth. And as I keep mentioning, you have to do all of the error checking and parameter validation, because it's very important for making sure your application is robust against both programming mistakes and malicious attempts to use it. After you've got all the core pieces in place, you have to tackle the one-time programming support for manufacturing. Somehow you're going to need to initialize your key store, and I've seen a couple of different approaches. One is to load one version of the trusted application for manufacturing and a different one for release; the manufacturing version supports key injection, but the release version does not. Another option is to use the SOTP, like I mentioned before, and have a flag that indicates programming has been done. If that flag is set, you don't allow new keys to be loaded; you reject all those operations as a security violation. And then finally, after you've done your manufacturing enablement, you have to decide how to handle one more error: TEE_ERROR_CORRUPT_OBJECT. To do that, you have to think beyond an item not being found. What does it mean if an item is corrupt?
Well, to understand that, it helps to think about how this works. We have a key stored in SOTP, and based on that key we try to decrypt our objects. So if we receive a corrupt-object error, either the source data for the object was corrupted or the key was corrupted. That could mean a hardware failure has happened, in which case the unit needs to be brought in and serviced, or it could mean someone has intentionally tried to tamper with the unit. You've got to think about both possibilities and then decide: does this mean we shut down operations immediately? Does this mean we delete everything and brick ourselves? Do we try to notify somebody? Exactly what happens is up to the situational requirements of your project. So having finished the trusted application, we can move on to the Linux user space application. The user space library is very simple, luckily. It mirrors the invoke structure of TA_InvokeCommandEntryPoint: each function is basically one call that sets up the correct command number and uses the TEE client API to invoke that function within our TA. All of them work exactly the same way, so it's not particularly interesting or exciting. It's optional whether you do the parameter validation here or on the TA side. I would say that under all circumstances you have to validate parameters within the trusted application, so anything you do in the user space library is really a convenience. I recommend you do it in both places, but if you're only going to do one, be sure it's the trusted application. Since the user space library was really easy, let's turn our attention to the kernel, and hopefully that'll be pretty easy too.
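Before leaving user space, here is roughly what one of those one-call wrappers looks like. Everything here is a stand-in so the sketch compiles on its own: a real build would include `tee_client_api.h`, link against libteec, and call TEEC_InvokeCommand where the mock is; the command ID and function names are hypothetical:

```c
#include <stdint.h>

/* Stand-ins for the GlobalPlatform TEE Client API, so this sketch is
 * self-contained. Real code uses TEEC_Session / TEEC_InvokeCommand. */
typedef uint32_t TEEC_Result;
#define TEEC_SUCCESS 0
struct mock_session { uint32_t last_cmd; };

/* Hypothetical command ID shared with the TA. */
#define CMD_SLOT_OPEN 0u

/* Mock of TEEC_InvokeCommand: records which command was invoked. */
static TEEC_Result mock_invoke(struct mock_session *s, uint32_t cmd_id)
{
    s->last_cmd = cmd_id;
    return TEEC_SUCCESS;
}

/* The whole user space library is functions of this shape: pick the
 * command number, invoke it in the TA, hand back the result. */
static TEEC_Result keystore_slot_open(struct mock_session *s)
{
    return mock_invoke(s, CMD_SLOT_OPEN);
}
```

Every other entry point (close, encrypt, inject, generate) repeats this pattern with a different command number, which is why the library is so thin.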
While we're talking about the kernel, I want to mention that when you enable OP-TEE, you need to be sure to disable any existing kernel hardware crypto drivers. For instance, if you're using NXP's CAAM, that has a kernel driver that's readily available; it's part of mainline, and it works great. However, you have to disable it once that peripheral has been transitioned to the secure world, because the Linux kernel will no longer be able to access it, and if it tries, it'll generate a security exception. So first things first: disable that driver, and then replace it with your own. Replacing it with our own driver is pretty similar to how we did things in OP-TEE: we define what amounts to an ops struct and register it with the kernel. For cipher algorithms we have crypto_register_alg, and for hashes we have crypto_register_ahash and crypto_register_shash, for asynchronous and synchronous hashes respectively. Personally, I would recommend the shash interface, because effectively all of the OP-TEE hash operations will be synchronous anyway: the normal world stalls until OP-TEE gives back control. Since it's effectively synchronous, it's easier to just write synchronous code on the Linux side as well. The nice thing is that because the Linux kernel has an implementation of the TEE client API available internally, the implementation of a kernel-based provider is essentially the same as the implementation of the user space library. It's very simple, and I'm not going to repeat too much, but I would like to highlight one thing: when you're registering a cipher alg struct, there are three functions you have to implement, one to set the key, one to encrypt and one to decrypt. And when you're setting the key, the Linux kernel typically expects this to be a private key or a symmetric key, depending on what you're doing.
However, all of our keys are stored only within OP-TEE, and we don't want them to be visible inside the kernel. So rather than setting a key, I would recommend that you supply a key identifier, which the trusted application then uses to select the key itself. You might think, I'll just pass a slot ID. But it turns out the kernel is smarter than that: when you're doing something like AES-128, it knows how long the key should be, so if you try to pass a four-byte slot ID as the key, it will reject it and say, hey, this key is too short. So the pattern is: make your identifier the right length for the operation, so it looks like a real key as far as the kernel is concerned, but the data it carries is actually just a slot identifier that your trusted application knows how to interpret. So having done the trusted application, and having looked at the kernel and found that it's really easy to implement, we should be feeling pretty good about ourselves. And we think, hey, let's tackle OpenSSL, because OpenSSL supports custom engines, which let us add our own features. We could just build an engine around our custom library, and then we'd have full OpenSSL integration, and everybody would be really happy. Please don't do this. I spent about two weeks trying to get started on this and got almost nowhere, just because of how complicated the OpenSSL internals are. It was easy to get one RSA encryption operation going, but then I wanted to make it work with a different subcommand, and as it turns out, you have to implement yet another structure, and it ends up being very, very complicated. So I would recommend instead that you use an existing engine, and it turns out OpenSSL really has two engines that facilitate this. The first would be AF_ALG, which lets you use the kernel crypto providers directly from OpenSSL.
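Circling back to the setkey trick for a moment, one way to sketch it is to pack the slot ID plus a recognizable marker into a key-sized blob. The layout, marker value and function names here are illustrative, not from any real driver:

```c
#include <stdint.h>
#include <string.h>

#define FAKE_KEY_LEN   16           /* AES-128 key length the kernel expects */
#define FAKE_KEY_MAGIC 0x4B534C54u  /* arbitrary marker value */

/* Kernel side: build a 16-byte "key" that is really a slot reference,
 * so setkey() accepts it as a valid AES-128 key. */
static void pack_slot_key(uint32_t slot_id, uint8_t key[FAKE_KEY_LEN])
{
    uint32_t magic = FAKE_KEY_MAGIC;

    memset(key, 0, FAKE_KEY_LEN);
    memcpy(key, &magic, sizeof(magic));
    memcpy(key + sizeof(magic), &slot_id, sizeof(slot_id));
}

/* TA side: returns 0 and the slot ID if the blob carries our marker,
 * -1 otherwise (e.g. someone handed us actual key material). */
static int unpack_slot_key(const uint8_t key[FAKE_KEY_LEN], uint32_t *slot_id)
{
    uint32_t magic;

    memcpy(&magic, key, sizeof(magic));
    if (magic != FAKE_KEY_MAGIC)
        return -1;
    memcpy(slot_id, key + sizeof(magic), sizeof(*slot_id));
    return 0;
}
```

The real key material never leaves secure world; the kernel only ever sees this opaque reference, which happens to satisfy its length check.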
With AF_ALG, you don't have to implement your own engine at all; you just do your kernel implementation and fall back onto that. It uses the AF_ALG socket that you would normally use to access kernel crypto from within a C program. The other alternative is to use a PKCS#11 wrapper engine, such as OpenSC's libp11. These basically provide an OpenSSL engine that converts the OpenSSL internals to the PKCS#11 standard, so anything that implements the Cryptoki API can be used with libp11 and thereby get plugged into OpenSSL. I've mentioned PKCS#11 a bunch of times now, and I just want to briefly go over a couple of aspects of the spec and why you might favor implementing something that supports it. Basically, this spec supports pretty much everything we want in our crypto storage application. It has multiple slots for cryptographic tokens. It has an interface that lets us perform operations using those tokens without needing to copy them around. It doesn't export secret keys: everything is stored within the device, and we sit outside it and ask it to do things on our behalf. And, I'm sure not coincidentally, there is some resemblance between the two APIs. They're definitely not the same, but they map onto each other very easily, which gives us a low-friction interface between the two. Again, that makes it very simple to implement a Cryptoki provider for our OP-TEE-based storage system. And the reason we might actually want to do this is that a lot of applications that don't work with OpenSSL do work with PKCS#11, so you get a broad range of application support right out of the box without having to reimplement anything else. Since this is a standard, the next obvious question is: shouldn't there be a standard implementation of this? And I want to say yes.
The Linaro team has been working on a PKCS#11 TA and a corresponding library called libckteec, which gives you user space access to that TA. It's very exciting and very promising, but unfortunately it's not quite ready for you to use. That's really why we're talking about this at all. These are details I would like for you to not need to know, but until a standard solution is ready, which could be a year or two or more, you're going to have to get in there and implement your own trusted application and your own interface. However, if you're excited about a standard solution, I recommend you check out Ruchika and Etienne's presentation from Linaro Virtual Connect earlier this year. They go into detail about the current status and the roadmap for when they feel it will be ready for production use. And if you're interested in an alternative to PKCS#11, if you really like trusted platform modules, for instance, there are actually firmware TPM implementations that target OP-TEE. In particular, Microsoft has an implementation of the TPM 2.0 standard that uses OP-TEE as its back end. That implementation is ready: it's fully released, and it even has a mainline kernel driver, so you could start using it today. I haven't really played with it much beyond being aware of its existence, but it should be ready for you to use if you're interested in a TPM-style interface. And even if you're not, you can always take the TPM interface, plug it into another Cryptoki adapter, and then use it from a PKCS#11 consumer. I haven't investigated those too much, but they do exist, so it's all possible. There are a couple of standard solutions, but it's up to what your system needs are. So in summary: if you're building an embedded system with security needs, I recommend you use OP-TEE to provide cryptographic services and to store sensitive data.
We can add hardware accelerator drivers directly to OP-TEE to leverage hardware features available on our SOCs. We can access those accelerators through trusted applications that are managed by the OP-TEE kernel, and we can access those trusted applications through a user space library, whether one we write ourselves or a standard one we pick up from another provider. As such, I recommend you be on the lookout for those standard ones and, in the future, once they're ready, integrate them into your product rather than rolling your own. So thank you for being with me today; I'll be around on Slack to discuss things or answer any questions you might have. Thank you very much.