Hello everybody, and welcome to my talk about a new approach to the Linux /dev/random implementation. Before I start, allow me to say that I would very much have liked to be on site together with you, to have face-to-face discussions about this implementation. Unfortunately, the US government is not allowing European citizens to travel to the US these days, so it is a virtual event, at least for me. Nonetheless, while you're watching this video, I am available on the conference's chat system to answer all questions, issues, and comments that you have. Even after the conclusion of this conference, I would be more than happy to discuss any issues and problems over email or other communication venues. That said, let's have a look at the actual presentation. Before I jump right in, allow me to say that this presentation, and also the code that I'm going to show here, is a means to facilitate the discussion around /dev/random and to achieve an implementation that is flexible to the extent needed by all the different vendors and users. With that in mind, my first step in outlining the Linux RNG implementation that I am providing here is to discuss the design goals with you. The second step is to analyze and outline the design. Finally, I would like to cover the entropy sources, followed by the initial seeding strategies that are applied to achieve a fully seeded RNG as soon as possible. Okay, let's have a look at the LRNG goals. One of the first goals is that only cryptographic primitives are used for data processing. That means in the LRNG you will not find things like LFSRs or other data processing steps that do not rest on cryptography. In addition, it has a lockless, high-performance interrupt handler.
Of course, when I say high performance, we have to compare it to the existing /dev/random implementation, and a later slide will do so, to give you an idea of what high performance truly means here. Also, considering that the LRNG will be the foundation of the cryptography in the entire Linux system, we need test frameworks for all processing steps, so that every researcher and developer can verify that the LRNG indeed performs its operations as intended, without losing entropy. An additional aspect is that during power-up, but also at runtime, certain tests are applied to the received data and to the implementation itself, to make sure we always have untainted entropy data. A key element is that the LRNG attempts to be API- and ABI-compatible with the existing /dev/random implementation. This is clearly visible in the patch set that is provided: it adds a kernel configuration option that allows the LRNG support to be enabled or disabled. If you disable it, the existing /dev/random implementation is compiled; otherwise, the LRNG. That also means if you apply the LRNG patch, there is no change at all needed in the remainder of the kernel or even in user space. You can just use it as it is. Another important aspect is that the LRNG provides a flexible configuration environment to support a wide range of use cases. Maybe a word about my background: I'm working with a lot of different vendors, system integrators, and even distributions. Lately, I see more and more that they have different use cases they would like to achieve, which is very difficult with the existing /dev/random codebase. The LRNG should be able to cover all the use cases that at least I am aware of from my discussions with these different vendors.
Another aspect is that the cryptographic primitives I mentioned at the beginning can be changed, even at runtime, if that is allowed via the kernel configuration. Also, one goal is to provide an implementation based on a clean architecture. Clean architecture means that every entropy source, for example, has its own definition and its own implementation, the DRNG management has its own implementation, and all these different code bases are then tied together. Yet these different code parts work relatively autonomously and independently. Finally, there is standards compliance that I'd like to achieve with the mentioned standards, yet any code providing this standards compliance is only compiled if the respective kernel configuration option is enabled. Otherwise, no code whatsoever is left in the LRNG around it. The design is shown in the picture on this slide. Let's focus on the center part, the little box with the different colors that is marked as the temporary seed buffer. This temporary seed buffer obtains data from four different entropy sources, which you see in these different colors, and it concatenates this data before it is used to seed the DRNG. By concatenating the data, all entropy sources are treated equally, which also means there is no possibility that any one entropy source can dominate the others. Such a thing could easily happen if you consider a CPU-based noise source like RDSEED. RDSEED is an instruction that provides data relatively fast compared to the other entropy sources. Yet the LRNG ensures that the CPU noise source cannot provide an unlimited amount of data to the DRNG and thereby monopolize the temporary seed buffer or crowd out the other entropy sources.
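The equal-treatment idea above can be sketched in a few lines. This is an illustration only, not the kernel code; the function and source names are made up, and `os.urandom` merely stands in for whatever data a real entropy source would deliver:

```python
import os

def get_es_data(name: str, requested_bytes: int) -> bytes:
    # Placeholder for a real entropy source; os.urandom stands in for
    # the data the source can deliver (hypothetical helper).
    return os.urandom(requested_bytes)

def fill_seed_buffer(sources, bytes_per_source=32) -> bytes:
    # Each source fills its own fixed-size slot in the concatenated
    # buffer, so a fast source such as RDSEED cannot crowd out the
    # slower ones.
    return b"".join(get_es_data(src, bytes_per_source) for src in sources)

seed = fill_seed_buffer(["interrupt", "cpu", "jitter", "aux"])
```

Because the buffer layout is fixed per source, dominance by any single source is structurally impossible, which is exactly the property the concatenation is meant to provide.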
This is achieved by the fact that the seeding operation of the DRNG is triggered by the boot process of the LRNG or by the DRNG itself. It is not possible for an entropy source to say, hey, I have entropy available and I would like to feed it straight into the DRNG, without regard to the other entropy sources. Also, the LRNG provides the means to selectively disable entropy sources at compile time. This is one aspect that should allow vendors and system integrators to decide which entropy sources are of importance to them, for which they have an entropy assessment, and which they consider good. All others are either disabled or credited with an entropy rate that provides only a very limited amount of entropy. When data is obtained from these entropy sources, it is fed into the DRNG. And when the DRNG has to produce random numbers, the generate operation is wired up to the different APIs that actually deliver these random numbers to the respective caller. Speaking of the output APIs, let's have a look at them. We have two types. First, we have the blocking APIs. These APIs only deliver data after the DRNG is fully initialized and fully seeded. Of course, the question is, what does fully initialized and fully seeded mean? One of the subsequent slides discusses exactly this topic, so please bear with me and let's defer it. So which blocking APIs do we have? We have /dev/random, and we have the getrandom system call when it is invoked without any specific flags. And finally, the get_random_bytes in-kernel API when it is used after the callback registered via add_random_ready_callback has fired, which happens only when the DRNG is fully initialized and fully seeded. All other APIs in the LRNG deliver data without blocking until the DRNG is fully initialized, and therefore they do not provide any guarantee that it is sufficiently initialized or seeded.
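The blocking versus non-blocking split can be sketched roughly as follows. All names here are illustrative (there is no such class in the kernel), and a threading event simply models the "fully seeded" gate:

```python
import os
import threading

class LrngApiGate:
    """Illustrative sketch only: the fully-seeded gate for output APIs."""
    def __init__(self):
        self.fully_seeded = threading.Event()

    def _generate(self, n: int) -> bytes:
        return os.urandom(n)  # placeholder for the DRNG generate operation

    def getrandom_blocking(self, n: int) -> bytes:
        # /dev/random and getrandom() without flags: wait until the DRNG
        # is fully initialized and fully seeded.
        self.fully_seeded.wait()
        return self._generate(n)

    def getrandom_nonblocking(self, n: int) -> bytes:
        # /dev/urandom-style interfaces: deliver immediately, with no
        # guarantee about the seed status.
        return self._generate(n)
```

Once seeding completes, `fully_seeded.set()` would release all waiters, which mirrors the moment the blocking interfaces are unblocked.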
After looking at the output side of the DRNG, let's look at the input side, which is the seeding part. Again, as mentioned with the design, we have this temporary seed buffer, which is a concatenation of the output of all entropy sources. During boot of the system, the following concept applies: the DRNG is seeded once all entropy sources collectively have 32 bits of entropy available, then a second time when 128 bits of entropy are available, and finally when 256 bits of entropy are available. Now the question is, why these three steps? Well, 32 bits are reached very fast during kernel boot, way before user space takes hold, and we seed the DRNG with at least some entropy to make sure we do not produce the same random numbers that were produced during the last boot, for example. The next step, 128 bits, is also commonly achieved before user space boots, and it is the threshold that already provides a meaningful cryptographic security strength. The full cryptographic security strength of 256 bits is then reached in the third step, which commonly happens either around the time user space boots or very shortly thereafter, say, around the time the root partition is mounted. At runtime, of course, the DRNG also needs to be re-seeded, and re-seeds are performed after servicing 2^20 generate requests or after 10 minutes have elapsed, whichever comes first. In addition, user space can force the DRNG to re-seed. Finally, a re-seed is also triggered when a new DRNG implementation is activated — remember, at the beginning I mentioned that the cryptographic primitives can be updated, even at runtime. When they are changed, the DRNG naturally needs to be re-seeded. But there is a caveat: the DRNG is only re-seeded when the entropy sources collectively have at least 128 bits of entropy available.
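The runtime re-seed triggers described above reduce to a small policy check. This is a minimal sketch with made-up names; the thresholds are the ones from the talk (2^20 generate requests or 10 minutes, whichever comes first):

```python
import time

SEED_STEPS_BITS = (32, 128, 256)  # initially / minimally / fully seeded

MAX_REQUESTS = 1 << 20   # re-seed after 2^20 generate requests ...
MAX_AGE_SECONDS = 600    # ... or after 10 minutes, whichever comes first

class ReseedPolicy:
    def __init__(self):
        self.requests = 0
        self.last_seed = time.monotonic()

    def note_generate(self):
        # Called once per serviced generate request.
        self.requests += 1

    def reseed_due(self) -> bool:
        return (self.requests >= MAX_REQUESTS
                or time.monotonic() - self.last_seed >= MAX_AGE_SECONDS)
```

On a successful re-seed, the counter and timestamp would be reset, restarting both clocks.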
If you boot the LRNG in a different mode, the SP 800-90C mode, then the entropy sources are even required to provide 256 bits of entropy. When seeding commences, each entropy source is requested to provide 256 bits of entropy, yet the entropy sources may deliver less, depending on what they have available. The interesting aspect is that when the LRNG identifies that a re-seed is due, it only sets a flag; the re-seed is then performed at the time the DRNG is requested to produce random bits. That means the DRNG checks this flag, and if the flag indicates that a re-seed is due, it goes out to the entropy sources and fetches data. With that, it might happen that when the DRNG wants to seed from the entropy sources, they in fact do not provide sufficient entropy — they do not deliver 128 or 256 bits. That is actually quite harmless, considering how often we re-seed, if you look at the earlier statements. However, it becomes harmful if re-seeds are repeatedly not completed. So there is a precaution: when the DRNG has not been re-seeded at full strength for more than 2^30 generate requests, the DRNG goes back into the non-seeded state. What does that mean? If you consider the previous slide, the non-seeded state means that the blocking APIs will block again. That means it is possible that /dev/random or getrandom will eventually block. The pictures on this slide show the seeding behavior: the upper picture is the regular seeding behavior, where you see the different entropy sources — the interrupt, CPU, Jitter RNG, and auxiliary entropy sources. It also shows you the amount of data collected from the respective entropy source and the amount of entropy awarded to this data.
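The deferred re-seed logic described above — set a flag, re-seed lazily on the next generate request, and fall back to the non-seeded state after too many requests without a full-strength re-seed — can be sketched as follows. All names are illustrative, not the kernel's:

```python
FORCE_UNSEEDED_AFTER = 1 << 30   # requests without a full-strength re-seed
MIN_RESEED_ENTROPY_BITS = 128    # 256 in the SP 800-90C mode

class DeferredReseedDrng:
    def __init__(self):
        self.reseed_flag = False
        self.fully_seeded = True
        self.requests_since_full_reseed = 0

    def mark_reseed_due(self):
        # Only a flag is set here; no entropy is pulled yet.
        self.reseed_flag = True

    def generate(self, entropy_bits_available: int):
        if (self.reseed_flag
                and entropy_bits_available >= MIN_RESEED_ENTROPY_BITS):
            # A full-strength re-seed succeeded.
            self.requests_since_full_reseed = 0
            self.reseed_flag = False
        self.requests_since_full_reseed += 1
        if self.requests_since_full_reseed > FORCE_UNSEEDED_AFTER:
            # Too many requests without a complete re-seed: fall back to
            # the non-seeded state, so the blocking interfaces block again.
            self.fully_seeded = False
```

A single failed re-seed leaves the flag pending and is harmless; only a persistent shortfall eventually flips `fully_seeded` off.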
On the other hand, the picture at the bottom shows a different seeding strategy, and you will see shortly what these different seeding strategies entail. Another important aspect is that the LRNG is capable of managing one DRNG instance per NUMA node, if NUMA support is compiled in. The hash context used for conditioning — you'll see in a second what conditioning means — is also handled NUMA-node-local. Every DRNG seeds itself from the temporary seed buffer, but since insufficient entropy may be available during boot, the DRNG instances are initialized in sequential order: first the DRNG for NUMA node zero, then the one for NUMA node one, and so on, as entropy becomes available. If a request for random numbers comes in, the LRNG tries to service it through the NUMA-node-local DRNG instance, but if that is not yet fully seeded, it falls back to using the DRNG for NUMA node zero. And to prevent a re-seed storm, considering that all DRNGs are managed independently of each other, the timer-based re-seed thresholds are different for each DRNG: the DRNG for NUMA node zero keeps the 10 minutes I mentioned before, NUMA node one has 700 seconds, NUMA node two has 800 seconds, and so on. Yet this entire code is only present and compiled if NUMA support is compiled into the kernel. I mentioned that we only use cryptographic primitives for data processing. Now let's have a look at what that really means. First we have to consider which processing steps we need to cover. We of course have the DRNG; the DRNG is one cryptographic primitive that is used. The second one is the conditioning hash. We will see, when we discuss the entropy sources, what conditioning really means and how data is processed there.
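The per-NUMA-node re-seed staggering mentioned above is a one-liner; the numbers (600, 700, 800 seconds, and so on) come straight from the talk, while the function name is made up:

```python
BASE_RESEED_SECONDS = 600  # NUMA node 0 keeps the 10-minute interval
STAGGER_SECONDS = 100      # each further node waits 100 seconds longer

def node_reseed_interval(numa_node: int) -> int:
    # Spreading the timer thresholds prevents all per-node DRNGs from
    # hitting the entropy sources at the same moment (a re-seed storm).
    return BASE_RESEED_SECONDS + numa_node * STAGGER_SECONDS
```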
At this point, let's just say that the conditioning hash is either SHA-256 or, as we will see, SHA-512. The built-in cryptographic primitive is a ChaCha20 DRNG. You see in this picture the concept of the ChaCha20 DRNG: essentially, the keystream produced by the ChaCha20 operation serves as the random numbers returned to the caller. In addition, the DRNG implementation provides enhanced backward secrecy, which means the internal state of the ChaCha20 DRNG is updated every time a random number request is serviced. For the conditioning hash, the built-in implementation rests on SHA-256. This implementation has been taken from the kernel's lib/ directory. You see, the built-in implementations do not rest on the kernel crypto API, which means it is possible to compile the LRNG without kernel crypto API support. If the kernel crypto API is available, then we can also draw on its primitives. There we have an SP 800-90A DRBG, which uses accelerated AES or SHA primitives. We can even use accelerated SHA primitives directly; in that case, we use SHA-512 for the conditioning. Other implementations might be used as well, for example even hardware-based DRNGs like CPACF, which might be wired up through the kernel crypto API. Considering that the LRNG provides a well-defined API within the kernel, it allows developers to provide other implementations of these cryptographic primitives, which can be loaded at boot time or runtime and then used by the LRNG instead of the primitives I mentioned here. However, for all primitives outlined here and provided by the LRNG, a complete test harness is available, verifying that these cryptographic primitives indeed comply with their specification. The test harness is based on NIST's ACVP framework, and the GitHub repository gives you access to the tests for these primitives.
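The enhanced backward secrecy idea — re-keying the internal state after every generate request so that a later state compromise cannot reveal earlier output — can be illustrated conceptually. This is a sketch only: SHA-256 stands in for the ChaCha20 block function here purely for illustration, and the class name is made up; the real DRNG uses the ChaCha20 keystream itself:

```python
import hashlib

class SketchDrng:
    """Conceptual sketch of enhanced backward secrecy (not the real DRNG)."""
    def __init__(self, seed: bytes):
        self.state = hashlib.sha256(seed).digest()

    def generate(self, n: int) -> bytes:
        out = b""
        counter = 0
        while len(out) < n:
            # "Keystream" blocks derived from the current state; stands in
            # for ChaCha20 keystream generation.
            block = hashlib.sha256(
                self.state + counter.to_bytes(8, "little")).digest()
            out += block
            counter += 1
        # Enhanced backward secrecy: overwrite the state after every
        # request so the previous state cannot be reconstructed.
        self.state = hashlib.sha256(self.state + b"update").digest()
        return out[:n]
```

Because the state is destroyed on every request, recovering the in-memory state later tells an attacker nothing about output generated before the last update.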
In addition, the ChaCha20-based DRNG is externalized into user space, which allows researchers to analyze the behavior of the DRNG in user space and easily compare it to the implementation in the kernel — of course, both are identical. But, full disclosure, there is more data processing that we have to consider: the concatenation that I already mentioned — basically, the concatenation of the entropy source data into the temporary seed buffer — and one more step, namely the truncation of the conditioning message digest to the heuristic entropy value that this data is assessed to contain. Yet all of the primitives I mentioned are considered to be fully understood with respect to their behavior towards entropy, and I think all of them are completely uncontested when it comes to their behavior with respect to entropy and the guarantee that entropy is maintained. With that, we have concluded the discussion of the deterministic side of the LRNG, and now let's have a look at the non-deterministic part, which means the entropy sources. We have two types of entropy sources: external versus internal. Let's first clarify what external versus internal really means here. External means entropy sources over which the LRNG has no control and of whose internals it has no concept. All data provided by these external entropy sources is taken at face value, and the entropy rate these sources are claimed to provide is also taken at face value. That said, what types of external entropy sources do we have? We have two. First, the fast ones, which can provide data at the time the LRNG requests it. We have two implementations here: a Jitter-based RNG and a CPU-instruction-based entropy source, for example RDSEED on Intel, and so on.
In addition, there are the slow external entropy sources. They are not slow in the sense that they provide data at a low rate, but rather they deliver data at times not controlled by the LRNG. The LRNG has to expect data at any time and cannot go out to these entropy sources to fetch data straight away — the data just trickles in. All data that trickles in has to be stored and managed somehow, and that happens in the auxiliary entropy pool that you see in the lower right part of the picture. The concept of the auxiliary pool is actually quite straightforward: it is a hash context, and every time data is received, it is inserted into the auxiliary pool by performing a hash update operation. When data is required from the auxiliary pool, the message digest is created, followed by a re-initialization of the hash context and the re-insertion of the temporary seed buffer into that state. This re-insertion of the temporary seed buffer is only there to ensure backtracking resistance; the temporary seed buffer is not considered to contain any entropy anymore, because it is used to seed the DRNG and transports its entropy over there. Now let's look at the internal entropy source. Where an external entropy source means the LRNG has no concept of its structure, internal means the LRNG has full control over that entropy source: it knows all its internals and knows its entropy state. We have one internal entropy source, which is based on interrupt timing. All interrupts received by the LRNG are treated as one entropy source. The data collection implemented by the LRNG executes in interrupt context. In addition, the data eventually needs to be compressed with the conditioning hash, and that is implemented partially in interrupt context and partially in process context.
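The auxiliary pool mechanics described above — a running hash context, update on insert, and digest-plus-reinitialization on read, with the digest re-inserted for backtracking resistance — map almost directly onto a few lines of code. A minimal sketch (the class name is made up, and SHA-256 plays the role of the conditioning hash):

```python
import hashlib

class AuxPool:
    """Sketch of the auxiliary entropy pool: a running hash context."""
    def __init__(self):
        self.ctx = hashlib.sha256()

    def insert(self, data: bytes):
        # Data trickling in at arbitrary times is absorbed by a hash
        # update operation.
        self.ctx.update(data)

    def read_seed(self) -> bytes:
        # Produce the digest, re-initialize the context, and re-insert
        # the digest for backtracking resistance.  The re-inserted digest
        # is credited with zero entropy: its entropy went into the seed.
        seed = self.ctx.digest()
        self.ctx = hashlib.sha256()
        self.ctx.update(seed)
        return seed
```

Two consecutive reads yield different values because the state is re-initialized rather than replayed, which is precisely the backtracking-resistance property.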
As I mentioned, this compression is a hash update operation, because the concept of this entropy pool is identical to the auxiliary pool: the entropy pool is just the hash state, the hash context. Now you may scratch your head and say, wait a minute, how can a hash update operation, which is very costly and executes in interrupt context, still lead to a high-performance interrupt handler? Well, the graph in the upper part shows the measurements I have conducted. The graph shows the mean duration in CPU cycles to service one interrupt within the LRNG, depending on the collection size. Please bear with me — the collection size is explained on the next slide, so just take it as given for now. The default collection size is 1024, so if I look at the graph for the AVX SHA-512 conditioning hash, it takes on average a little more than 40 cycles to service one interrupt. If you compare that with the existing /dev/random implementation, which is marked with a gray line, the existing implementation requires close to 100 cycles on average to service one interrupt. You see already that the interrupt handler of the LRNG is almost twice as fast, or even more than twice as fast, as the existing implementation. It can be even faster if a certain kernel configuration option — the continuous compression operation — is disabled; in that case, the hash update operation is moved entirely into process context to get an even faster interrupt handler. Now let's look at how this performance is achieved. I have two pictures here; let's look at the topmost one, and there at the top left part. Eventually, the kernel receives an interrupt and pings the LRNG that such an interrupt has been received. The LRNG then obtains a timestamp — in our case just a cycle counter, from RDTSC on Intel, for example.
It divides this value by the greatest common divisor and takes only the eight least significant bits of the result. It applies an entropy estimate and health tests to it, and then it concatenates it into the per-CPU collection pool. That collection pool, being per-CPU, can be accessed without any lock — so we already have lockless handling here. Additional data that is not credited with any entropy can also be added into the collection pool, for example data received by the CPU while the interrupt is being serviced. Only when the collection pool is completely filled is the hash update operation performed to insert the collection pool contents into the actual entropy pool. And again, that entropy pool is also CPU-local, which means this hash update operation is likewise serviced without taking a lock. If continuous compression is disabled, then the insertion of the collection pool into the entropy pool — into that hash state — is not performed during interrupt servicing but in process context. When the temporary seed buffer shall now be filled with data from the interrupt entropy source, the LRNG requests the generation of the message digest from each per-CPU entropy pool. All of these message digests are fed into another hash to obtain one combined data set over all CPU-local entropy pools. This is followed immediately by a re-initialization of the per-CPU entropy pools, in order to service the next insertion of a collection pool. When we look at the internal entropy source, we also need to provide a means of analyzing it. Here, of course, the LRNG is responsible for that entropy source, and it shall provide all means for researchers, testers, validators, reviewers, and fellow developers to analyze whether the entropy data actually provides sufficient entropy — whether it is good enough for all use cases.
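The per-interrupt processing described above can be sketched end to end. This is a simplified illustration, not the kernel code: here the GCD is computed over the observed values themselves (an assumption for brevity; the real implementation determines it separately), and a single SHA-256 context stands in for the per-CPU entropy pool:

```python
import hashlib
from functools import reduce
from math import gcd

COLLECTION_SIZE = 1024  # default number of time deltas per collection pool

def condition_timestamps(timestamps):
    # Divide each cycle-counter value by the greatest common divisor and
    # keep only the eight least significant bits; concatenating these
    # bytes is the cheap per-interrupt work.
    g = reduce(gcd, timestamps) or 1
    pool = bytes((t // g) & 0xFF for t in timestamps)
    entropy_pool = hashlib.sha256()
    # Only once per full collection pool is one (expensive) hash update
    # performed into the per-CPU entropy pool — this is what keeps the
    # average per-interrupt cost to a few dozen cycles.
    for i in range(0, len(pool), COLLECTION_SIZE):
        entropy_pool.update(pool[i:i + COLLECTION_SIZE])
    return entropy_pool.digest()
```

Amortizing one hash update over 1024 concatenation operations is the crux of the lockless, high-performance interrupt handler.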
The testing code that is provided is enabled at compile time. That means if you disable it, which should be the case for production kernels, no trace of these tests and test codes remains in the kernel at all, and no interfaces are exported. But when you compile it in, you can access the test interfaces via DebugFS files. These DebugFS files provide ASCII data output, and I provide a small tool that formats this data a little more nicely, so that you can immediately post-process it with your favorite analysis tool. What types of interfaces do we provide? We have test interfaces to obtain the totally raw and unprocessed entropy timestamps collected for each interrupt. We can also collect other event values, which may or may not be used by the LRNG. In addition, we have a test interface to analyze the performance of the LRNG's interrupt handler. Finally, we have a test interface for the built-in SHA-256 hashing conditioner, which allows testers and analysts to check whether the SHA-256 implementation truly follows the SHA-256 specification. With these test interfaces, we now have a complete SP 800-90B-compliant test cycle available that can lead to a full SP 800-90B entropy assessment. In fact, as part of the codebase, I provide full-fledged documentation with a complete entropy assessment of the interrupt entropy source. All tools needed to perform this entropy assessment are provided as part of the code distribution, which allows every developer and researcher to perform this analysis on the system of their choice. The table on this slide gives you a glimpse of the testing I have performed on different CPU types; all of them provide sufficient entropy. Another aspect that should not be forgotten is that health tests shall be applied.
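As a preview of one such health test, here is a rough sketch of a timestamp pattern detector that flags "stuck" values — timestamps whose first, second, or third discrete derivative is zero. This is a simplified illustration with a made-up function name, not the kernel code:

```python
def stuck_test(history, new_ts):
    # history holds the last three timestamps, oldest first.  The first,
    # second, and third discrete derivatives of the timestamp sequence
    # must all be non-zero, otherwise the new value is "stuck" and is
    # credited with no entropy.
    t3, t2, t1 = history
    d1 = new_ts - t1                        # first derivative
    d2 = d1 - (t1 - t2)                     # second derivative
    d3 = d2 - ((t1 - t2) - (t2 - t3))       # third derivative
    return d1 == 0 or d2 == 0 or d3 == 0
```

A strictly linear timestamp sequence (for example 10, 20, 30, 40) has a zero second derivative and is rejected, while an irregular sequence passes.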
We have two types of health tests, which can be enabled at compile time. If you disable them, naturally, the code is not there, and you do not incur any performance penalties from these self-tests. The first type is a power-up self-test, which verifies that the cryptographic mechanisms are okay and also validates the timestamp management — that means the collection pool is tested as to whether it truly performs a proper concatenation of timestamps and time data. We also have an adaptive proportion test and a repetition count test for the entropy data. These tests shall detect whether the entropy data has temporarily degraded. In addition, we have a timestamp pattern detector, which rests on calculating the first, second, and third discrete derivatives of the timestamps, which must all be non-zero. When the adaptive proportion test is present, the blocking interfaces will block until this test has completed its power-up cycle. The good thing is that the adaptive proportion test and the repetition count test are only enabled with a certain boot-time flag, because they might not be of relevance for some users. And with these health tests, we actually achieve an SP 800-90B-compliant entropy source. Let's now have a look at the seeding strategies, because the seeding strategies guarantee that we have a fully seeded DRNG as early as possible. Let's recap: we already mentioned the three steps where the DRNG is seeded with 32 bits, 128 bits, and 256 bits of entropy, which mark the initially seeded, minimally seeded, and fully seeded DRNG. The blocking interfaces are released when the DRNG is fully seeded. This is the default operation, which applies when no other specific seeding strategy is configured and compiled. Yet what other options do we have? There is a seeding strategy based on entropy source oversampling.
The initial and minimal seeding steps still apply; however, the fully seeded step is changed. This oversampling seeding strategy is only available if selected at compile time, otherwise it is not present. In addition, a certain kernel command line flag must be set, and the conditioning hash must produce a digest larger than 384 bits. What does that mean? The entropy sources are requested to provide more entropy than the conditioning hash can actually transport — that's the first thing. The second thing is that the DRNG is seeded with more entropy than the security strength of the DRNG. This applies only during the initial seeding; any subsequent seeding reverts back to requiring only the security strength of the DRNG. The LRNG ensures that every entropy source alone is capable of providing this oversampled amount of entropy by itself, which means that if a vendor wants only one entropy source to provide all the entropy, the LRNG allows that. Finally, this oversampling strategy is compliant with SP 800-90C — at least with the current draft — and it complies there with the construction methods of a non-physical RBG2 and a physical RBG2, if certain configurations are applied to the LRNG. Now, before I release an LRNG patch, I perform extensive testing to make sure it operates as it is supposed to. First, I have an automatic regression test suite available, which covers all the different options of the LRNG. You've heard that we have many compile-time and runtime options, and they have all been tested to ensure they truly work as intended. In addition, there is a locking torture test. Why is that? Well, I said that the cryptographic primitives can be changed and updated at runtime, but at the same time, of course, the DRNG may be requested to produce random numbers.
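The oversampling seeding strategy described earlier boils down to requesting extra entropy only for the initial seeding. A minimal sketch — note that the 64-bit oversampling amount is my assumption based on the SP 800-90C draft, not a figure stated in the talk, and the names are made up:

```python
CONDITIONER_DIGEST_BITS = 512   # e.g. SHA-512; must exceed 384 bits
SECURITY_STRENGTH_BITS = 256
OVERSAMPLING_BITS = 64          # assumption: SP 800-90C-style oversampling

def required_entropy(initial_seeding: bool) -> int:
    # During initial seeding, more entropy is requested than the DRNG
    # security strength; subsequent re-seeds revert to the plain
    # security strength.
    if initial_seeding:
        return SECURITY_STRENGTH_BITS + OVERSAMPLING_BITS
    return SECURITY_STRENGTH_BITS
```

The requirement that the conditioning digest exceed 384 bits ensures the hash can actually transport the oversampled entropy amount without becoming the bottleneck.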
This locking torture test puts full load on the LRNG and performs loading and unloading of the DRNG implementations in a repetitive fashion. Also, I applied the kernel test frameworks that are listed here, like KASAN, UBSAN, lockdep, the kernel memory leak detector, and sparse. Another test is just a compile-time test, namely verifying that the LRNG can be compiled without the kernel crypto API — and it successfully does so. I also provide performance tests; the table here gives you an idea of the performance numbers. I have system call validation testing, and finally there is a test of the LRNG behavior in atomic contexts. This leads me to the conclusion of my talk. I would like to point out where the actual code resides: here you have the GitHub repository where the code is located, along with all the test tools that I mentioned and the actual documentation. Remember, there is full-fledged documentation and an entropy assessment, at least of the internal entropy source. The testing that I mentioned has been conducted not only on Intel x86 or AMD systems, but also on different ARM systems, 64- and 32-bit, on small-scale MIPS processors, on POWER LE and BE systems, as well as on an IBM Z mainframe and a RISC-V system. You see, I covered embedded systems as well as big iron. At the GitHub repository, you will also find backporting patches to the long-term support kernels that I listed here, and to other kernels in addition. With that, I would like to close my talk, and I hope I have provided at least a good contribution to the discussion about the future of /dev/random, along with a complete, production-ready implementation of a /dev/random device that uses contemporary cryptography and a contemporary approach, without too many processing steps.
With that, thank you very much for listening. If you have any questions, please do not hesitate to reach out to me, and I will be happy to answer them. Thank you very much, and bye-bye.