Thank you. Today I'll be talking about some joint work with Dan Shumow in which we investigate the security properties of the pseudo-random number generators in the NIST SP 800-90A standard. At a high level, a pseudo-random number generator, or PRNG, takes as input a short, high-entropy seed and uses this to produce much larger quantities of pseudo-random bits. Given that most modern cryptography relies on random input in the form of keys, nonces, IVs, and so on, it's fair to say that a secure PRNG underpins the vast majority of cryptographic applications that we use today. At the same time, there's an ever-growing list of real-world PRNG failures that bear out the fact that, often, when a PRNG is broken, the security of the reliant application breaks down with it as well. And so it's of absolute importance that standardised PRNGs are designed to be as secure as possible. Now, the NIST SP 800-90A standard gives three constructions of PRNGs, each based on a different primitive: CTR-DRBG, based on a block cipher; HMAC-DRBG, based on HMAC; and HASH-DRBG, based on a cryptographic hash function. These generators are certainly widely deployed. Indeed, any software or hardware seeking FIPS certification has to implement one of these constructions. And yet, as we're going to see in this talk, these generators have received surprisingly little formal analysis to date. One possible explanation for this is that earlier revisions of the standard contained the now-infamous Dual EC DRBG, which was removed from later revisions after the Snowden leaks. So it seems plausible that, perhaps because of the attention lavished on Dual EC, the other generators in the standard were somewhat overlooked. In this work, we set out to address some of these gaps in analysis and to increase our understanding of the security properties of the remaining generators in the standard. OK, so first: what is a PRNG?
So at a high level, a PRNG with input is a stateful PRG which has continual access to an imperfect source of randomness that we call the entropy source. We usually define our PRNG to be a tuple of algorithms. The initial state generation algorithm takes as input an entropy sample drawn from the source and uses this to construct an initial generator state. The output generation algorithm takes as input the current state of the generator and returns a fixed-length pseudo-random output. And the refresh algorithm can be used to incorporate entropy samples that have been drawn from the source into the generator state. Now, there are well-known impossibility results which mean that if we want our PRNG to work with arbitrary imperfect sources, we need to allow each of these algorithms to take as input a random seed, which crucially must be generated independently of the entropy source. Now, as is common with real-world PRNGs, for which it's just not feasible to conjure up this independent random seed in practice, none of the NIST DRBGs are specified to take a seed. So I'm going to brush this issue under the carpet for the rest of the talk because time is short. But for those who are interested, we manage to sidestep these impossibility results because ultimately our analysis is going to be in an idealised model and we're going to assume a bit more of the entropy source. Full details of what we do are given in the paper. OK, so back to PRNGs. This is pretty much how PRNGs look in the literature. And as we're going to see, the NIST DRBGs are specified a bit differently, in that they take as input a number of optional inputs and parameters. I'm going to give two examples of these now. So the first difference is that while we normally think of PRNGs as generating a fixed-length output in response to each next call, the NIST DRBGs have a parameter which allows a variable-length output to be requested in each next call. And moreover, these outputs can be large.
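To make this interface concrete, here is a minimal sketch of a PRNG with input. The class name, labels, and hash-based internals are purely illustrative, not taken from the standard or the paper; the point is just the three algorithms: initial state generation, output generation, and refresh.

```python
import hashlib
import os

class PRNGWithInput:
    """Minimal sketch of a PRNG with input (names and internals illustrative)."""

    def __init__(self, entropy_sample: bytes):
        # initial state generation: derive the state from an entropy sample
        self.state = hashlib.sha256(b"init" + entropy_sample).digest()

    def refresh(self, entropy_sample: bytes) -> None:
        # fold a fresh entropy sample into the current state
        self.state = hashlib.sha256(b"refresh" + self.state + entropy_sample).digest()

    def next(self) -> bytes:
        # return a fixed-length pseudo-random output, then update the state
        out = hashlib.sha256(b"output" + self.state).digest()
        self.state = hashlib.sha256(b"update" + self.state).digest()
        return out

# usage: initialise from the entropy source, then generate and refresh
gen = PRNGWithInput(os.urandom(32))
r1 = gen.next()
gen.refresh(os.urandom(32))
r2 = gen.next()
```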
The standard allows up to 2^19 bits of output to be requested in each call to next. Another difference is that the standard allows optional strings of additional input to be fed to the generator during output generation requests. These are essentially arbitrary data, could be things like timestamps or device IDs, but it's an option to get a bit more entropy into the state of the generator if you'd like. So we can see that there's a lot of flexibility in the way these generators are specified, and we could have two implementations of the same algorithm looking very different depending on how parameters are set. And moreover, we've got something of a gap between how PRNGs look in theory and how the NIST generators are specified in practice. In terms of security properties, the standard claims that each of the generators achieves the properties of backtracking resistance and prediction resistance. Backtracking resistance says that if at some point in time the state is compromised, then output produced prior to the compromise remains secure and pseudo-random. This property is more commonly known as forward security, and I'm going to refer to it as such for the rest of the talk to avoid confusion. The second property, prediction resistance, says that if at some point in time the state of the generator is compromised and sufficient entropy subsequently enters the system via refresh calls, then security should be recovered. Now, we said at the start of the talk that these generators have received surprisingly patchy formal analysis to date. What we mean by this is that while there have been proofs that CTR-DRBG and HMAC-DRBG produce pseudo-random output, these all make substantial simplifications.
So, for example, they only model output generation, not initial state generation and refreshing, and they assume the generators are initialised with an ideal random state, whereas in practice, of course, we're going to have to construct the state from the entropy source. As far as we are aware, no prior work has analysed the full specification of these generators, including initial state generation and refreshing; moreover, the stronger security properties that are claimed in the standard are unproven. So, in this work, we set out to address some of these gaps in analysis, and we ultimately uncover a mixture of positive and less positive results. On the positive side, we prove the robustness of both HASH-DRBG and HMAC-DRBG, with a caveat in the latter case. On the less positive side, this caveat is that if an optional input is omitted from HMAC-DRBG, then the algorithm isn't actually forward secure, contradicting claims in the standard. Moreover, we take a close look at flexibilities in the standard, and we argue that the overly flexible standard allows the generators to be used in ways that may admit vulnerabilities. I'm going to try and give a flavour of all of these results in this talk today. So, first, we're going to take a closer look at forward security and prediction resistance, and we're going to begin with this somewhat surprising result about HMAC-DRBG. You'll recall that at the start of the talk I said that the standard allows optional strings of additional input to be included in output generation requests. Now, it turns out that if HMAC-DRBG is called without these optional inputs, then it's not actually forward secure, which directly contradicts claims in the standard. To see where the problem creeps in, we need to take a closer look at how output generation works in HMAC-DRBG. Essentially, the state of the generator consists of a key K and a counter V.
So, to satisfy an output generation request, the generator iteratively hashes this counter using HMAC, producing an output block in each iteration, until sufficiently many blocks have been produced to satisfy the request. Once this is done, both the key and the counter in the state are updated via a number of extra HMAC applications. Now, it turns out that if additional input is not included in the call, then fewer HMAC applications are performed in the state update. The effect of this is that, when these extra updates are omitted, the updated state is such that the counter component is computed as a deterministic function of the key component and the final output block produced in the call. And, of course, because we're considering forward security, which imagines a state being compromised after output production, this is all information which is available to the forward security attacker. So, all he needs to do is test whether this relation holds, and if it does, he knows he's receiving real output, and so he wins his challenge with overwhelming probability. Now, while this doesn't necessarily lead to a devastating real-world attack, I think it's quite interesting that this flaw exists at all. And I think it really underlines the importance of formally proving security claims even when they seem obvious, and also of taking care with these flexibilities, because here we've seen how a quite innocent-looking choice to omit an optional input actually changes the structure of the algorithm and admits a vulnerability. So, that's our kind of surprising negative result, and, fortunately for us, the rest of the results in this section are more positive. You'll recall that the standard targets these properties of forward security and prediction resistance, but our understanding of what a good generator should achieve has moved on a lot since then.
So these days, the property we really want from a good PRNG is the notion of robustness, first formalised by Dodis et al. In this work, we analyse the robustness of HASH-DRBG and HMAC-DRBG used with this additional input. The reason we focus on these two algorithms is that we have to do quite a lot of work to adapt the model to accommodate these generators, and since these two are hash-based, they both fit naturally into the same model. So, in a robustness game, we essentially set our generator running and then we give our attacker a number of oracles modelling ways in which they can interact with and compromise the generator. They can refresh the state with entropy drawn from the source. They can request a real-or-random output as a challenge. They can compromise the state of the generator, and they can even set the state to something of their choosing. And of course, all these oracles are defined so as to prevent trivial wins. So, the first thing we need to do is to adapt this model to accommodate the NIST DRBGs. We extend the interface by which the attacker can request real-or-random outputs, to allow both outputs of varying lengths to be requested and additional input to be included in calls, reflecting that the NIST generators support these capabilities. The second adaptation is that our analysis is going to be in the random oracle model, so we have to extend the model to accommodate this. Now, this is clearly a heuristic, but it turns out when analysing these generators that we run into a number of technical issues that make it seem that making some kind of strong idealised assumption about the underlying primitives is going to be inherent. And we took the view that we really wanted to analyse these generators as they're used, rather than trying to tweak the constructions to get the proof to go through under weaker assumptions. So we were willing to accept heuristics if it meant we could construct a proof.
So, we construct proofs of the robustness of both HASH-DRBG and HMAC-DRBG. The HASH-DRBG result is fully general; the HMAC-DRBG result is with respect to a more restricted class of entropy sources, which nonetheless includes all those permitted by the standard. So, I guess an obvious question here is: we're trying to prove something about pseudo-randomness in the random oracle model, how hard can this be? And it turns out that the answer is much harder than it needs to be. Firstly, and this is a general point, robustness is such a strong security property that even working in idealised models, it's non-trivial to show that PRNGs can withstand these strong forms of compromise. But what's interesting is that we find that a number of seemingly innocuous design features of the NIST generators turn out to significantly complicate the analysis. For example, we run into all sorts of awkward state distributions, and cases where states at different points depend on each other when we'd really like them to be independent. This, in turn, introduces multiple cases into proofs and requires us to restate and re-prove useful results that are standard tools for analysing PRNGs. There are also numerous examples of things like domains not being fully separated, so we have to be very careful to deal with accidental collisions, and, in short, there's just a lot to juggle. I highlight this because I think it really underscores how beneficial it is to design and prove the security of schemes simultaneously wherever possible. There are so many places here where a really small tweak that wouldn't have impacted efficiency would have made a huge difference to the analysis. OK. So, apart from this flaw we identified in HMAC-DRBG, we've seen that these generators hold up quite well when analysed in this robustness framework.
However, we've also seen how the NIST generators, with their optional inputs and parameters, don't really look like the PRNGs defined in the literature, which these models were originally designed to capture. So this raises the question: is robustness sufficient to capture all attacks against these generators that we should be worried about? To answer this question, we take a closer look at flexibilities in the standard, and we identify two cases in which the overly flexible standard allows the generators to be used in ways which may admit vulnerabilities. In this talk, I'm going to focus on the flexibility to request large and variable-length outputs in next calls. To see how this can be problematic, we need to take a closer look at how output generation works in the NIST DRBGs. It turns out that under the hood of the next algorithm of each of the generators, there's an internal function that acts as an internal PRG, in the sense that, on each invocation, it returns a fixed-length pseudo-random output. So, to generate these variable-length outputs, each of the NIST generators essentially proceeds as follows: it iteratively applies its internal PRG multiple times until sufficiently many blocks have been produced to satisfy the request, and then it performs its proper state update, and it's this state update that's designed to give forward security. For example, for CTR-DRBG, this iterative process corresponds to running a block cipher in counter mode. So, given that up to 2^19 bits of output can be requested in each call, this corresponds to up to 2^12 AES computations with a fixed key in each next call. So, we can see that there's potentially a lot of cryptographic computation going on under the hood of this next algorithm.
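The iterate-then-update structure, and the block-count arithmetic, can be sketched as follows. This is a heavily simplified CTR-DRBG-style Generate: a hash stands in for AES (structure only, not invertible or secure), the state update is condensed, and reseed counters and additional input are omitted.

```python
import hashlib

BLOCK = 16  # AES block size in bytes

def E(K: bytes, X: bytes) -> bytes:
    # stand-in for AES encryption: illustrates the structure only
    return hashlib.sha256(K + X).digest()[:BLOCK]

def incr(V: bytes) -> bytes:
    return ((int.from_bytes(V, "big") + 1) % 2**128).to_bytes(BLOCK, "big")

def ctr_drbg_generate(K: bytes, V: bytes, nbytes: int):
    """Simplified CTR-DRBG-style Generate: counter mode, then state update."""
    temp = b""
    while len(temp) < nbytes:
        V = incr(V)
        temp += E(K, V)          # one block-cipher call per 16-byte block
    # the "proper" state update runs only once the whole request is served
    new_K = E(K, incr(V))
    new_V = E(K, incr(incr(V)))
    return temp[:nbytes], (new_K, new_V)

# a maximal request of 2**19 bits costs 2**12 block-cipher calls on a fixed key
assert (2**19 // 8) // BLOCK == 2**12
out, state = ctr_drbg_generate(b"k" * 16, b"\x00" * 16, 2**19 // 8)
assert len(out) == 2**16
```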
And this raises the question of what would happen if an attacker were able to compromise part of the state of the generator during this iterative output generation process, via a side channel? And, of course, robustness tells us nothing about this, because robustness only considers the effect of state compromise after the state is updated at the conclusion of a call. Now, there's also an efficiency consideration at play, in that these proper state updates slow things down. So, what emerges as an appealing choice in terms of efficiency is to generate all the output required for your application up front in a single call to next, and then buffer this to be used for different purposes in the application. The effect of this is that we'll have some portions of the output being used for secret values, such as keys, while other portions of the same output are going to be used for public values, such as nonces. So, we find ourselves questioning how secure this kind of batching up of output is if we're in a situation where partial state compromise during output generation via a side channel is a realistic concern. To address this, we propose a new and somewhat more informal security model in which we imagine our generators being used to generate multiple blocks of output in a single request. We suppose our attacker is able to learn partial state information at some point during the output generation process, in addition to a single output block which may be made public, for example as a nonce. And we're going to challenge our attacker to compute unseen output values. So, we assess the ability to compute output both before and after the compromised block within the call, and we also assess the ability to move past the call and compute all future output. We analyse each of the NIST generators in our framework, and we find that each admits vulnerabilities, with CTR-DRBG faring especially badly.
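For CTR-DRBG, this kind of partial compromise can be sketched concretely. Below, a toy two-round Feistel permutation stands in for AES: it is trivially weak, but like a real block cipher it is invertible, which is the property that matters here. An attacker who learns the key mid-call and sees one public output block (the nonce) can invert the cipher on that block to recover the counter, and from it recompute every other block produced in the same call, including those used as keys. All names here are illustrative.

```python
import hashlib

def F(K: bytes, half: bytes) -> bytes:
    return hashlib.sha256(K + half).digest()[:8]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def E(K: bytes, X: bytes) -> bytes:
    # toy two-round Feistel "cipher": invertible like AES, but NOT secure
    L, R = X[:8], X[8:]
    L, R = R, xor(L, F(K, R))
    L, R = R, xor(L, F(K, R))
    return L + R

def D(K: bytes, Y: bytes) -> bytes:
    # decryption: undo the two Feistel rounds in reverse order
    L, R = Y[:8], Y[8:]
    R, L = L, xor(R, F(K, L))
    R, L = L, xor(R, F(K, L))
    return L + R

def incr(V: bytes) -> bytes:
    return ((int.from_bytes(V, "big") + 1) % 2**128).to_bytes(16, "big")

# one next() call produces four blocks in counter mode: the first is kept
# secret as key material, the last is published as a nonce
K, V = b"secret key bytes", b"\x00" * 16
blocks = []
for _ in range(4):
    V = incr(V)
    blocks.append(E(K, V))
key_material, public_nonce = blocks[0], blocks[3]

# attacker: learns K mid-call via a side channel and sees the public nonce.
# inverting the cipher on the nonce recovers its counter value...
V3 = D(K, public_nonce)
# ...and counting back three positions recovers the secret first block
V0 = ((int.from_bytes(V3, "big") - 3) % 2**128).to_bytes(16, "big")
assert E(K, V0) == key_material
```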
So, the high-level intuition for this is that, as you'll recall, these variable-length outputs are generated by effectively iterating an underlying PRG multiple times. And it turns out that these underlying PRGs aren't forward secure, and in fact security breaks down quite badly if part of the state leaks, in a way which allows recovery of unseen output. And because CTR-DRBG is based on a block cipher, which is invertible, it fares especially badly. So, these security concerns materialise when the generators are used to produce large outputs in a single call, which raises the question: are the generators used like this in the real world? To try and get some insight into this, we took a look at the OpenSSL implementation of CTR-DRBG to see if it allowed large outputs to be requested. And it turns out that not only did it allow large outputs to be requested, it actually didn't set any upper limit on how much output could be requested at all. So this implementation isn't actually complying with the standard, in which this upper limit is required. And moreover, it suggests that real-world implementations of these generators allow use in the ways that we highlight as potentially problematic. OK, so, just to sum up: we took an in-depth look at the standardised pseudo-random number generators in NIST SP 800-90A and uncovered a mix of positive and less positive results. While we formally prove a number of security properties claimed in the standard, we also uncover a number of security concerns. Now, we're certainly not trying to say that these generators are totally broken; a lot of our attacks are quite theoretical. But we think these are good to point out, and we hope that our results might be useful to help inform implementation choices. So, I'll conclude with some recommendations. On the practical side, standards should not be overly flexible, and all security claims in them should be proven.
And moreover, wherever possible, it's so beneficial to design and prove the security of constructions simultaneously, rather than trying to come up with the proof after the fact when all the details are fixed. On the theory side, we repeatedly found examples of where there's something of a gap between how PRNGs look in theory and how PRNGs look in the real world. So, it's really important to always be looking to develop new models to better capture real-world PRNGs. So, that's all from me today. Thank you for listening. Any questions for Joanne? Greg? Just more a statement than a question: this is incredibly valuable work. Thank you very much. Aw, thank you so much. But I think we should also ask ourselves whether standards should actually aim for a security property as strong as the one you'd have them achieve, because I think flexibility is good to have in standards. I mean, they could aim for a lower security guarantee, if we are aware of that and we are aware of the type of attacks that we can get. Because then, of course, you have to push the decision to the developers to decide, for my application, what's the right solution. So, I don't know, is there a strong reason why we should aim for this strong security property of robustness? Interesting question. I think robustness really captures what we want from pseudo-random number generators in general, in most applications, because often we really want to know that our generator is able to recover from state compromise if necessary. Perhaps, I guess, for different applications you are going to require different levels of security, so perhaps there is room in the standard to give different versions of the generators, as you say, with some achieving fewer security properties, but really clearly stating what's achieved by each one.
I think the problem with this standard is that, and to be fair to the standard, it was published long before these notions of robustness were introduced, so they couldn't have known about them, but the security properties are very vaguely defined and, of course, are unproven, so it's kind of unclear what guarantees you're getting. So I think it's just really important that standards are very explicit about what you're getting from each generator. But you had technical issues in the proof that required you to make idealised assumptions. Is that assumption just the random oracle model, or are there other kinds of assumptions in the proof? And what kind of technical issues? Is this solvable by other means than the idealised assumptions? OK, so our analysis is in the random oracle model, and because these generators are seedless, we make a stronger assumption on the entropy source, essentially assuming that it is not allowed access to the random oracle. This isn't ideal, but, at the time of doing this work, this was the only way we could see to get around not having a seed. Interestingly, some very recent work has just come up with a new model that addresses seedless PRNGs, but that... So does that mean something in practice should change? I mean, to make it more secure, or is it just for the proof? Does this mean that the entropy source is the problem? Yeah, I think, certainly for HASH-DRBG, even if you're just proving the output is pseudo-random, you're probably going to need a random oracle, because you're producing output by basically hashing in counter mode, so there's no secret key in use, and you're going to have to make a strong assumption. I feel like for real, practical constructions of pseudo-random number generators, you're very likely often going to have to make a strong assumption, because you want to be using very fast symmetric primitives, which you often have to assume a bit more of.
I don't think it's a terrible thing at all to work in a random oracle model or something if it means that you get a practical construction. Okay, thank you very much. Next I'll thank all the speakers of this session, and we have now a coffee break.