Hi, so I'm going to be talking today about some recent work with Jean Paul Degabriele, Kenny Paterson and Jacob Schuldt on backdoors in pseudorandom number generators. So the story really begins here, with the Snowden leaks in 2013. Overnight the threat model changed, and we needed to start considering governments as potential adversaries. Now among the Snowden revelations was evidence that the NSA had colluded with software providers to insert backdoors into their cryptographic software. So when we talk about a backdoor, we mean a deliberate vulnerability in a scheme coupled with some secret backdoor information. And it's designed in such a way that if you don't know the backdoor, the scheme appears to be secure and doing its job, but if you do know the backdoor, then you get some advantage in subverting the scheme. Now a good source of randomness is essential for many cryptographic applications. And correspondingly, a pseudorandom generator (PRG) or a pseudorandom number generator with input (PRNG) is pretty ubiquitous in cryptographic implementations. We'll define both more formally as we go through, but the key difference is that a PRG is deterministic, whereas a PRNG is able to gather entropy from its operating environment. Now it's the ubiquity of these primitives in implementations, coupled with the fact that often, if the source of randomness fails, the security of the scheme relying on it crumbles as well, which makes these very attractive targets for an attacker who might wish to optimize the impact and the spread of a backdoor. And this isn't just conjecture: we know this has really happened with the infamous Dual EC, which achieved widespread deployment and has been shown to be exploitable in practice, and which more recently reared its ugly head again in the Juniper firewalls.
Okay, so in 2014, Bellare, Paterson and Rogaway revamped the field of kleptography for the post-Snowden world, and in the process kickstarted a whole new line of theoretical research into this area. And it's very much into this strand of work that ours fits. Now, in particular, the first post-Snowden treatment of backdoored PRGs was by Dodis et al. in 2015, and we take this work as our jumping-off point here. So the question we seek to explore is: to what extent can a pseudorandom number generator be backdoored and simultaneously provably secure? To this end, we strengthen existing results on backdoored PRGs and we initiate the first study of backdoored PRNGs with input. So we come up with definitions and models, we present a construction of a robust backdoored PRNG, and we have an impossibility result, a theoretical lower bound which links the so-called backdoorability of a robust PRNG to its state size. So I'm going to try and touch on all of these today, because whilst the work on PRNGs is probably our main contribution, I think the work on PRGs is quite a nice way of introducing some of the ideas that we're ultimately going to use in that later work. So what is a PRG? A PRG takes a short, truly random string as input and outputs pseudorandom bit strings of arbitrary polynomial length. And we modify the syntax slightly here to facilitate our backdooring definitions. So formally, a PRG is a tuple of algorithms (setup, init, next). Setup outputs a pair of parameters for the generator: PP denotes the public parameter, which is an input to all other algorithms, and we're going to use BK to represent the secret backdoor key. Init outputs an initial state S0 for the generator, and next takes as input the current state of the generator and returns a public output R and an updated state S'. And you can see it's essential that the state is kept secret, because any attacker that knows a state has everything they need to be able to compute all future output.
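To make the syntax just described concrete, here is a minimal sketch of the (setup, init, next) interface in Python. It is only an illustration: the hash-based generator, the label strings and the name `next_` (chosen to avoid shadowing Python's built-in `next`) are assumptions made for this example, not the construction from the talk, and this honest generator has no backdoor, so BK is simply `None`.

```python
import hashlib
import os

def setup():
    """Return (pp, bk): a public parameter and a backdoor key.
    Assumption: this toy generator is honest, so there is no backdoor key."""
    pp = b"toy-prg-v1"
    bk = None
    return pp, bk

def init(pp):
    """Return a fresh, truly random initial state S0."""
    return os.urandom(32)

def next_(pp, state):
    """Return (R, S'): a public output and the updated state.
    Output and update use different labels, so R does not reveal S'.
    Hashing the old state away is what aims at forward security here."""
    r = hashlib.sha256(pp + b"|out|" + state).digest()
    new_state = hashlib.sha256(pp + b"|upd|" + state).digest()
    return r, new_state

if __name__ == "__main__":
    pp, bk = setup()
    s = init(pp)
    for _ in range(3):
        r, s = next_(pp, s)
        print(r.hex())
```

Note that `next_` is deterministic: anyone who learns a state can replay every later output, which is why the state has to stay secret.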
Now when we talk about PRG security, well, at the very least you want these outputs to be indistinguishable from random bit strings. But we're actually going to be interested in PRGs with a stronger property, which is that of forward security. So forward security says: suppose at some point in time the state of your generator gets compromised. Now clearly all bets are off about future output. What forward security asks is, can previous outputs remain secure and pseudorandom even conditioned on knowledge of the state? So now to backdoored PRGs, which we define in line with the definitions given by Dodis et al. in their paper. So our backdoored PRGs are a tuple of algorithms. We have setup, init and next just as before, but now we add this new algorithm B, which is highlighted in red, where B stands for Big Brother, and this models our backdoor adversary. Now Big Brother is internal in the sense that he's built into the specification of the PRG itself, but he's also kind of external in that, other than getting the backdoor key, he can only observe public outputs and parameters. Now there are various different ways that Big Brother might seek to benefit from a backdoor. So for each of these different backdooring goals, we write a game which captures that goal and an advantage term to measure how well Big Brother does. And we'll see an example of this in a couple of slides' time. So putting it all together, we say our tuple of algorithms is a backdoored forward-secure PRG of a given type if, without knowledge of the backdoor, the algorithms form a forward-secure PRG, but Big Brother with his backdoor key gets some advantage in subverting the scheme. So the authors of the paper from which we take our definitions present a number of different constructions of backdoored PRGs. However, none of these allow Big Brother to recover past output values while simultaneously being forward secure. And when you think about it, the two seem kind of at odds, right?
Because forward security is all about protecting past outputs, and we want to subvert them. So this is an open problem: can these two properties coexist? And it turns out the answer is yes, they can, and it's even worse than that. So we come up with a new backdooring model in which we initialize the generator with some state S0. Then we run the generator forward to produce Q outputs. Then Big Brother is given one of those outputs, just one, and the secret backdoor key, and we challenge him to recover the very first state of the generator. And you can see this is a very strong form of compromise, because from the first state he can compute everything that happens subsequently, and you can actually show that this is strictly stronger than any of the models given in the paper by Dodis et al. Now we come up with a couple of constructions of PRGs which achieve this, but I'm going to talk about one in particular today, because we're ultimately going to use a similar trick when we backdoor a PRNG. So the key components are: we take a forward-secure but non-backdoored PRG, and we're going to augment it using a special encryption scheme to give it a backdoor. So this encryption scheme has ciphertexts that look like random bit strings, and it's statistically re-randomizable, which means we have an efficient algorithm that takes as input a ciphertext and some random coins and returns a new ciphertext encrypting the same underlying message. And to be statistically re-randomizable means that an honestly generated ciphertext and the re-randomization of a ciphertext generated with arbitrary randomness are statistically close. Now in the full construction we need the scheme to have a few extra properties that I'm not really going to touch on here, but it can easily be seen to be achieved by ElGamal with carefully chosen encoding schemes.
So for the construction, we begin by simply generating the parameters for the underlying PRG and the encryption scheme, and we make everything public except for the secret decryption key, which, perhaps unsurprisingly, we're going to keep back and give to Big Brother. So to construct the initial state of our backdoored PRG, we begin by generating a state S0 for the underlying PRG, which we then encrypt, and it's these two components together which form the state of the backdoored PRG. So when next is called we need to do two things: we need to produce output and we need to update the state. To produce output, we simply output the ciphertext in the state as it is. Then to update the state, we call the underlying PRG on S, update the state accordingly, and use the output produced to re-randomize that ciphertext in the state. So to see how this helps Big Brother: by construction, whichever output he's given is a ciphertext encrypting S0, the first state of the underlying PRG. I'm glossing over details here, but essentially this gives him everything he needs to be able to reconstruct the first state of the backdoored PRG and, in the process, all other states and outputs. Now the correctness of the encryption scheme ensures that Big Brother succeeds with probability one, so it's a very effective backdoor, and due to the properties of the encryption scheme all of these ciphertexts appear pseudorandom and independent of each other. So the forward security follows from this and the fact that the underlying PRG itself is forward secure. So at this point we're kind of forced to conclude that, unfortunately, deterministic PRGs are totally susceptible to backdooring. It's hard to think of a much more devastating form of compromise than the one we've just seen.
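The construction just described can be sketched in a few lines of Python. This is a toy under loudly stated assumptions, not the scheme from the paper: the group parameters `P`, `Q`, `G` are tiny and insecure, the underlying generator `g_next` is a hypothetical hash-based stand-in, plain ElGamal ciphertexts here do not look like random bit strings (the talk notes that special encodings are needed for that), and all function names are invented for the illustration.

```python
import hashlib
import secrets

# Toy parameters (assumption, far too small for real use): safe prime P = 2Q + 1.
P, Q, G = 467, 233, 4  # G = 2^2 generates the order-Q subgroup of squares mod P

def encode(m):
    """Map m in 1..Q to a square mod P (-1 is a non-square because P % 4 == 3)."""
    return m if pow(m, Q, P) == 1 else P - m

def decode(x):
    return x if x <= Q else P - x

def encrypt(pk, m, t):
    """ElGamal encryption of m with coins t."""
    return (pow(G, t, P), encode(m) * pow(pk, t, P) % P)

def rerandomize(pk, c, t):
    """New ciphertext for the same message: multiply in an encryption of 1."""
    return (c[0] * pow(G, t, P) % P, c[1] * pow(pk, t, P) % P)

def decrypt(sk, c):
    return decode(c[1] * pow(c[0], Q - sk, P) % P)

def g_next(s):
    """Hypothetical underlying PRG on states in 1..Q: returns (output, new state)."""
    r = int.from_bytes(hashlib.sha256(b"out%d" % s).digest(), "big") % Q
    s_new = int.from_bytes(hashlib.sha256(b"upd%d" % s).digest(), "big") % Q + 1
    return r, s_new

def setup():
    """Return (pp, bk): the public key is public, the decryption key is the backdoor."""
    sk = secrets.randbelow(Q - 1) + 1
    return pow(G, sk, P), sk

def init(pp):
    """State = (underlying state S0, ciphertext encrypting S0)."""
    s0 = secrets.randbelow(Q) + 1
    return (s0, encrypt(pp, s0, secrets.randbelow(Q)))

def next_(pp, state):
    """Output the stored ciphertext, then advance S and re-randomize the ciphertext."""
    s, c = state
    r, s_new = g_next(s)
    return c, (s_new, rerandomize(pp, c, r))

def big_brother(bk, output):
    """Any single output decrypts to S0, the first underlying state."""
    return decrypt(bk, output)

if __name__ == "__main__":
    pp, bk = setup()
    state = init(pp)
    s0 = state[0]
    for _ in range(5):
        out, state = next_(pp, state)
        assert big_brother(bk, out) == s0
```

In this sketch Big Brother only recovers S0; going from there to every state and output of the backdoored generator is the part the talk glosses over, and it is not modelled here.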
So it makes sense at this point to turn our attention to PRNGs with input, both because they're widely deployed in practice but also because intuitively you feel that this flow of entropy into the system is going to make Big Brother's job harder, and it turns out it does, to an extent. So what's a PRNG? We define PRNGs in line with the model given by Dodis et al. in their 2013 paper. So essentially a PRNG is a deterministic PRG with a refresh procedure added. Refresh is an algorithm that takes as input the generator's state S and some entropy input I, and it combines these to produce an updated state S'. Now these entropy inputs are gathered from a source of randomness which may be imperfect. In the real world this will be drawn from things like disk timings and keystrokes. So to model this process we use an algorithm D, which we call the distribution sampler. And in particular, the addition of this refresh algorithm means that we expect a good PRNG to be able to recover from state compromise, provided that sufficient entropy enters the system. So to capture this, the strongest security notion for PRNGs is that of robustness. Here we imagine a scenario where we set the generator running and we flip a bit B which determines whether the adversary will see real or random outputs. Now in contrast to the PRG setting, where the adversary is passive and can just observe outputs, here we give him a number of oracles which reflect different ways in which he might compromise the state of the generator or influence its entropy source. And what robustness says is that even in the face of all this compromise and all this interference, no bounded adversary can work out what that challenge bit is much better than guessing. So this is a very strong security property. Now the authors of the paper from which we take our definitions present a construction of a provably robust PRNG with input, and conveniently for us it has output which is produced by a deterministic forward-secure PRG.
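As a rough illustration of what the refresh procedure adds, here is a minimal Python sketch of a PRNG with input. The hash-based `refresh` and `next_` and the label strings are assumptions made for this example; this is not the provably robust construction from the 2013 paper, and nothing here is claimed to be robust.

```python
import hashlib

def refresh(pp, state, entropy_input):
    """Mix an entropy input I into the state S and return the updated state S'."""
    return hashlib.sha256(pp + b"|refresh|" + state + entropy_input).digest()

def next_(pp, state):
    """Deterministic output step, as for a plain PRG: returns (R, S')."""
    r = hashlib.sha256(pp + b"|out|" + state).digest()
    new_state = hashlib.sha256(pp + b"|upd|" + state).digest()
    return r, new_state

if __name__ == "__main__":
    pp = b"toy-prng-v1"
    compromised = bytes(32)             # a state the attacker has learned
    predicted, _ = next_(pp, compromised)

    # An input the attacker cannot guess moves the state somewhere new,
    # so the attacker's prediction for the next output no longer holds.
    fresh = refresh(pp, compromised, b"disk timings, keystrokes, ...")
    actual, _ = next_(pp, fresh)
    print(predicted != actual)
```

The point of the example is only the shape of the interface: refresh is the single place where outside entropy enters, which is why recovery from compromise depends on how much entropy those inputs carry.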
So all we need to do is swap this with the backdoored one that we've just spoken about, tweak the entropy accumulation a bit, and we immediately obtain a PRNG with input which is backdoored in the sense that now Big Brother can take one output and compute everything going back to the previous refresh call and everything up to the next refresh call. But there's a snag here, right? Because as soon as the generator is refreshed with sufficient entropy, the state is going to evolve to something which is statistically unpredictable, and Big Brother is going to need to capture another output in order to regain his backdoor advantage. So it becomes clear that if we're going to backdoor a PRNG in any meaningful kind of way, we need to be able to move past these refresh calls. This motivates our backdooring models. So the scenario we imagine is that a PRNG is initialized with some state S0. Now S0 is then evolved by a sequence of refresh and next calls, which we record in a refresh pattern, and ultimately we're going to use this pattern as a parameter of the experiment. Now at some point we stop and we give Big Brother some arbitrary output R_i along with the secret backdoor key, and we see what he can do with this. Can he produce some arbitrary output R_j, or some arbitrary state, or can he do even better and go all the way back to the beginning, computing everything, as we were able to in the setting of deterministic PRGs? So we present a construction of a robust PRNG with input which allows Big Brother to recover arbitrary past outputs even when they are separated by these refresh calls. So the key components are: we take a robust but non-backdoored PRNG, and again we're going to augment it using our special encryption scheme.
So to construct the initial state, we generate a state S for the underlying PRNG, which is shown in yellow here, and you can think of this as the active part of the state of the backdoored PRNG, because we're then going to add to it a whole lot of redundant space, which is shown here in blue, and we're going to use this space to store encrypted snapshots of the active part at crucial points during its evolution, namely after refresh calls. So at a given point in time the state of our backdoored PRNG is going to look like this, with a set of encrypted snapshots stored in the state. So when refresh is called, we simply apply the refresh algorithm of the underlying PRNG to the active part and update it accordingly. But now, at the conclusion of a period of refreshing, we take an encrypted snapshot of the active part of the state and we store it in the state of the backdoored PRNG by shifting everything else down to make room for it. So you can see that as much as we have taken on new information here, we've also lost something. The last ciphertext has been pushed out of the state and we're not going to be able to get it back again. And you can also see that the number of encrypted snapshots we can store is limited by the size of the state. So now we have a state that has a lot of useful information encoded in it, and we need to get this out to Big Brother somehow. To do this we produce output in two distinct ways, and we alternate between them in a way that appears pseudorandom. The first way is we simply leak encrypted snapshots in the form of output, whereas in the second way we compute output by applying the underlying PRNG to the active part of the state, and crucially, output produced in the second way is reproducible if you have the right encrypted snapshot. So for Big Brother to succeed here, a couple of things need to happen. Firstly, it needs to be that the output he's given consists of the encrypted snapshots and that the output he's targeting is one that was produced in this latter way.
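The state layout described above (an active part plus a fixed number of snapshot slots, with the oldest snapshot pushed out when a new one arrives) can be sketched as a bounded buffer. This is only a model of the bookkeeping: `SnapshotState`, `end_refresh_period` and `placeholder_seal` are hypothetical names, the refresh step is a hash stand-in, and `placeholder_seal` is explicitly not encryption; in the construction from the talk it would be the re-randomizable encryption scheme.

```python
from collections import deque
import hashlib

def placeholder_seal(active):
    """Stand-in for 'encrypt a snapshot'. Assumption: NOT real encryption."""
    return b"snapshot:" + active

class SnapshotState:
    def __init__(self, active, capacity, seal):
        self.active = active                      # the part that generates output
        self.snapshots = deque(maxlen=capacity)   # newest first, fixed number of slots
        self.seal = seal

    def refresh(self, entropy_input):
        """Only the active part is refreshed (hash-based stand-in)."""
        self.active = hashlib.sha256(
            b"|refresh|" + self.active + entropy_input
        ).digest()

    def end_refresh_period(self):
        """Store a sealed snapshot of the active part; with a full buffer,
        appendleft silently drops the oldest snapshot from the other end."""
        self.snapshots.appendleft(self.seal(self.active))

if __name__ == "__main__":
    st = SnapshotState(bytes(32), capacity=2, seal=placeholder_seal)
    for i in range(3):
        st.refresh(bytes([i]))
        st.end_refresh_period()
    print(len(st.snapshots))  # capacity bounds how far back the state remembers
```

The `capacity` parameter is the knob that ties how many refresh periods can be bridged to how large the state is.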
And also it needs to be that the output he's targeting is in range, in the sense that there haven't been so many refresh calls between the two outputs that the snapshot he needs has been pushed out of the state and lost. So correspondingly, Big Brother succeeds with probability approximately a quarter for target values in range, and zero otherwise. And throughout, careful re-randomization of all these ciphertexts ensures that they appear pseudorandom and independent of each other. And security follows from this and the robustness of the underlying PRNG. And there are many optimizations and variations possible on this basic scheme. So suppose, for example, you were targeting a specific output value that was going to be exposed as a nonce in a protocol. Then you could make sure that the encrypted snapshot you need is always stored in the state and achieve a much better success probability. Now in reality, as you can see, the situation is not quite so simple. And that's because robustness is such a strong security guarantee that to be able to give the state all this extra structure and still prove the generator robust takes some quite delicate work. But it's possible, and so we're forced to conclude that, unfortunately, PRNGs are also susceptible to backdooring, and even the really strong property of robustness is no guarantee against this. But there is some glimmer of hope here, because you'll notice that our backdoored PRNG has a very large state and that Big Brother's ability to go back is fundamentally limited by the size of the state. So this raises the question: is this somehow inherent? And it turns out that the answer is yes. So in our impossibility result we prove that, for a restricted but still important class of distribution samplers, there's a limit, dependent on the state size, to how much information about previous states even an unbounded adversary can recover.
So to give this equation some context, we're imagining a scenario where we run the generator, Big Brother is given some output R, and we challenge him to recover not just one previous state this time but a vector of J states, each of which has been separated by a high-entropy refresh. So that's the left-hand side of this equation, and because min-entropy measures how difficult it is for an unbounded attacker to predict something, if Big Brother, who is bounded, is going to stand a chance, the left-hand side is going to need to be very small. Now, making Big Brother's life more difficult, on the right-hand side the epsilon term corresponds to the level of robustness security of the generator. If the generator is robust, epsilon is going to be very small and the log of its reciprocal is going to be very large, and even worse, this large term scales linearly with J, so the more refreshes we ask Big Brother to try and bypass, the larger and larger this term gets. Now the only thing making the right-hand side smaller is this minus N term, where N is the state size of the generator, but of course this is fixed at the beginning. So as we ask Big Brother to go back further and further and J continues to grow, at some point he's going to hit a wall, where the right-hand side is simply too large and Big Brother hasn't got a chance of being able to recover the information that he wants. So in conclusion, the bad news is that, despite their strong security properties, forward-secure PRGs and robust PRNGs with input are both susceptible to backdooring, but the better news is that robust PRNGs do offer some inherent resistance.
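Since the equation itself is on a slide and not in the spoken text, the following is only a schematic of the shape described here, not the theorem from the paper. The constant c, the notation for the state vector, and the omission of any lower-order terms are all assumptions made for this sketch.

```latex
% Schematic only: c > 0 is an unspecified constant, lower-order terms are omitted.
% Left: min-entropy of j past states (each separated by a high-entropy refresh)
%       given one output R.
% Right: grows linearly in j via log(1/epsilon), reduced only by the state size n.
\[
  \tilde{H}_\infty\bigl(S^{(1)},\dots,S^{(j)} \,\big|\, R\bigr)
  \;\gtrsim\;
  c \cdot j \cdot \log\frac{1}{\varepsilon} \;-\; n
\]
```

Read this way, for fixed n and small epsilon the right-hand side becomes positive and keeps growing once j is large enough, which is the wall Big Brother hits.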
So in the full version of the paper we have strengthened our impossibility results to get bounds on recovery of past output values and prediction of future output values. Avenues for further work would be to consider immunizers, so ways in which we can post-process the output of the PRNG to try and diminish Big Brother's advantage, and it would be good to have tight bounds for our impossibility results. And perhaps most importantly of all: we know robustness isn't enough, but can we find another property of PRNGs which excludes the presence of a backdoor? So that's the conclusion of my talk. Thank you for listening.