Our speakers are Felix Dörre, a student at the Karlsruhe Institute of Technology. The basis for this talk was his bachelor's thesis, which was awarded the International Student Award for Undergraduate Research at the University of Vienna. With him on stage is Vladimir Klebanov, who supervised the thesis and provided feedback and guidance; together they also developed an automated tool using that framework. So please welcome them with a warm round of applause.

Thank you very much. This work was supported by the Karlsruhe Institute of Technology and also by the German Research Foundation under the priority programme RS3, Reliably Secure Software Systems. Some time ago we started to ask ourselves the question: how do we know that our PRNGs are working properly? And not long after we started asking the question, we realized that the answer is, spoiler alert: we hope they do. And this is an important question. Does anyone have any idea what this list represents? Well, the younger generation probably does not remember this event, but the older generation will know that this is the list of services that were affected when Debian mis-patched the OpenSSL PRNG, and as you can see, this is almost everything. A wonderful example of a single point of failure. This was a really defining moment in the history of pseudo-random number generation. It is a great example of a bug and a great story. We will go into a little technical detail later, but I would encourage you to Google it and read the full account of what happened. It is a thrilling story, a fine Greek tragedy where technical, human and organizational factors aligned to create the perfect storm that, for two years, put a backdoor into all of these things.
At some point it was discovered — there was a talk at the 25th Congress about that — and then one could ask: what actually happened afterwards? What were the technical measures taken to prevent a recurrence of such incidents in the future? Well, the technical consequence of the disaster was to add the two comments that you see here. What happened, how the disaster came to be, is that the Debian maintainer, following some quite reasonable logic, removed this particular line from the code. Essentially, the line was reinstated and a comment was added: do not remove this line. That is a very specific remediation, and that was it. In the end we will see that one can actually do better things to make sure that PRNGs are working correctly, but also that there are good technical reasons why quality assurance here is difficult. And while we are talking about comments, I would like to show you some more comments from implementations of other PRNGs, just to give you a feeling for the state of implementations in the wild. These are not obscure implementations; these are things that you have on your devices, that we use every day. You might say: you are not being fair, you are taking things out of context. That is in some sense right, but I can tell you it is only mildly unfair — there is not a lot of context accompanying these comments, and they really do convey the feeling that at least I get when I look at that code. For example, in one implementation we read: "HASHBYTES_TO_USE defines the number of bytes returned by computeHash that are used to form the byte array returned by the nextBytes method. Note that this implementation uses more bytes than is defined in the above specification." What? It is really difficult to understand what is going on here, and to this day I am not quite sure.
I think the author just wanted to say: we know that the SHA-1 hash returns a digest of 20 bytes. But in the end the intent of this message is completely unclear. A few more. Here is another one: "Put the data into the entropy pool; add some data from the unknown state; reseed." We see that terminology is really difficult — it is, but you have to make an effort. "Take some random data and make more random-looking data from it." I don't know what that means, and it is the only comment at this particular place in the code — code that has the potential to put a backdoor into all your applications. Here is another nice statement. This is from the end-user documentation of a PRNG, and here one really has to say: hey, you have end-user documentation — great, really fantastic, do this. But of course nobody is perfect. For one particular function, it just says: be very careful, otherwise bad things happen. And you have no idea why you are supposed to be careful, how you are supposed to be careful, or what is going on. To make the point a little more precise: I am showing this, of course, to entertain you, but the idea is not to blame the developers — not to say this was a bad developer, with a good one this wouldn't be there, and of course I would never make these mistakes. No — if you think so, you have it coming. The point is that there is a process, there are natural forces, that make things the way they are today. Typically things start as some kind of personal project or research project, or sometimes a clean-room implementation. Slowly you put your code out there, people notice, and maybe in the meantime it ends up in a bigger project under the auspices of some respected entity like the Apache Foundation or Mozilla or something else.
And at the end of the supply chain there is some kind of dealer — I mean, vendor — who has a need for a PRNG, and they will go and say: oh, there is this code and it looks really great, we will just incorporate it into our product, we will not do any further due diligence. Problem solved, ship it. I think these players are actually responsible for most of the problems that we have: the vendors who take the open-source code, do not put in the effort, and just include it in the product and ship it. So we really need to take care of the process that is putting software onto our devices. I don't have a better recommendation here. But let me show you one more example. This is more or less the only comment attached to a critical function of yet another PRNG, and I can assure you: if you know what is going on, it makes total sense. But if you do not know what is going on, you have no chance of understanding it. So we need documentation, but we need to write documentation for people who do not yet understand what is going on — otherwise the documentation is useless. What we find works best is something like a one-page design document where you explain, in abstract, mathematical terms, the design of your PRNG, which you can then refine into the implementation. It is not sufficient to just add comments to the functions in the code. Well, I should say it is really good to have this kind of comment there — it abstracts in a way from the code — but you have to phrase it differently, you have to target a different audience. In this PRNG, at least, there was a link to a scientific paper that described the inspiration for the scheme, and that part was really helpful. Anyway, you hopefully get the idea that there is a need to do something, a need to improve the state of things, a need for quality assurance.
What are the options for quality assurance that we have today? Here is the top five of things you may or may not be doing. The first one is what one does with any other software: you typically have some kind of system test, or maybe you even apply formal methods to verify that your software does what you think it should be doing. But in this particular case that is very difficult — I would say impossible. The specification of a PRNG is this: it is a piece of software that takes a little piece of random data from somewhere outside the system and stretches it into a stream of data that is indistinguishable from random to a computationally bounded observer. That is not a property of a single output stream. You cannot check it by looking at one output stream; you cannot write a test for a particular stream that tells you whether it is true or not. It is a property of all the output streams that a PRNG can produce. So this is really difficult. What one could be doing is to have unit tests for individual units that have well-defined functional behavior. For example, we noticed from implementations in the wild that it is difficult to implement a circular buffer — it is indeed surprisingly difficult to do correctly. But of course, no self-respecting C programmer goes and says: I will implement the data structure, I will write tests for this data structure, and then I will use it in my other code. No, everything is baked into one monolithic thing, because that is how one does it, or because of performance, I don't know. It doesn't really make sense. So the recommendation here is clear: try to make the code modular. Let the compiler take care of it — the compiler will inline things, don't worry about that; there will be, if any, a very small performance overhead.
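To make the circular-buffer point concrete, here is a minimal sketch (in Python rather than C, purely for illustration; the class name and interface are invented) of a ring buffer kept as its own small unit, so the wrap-around index arithmetic can be unit-tested in isolation instead of being baked into the PRNG core:

```python
class RingBuffer:
    """Fixed-size circular byte buffer as its own testable unit."""

    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.pos = 0  # next write position

    def write(self, data: bytes) -> None:
        # Explicit wrap-around: exactly the index arithmetic that is
        # easy to get wrong when inlined into a monolithic PRNG.
        for b in data:
            self.buf[self.pos] = b
            self.pos = (self.pos + 1) % len(self.buf)

    def snapshot(self) -> bytes:
        return bytes(self.buf)
```

With the unit isolated like this, a handful of assertions pin down the wrap-around behavior once and for all, and the compiler (or interpreter) inlining makes the modularity essentially free.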
Then you obtain units with clear functionality, and you can test them and be on the safe side. Don't risk things. Another kind of testing that one sees from time to time, and that is also recommended from time to time, is statistical tests — there is a whole bunch of them, here are a few examples. What these do is look at an individual output stream and try to find bad smells in it. If your stream has way more zeros than ones, the test would say: ah, this is suspicious. You can do that, but it is not helpful, for one reason: most of the PRNGs we are talking about here are cryptographic PRNGs. That means, on one side, that the output is intended to be used for cryptography, and on the other side, that they have cryptographic building blocks inside them. And these building blocks have the property that once you pipe whatever you have through them, the output will immediately satisfy these statistical tests. Your PRNG can be almost as badly broken as you wish and it will still pass. So it doesn't hurt to run them — sometimes regulatory agencies require them; they are remnants from days past when PRNG output was not meant for cryptography — but essentially this is a waste of your time. You can skip this step. On the other hand, what is useful, but also not done enough, is regression testing. Regression tests are collections of reference seeds and reference outputs, and by comparing the output of your implementation with these reference values you can at least be as good or as bad as other people. In particular, if you are implementing a published PRNG standard — which not many people do, but some do — you have reference values in the standard, so you can be as good or as bad as the standard, and you can hope that more people have looked at the standard.
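A regression test of this kind is tiny to set up. The sketch below uses a toy hash-chain generator as a stand-in (this is not a real PRNG design, just an illustration); in practice the reference vectors would be literal constants copied once from the standard or from a version of the code believed to be good:

```python
import hashlib

def toy_prng(seed: bytes, nbytes: int) -> bytes:
    """Stand-in deterministic generator: hash-chain the state and
    emit one 20-byte chunk per round. Illustration only."""
    state = hashlib.sha1(seed).digest()
    out = b""
    while len(out) < nbytes:
        state = hashlib.sha1(state).digest()
        out += state
    return out[:nbytes]

# Reference seed -> output pairs. In real life these are literal
# constants recorded once (e.g., from the standard's test vectors).
REFERENCE_VECTORS = {
    b"seed-1": toy_prng(b"seed-1", 40),
    b"seed-2": toy_prng(b"seed-2", 40),
}

def regression_ok() -> bool:
    # Replay every reference seed after each change to the code.
    return all(toy_prng(s, 40) == ref for s, ref in REFERENCE_VECTORS.items())
```

Any maintenance change that is not supposed to alter the output stream then has a cheap, mechanical check.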
Also, if you change something during maintenance that is not supposed to change the output, then at least you can be sure that you didn't change something you didn't want to. So do regression tests — really helpful, not done enough. And for many, many implementations, the only thing that stands between you and disaster is manual code review. People just look at the code and say: it's fine. Think about it: would you accept such a situation for any other piece of software, no matter how unimportant? Would you feel fine if I told you this piece of software was never tested — we looked at it, we think it's fine, but we never tested it, we never did any technical quality assurance? Would you accept that? Yet this is what you have at the center of your security infrastructure. But we do not only have bad news; we also tried to improve things a little, and here we did two things. First, we looked at many, many incidents with PRNGs that occurred previously, and we identified a particular property that was violated in many of these incidents. On top of that, we developed a method — a particular specialized static analysis — and a tool that will find violations of this property in real-world implementations, OpenSSL and the like. In this talk we will try to explain what the property is, what the tool does, and what the results of applying it to implementations were. But first I need to explain a little how a typical PRNG works, using a very simplified model.
In practice, of course, things are more complicated, but nonetheless: in the middle you have some kind of state, sometimes called an entropy pool, but it is just the state where entropy is collected. Then there is a seeding function that takes a little piece of random seed that has its origin somewhere outside the system. Basically you ask the operating system for it, and the operating system will typically derive it from hardware — from disk latency, from the timing of your keystrokes, from interrupt timing, and so on. You obtain this very short random seed — about 20 bytes is a typical value — and transfer it into your state, and the rest of the operation goes in cycles: you somehow perturb the internal state, then derive a little chunk of output from the state. Then the next cycle begins: you perturb the state again, derive the next chunk, and so on and so forth, and you obtain a stream. In this model, everything you see here is deterministic. The whole procedure is completely deterministic, and the only source of randomness — the only source of non-determinism — is the choice of the seed. So we will go ahead and treat this whole cascade as a function g from an m-bit seed to an n-bit prefix of the output stream. The g is for generator, and you will see the symbol from time to time during the talk. This is what we are concerned with. Before I explain the problem that we actually solve, I would also like to make clear that there are a lot of problems that we do not solve, and it is important to name some of them. The first problem that we do not take care of is the choice of the seed. This is a difficult problem, but it is completely orthogonal to making sure that the deterministic procedure works correctly. There was a talk about seed selection yesterday, and there are popular mistakes that you can make in the choice of the seed.
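The simplified model — seed in, then alternating perturb/extract cycles — can be sketched in a few lines. Everything here is schematic (the use of SHA-1, the domain-separation tags, and the block count are invented for illustration); the point is only that once the seed is fixed, the whole cascade is one deterministic function g:

```python
import hashlib

H = lambda data: hashlib.sha1(data).digest()  # 20-byte digest

def g(seed: bytes, blocks: int = 3) -> bytes:
    """Simplified PRNG model: seed -> state, then repeat
    (perturb state, emit chunk). Deterministic throughout, so the
    only source of non-determinism is the choice of the seed."""
    state = H(b"seed:" + seed)            # seeding function
    out = b""
    for i in range(blocks):
        state = H(state + bytes([i]))     # perturb the internal state
        out += H(b"out:" + state)         # derive one output chunk
    return out
```

Here m is the bit length of `seed` and n is the bit length of the returned prefix (3 blocks of 20 bytes, i.e. n = 480 in this sketch).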
For example, if you ever seed with something related to time, you will have a bad time — for two reasons. First, because the resolution of the timers is typically not fine enough, so your seed will have a small range. And second, because time is often a publicly known quantity, so the attacker might actually know the seed, and then the whole construction collapses. By the way, if you are doing this in the embedded world, then I am really sorry: you have very few options for getting good seeds — but again, luckily, not our problem. The other thing we are not considering: all of the functions you have seen will typically contain cryptographic building blocks, and here we assume that these blocks are present. We can actually check that they are there, but for the time being we will just assume it. We will also assume that they are really one-way, in a particular colloquial sense. There has been only one incident, the Dual EC DRBG generator — also quite a thrilling story, you can Google that — where it turned out that these functions were one-way for everybody, but not one-way for the NSA. But that is a completely different matter. Another thing we do not take care of is very powerful attackers. We assume that the attacker knows the code of your PRNG, but does not know the seed and does not know the internal state. If you have an attacker who knows some of these things, you are on your own. If a three-letter agency is standing with a Tempest van in front of your building, sorry, we cannot help you. But what we can help you with is detecting entropy loss in your implementation. There are several different ways to define entropy loss, and all of the following are equivalent. For example, if you have two seeds that produce the same output stream, this is entropy loss — there is a collision in your PRNG.
This is equivalent to saying that part of your seed — the part where the collision lives — is not used for the output. It also means that there are fewer possible outputs than there are seeds. And here you already get the intuition for why this is bad: you are wasting some of the entropy you have collected in your seed — possibly quite a lot of it — and then your output will be much more predictable than the seed. This is bad; we do not want this. Mathematically, it all boils down to this: you have a problem if the function g is not injective. This is what we will be looking for, and we have a whole machinery to check whether functions are injective. Security, by the way — an interesting observation — is often about injectivity. It is either about injectivity or non-injectivity; many things like privacy and integrity can be defined in these terms mathematically. This is the formal mathematical definition of injectivity of the function g, and it is indeed what our tool will be checking. Now Felix will show you examples of entropy loss so that you get a better intuition.

Let's first come back to what happened in the Debian OpenSSL disaster. This is our model, slightly adapted for the Debian PRNG: in the mix function we have additionally added the PID. The OpenSSL PRNG does not only take the seed as random input; it also takes the process ID of your process as random input, used in the mix function to perturb the state before deriving output. What happened is that the line was removed and the seed was cut off. Unluckily, the PID was still mixed in, so the output of the PRNG still looked random. If everyone had gotten the same key, someone would have noticed; but this way it was not one key but 32,768 possible keys, so it was not easy to notice.
As you can see, that is an easy instance of entropy loss: you are not using part of the seed — here, you are not using any of the seed. What we can learn is that you should structure your PRNG so that you take the random data in at one place and use it there, rather than adding random data at different points in your code. That will also help later when we talk about verifying that the code works correctly and does not lose any of your entropy. Another prominent example is what happened in the Android SecureRandom number generator. There is a large Java integer array used in that PRNG. It is called seed, but the seed is only this part here; in general it is the whole internal state of the PRNG, and the seed should be written into Java integers 0 through 4 — five Java integers of seed. That gives enough possible output streams to make attacks infeasible. But what accidentally happened: they wanted to write a counter and padding after the seed, but they did not advance their pointer, so they wrote the counter and the padding here, overwriting more than half of the seed. What was left were two Java integers — 64 bits. That's not much. It is more than what was left with Debian, but you see, more than half is gone. And it is a clear example that you cannot detect such issues by looking at the output or at how the PRNG behaves. Another bug, the one that we detected ourselves, was in Libgcrypt, and we will discuss that in greater detail later. So what do we do to analyze a PRNG? First, the user of our tool has to isolate the PRNG — isolate the deterministic part and remove all entropy sources that are not directly the seed — so that the tool can check that all the random data from the seed is included in the output.
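The pointer mistake can be modeled in a few lines. Everything below is schematic — the array size, offsets, and the counter/padding values are invented for illustration and do not reproduce the actual Apache Harmony code — but it shows how failing to advance the write offset silently destroys three of the five seed words:

```python
def fill_state(seed_words, buggy: bool):
    """Schematic model of the state setup: five seed words should land
    in slots 0..4, then a counter and padding in slots 5..7. The bug:
    the write offset is not advanced past the seed, so counter and
    padding land at slot 2 and overwrite three of the five seed words."""
    state = [0] * 8
    state[0:5] = seed_words           # intended seed placement
    offset = 2 if buggy else 5        # buggy code forgot to advance the pointer
    state[offset] = 0xDEADBEEF        # counter (value schematic)
    state[offset + 1] = 0x80000000    # padding word (schematic)
    state[offset + 2] = 0             # more padding (schematic)
    return state
```

In the buggy variant only the first two seed words survive, so every output stream is determined by 64 bits of seed — yet each individual stream still looks perfectly random, which is exactly why no output-based test catches this.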
Then the user has to choose a scope to analyze: we fix an input length and an output length for which we check injectivity. Typically we just take one cycle. Then the user needs to find a way around the crypto functions, because the crypto functions are essentially infeasible to prove injective — that is the whole point of crypto functions. We need to work around them so that our tool can prove that the rest of the code, the code around those crypto functions, does its job correctly. Then our tool comes into play: it generates the condition that we described earlier and checks whether it holds. And finally we get a result: is there entropy loss, yes or no? I would like to show you that briefly now. What you see here is part of the OpenSSL PRNG — the function that adds random data to the PRNG's internal state. That is the seed coming in; below comes the part where data is moved around. What we want to do now is introduce a simple bug. The bug kills one bit of the input: we simply do not use one bit of it. Now I run the analysis. The analysis translates the program — the OpenSSL PRNG — into a logical formula describing what g actually does. Then our tool duplicates the formula and adds the condition that we have two different inputs which, when run through this function, result in the same output. Finally, a solver for such formulas runs, and the solver quickly says: yes, there is a solution — given these two inputs, we get the same output. Now you have to check what is wrong. And here, exactly one bit differs between the two inputs. Why is the PRNG not using this bit? Then you would debug that, hopefully stumble across the instruction that we added, and remove it again.
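The query the solver answers — does there exist a pair of distinct seeds that collide? — can be mimicked on a deliberately tiny toy. Here the seed space is only 8 bits, so exhaustive search stands in for the SAT solver, and a one-line mask stands in for the injected "kill one bit" bug (all names and the toy g are illustrative, not the real tool):

```python
import hashlib

def toy_g(seed: int, kill_bit: bool) -> bytes:
    """Toy generator over 8-bit seeds. With kill_bit set, one input
    bit is never used -- the injected entropy-loss bug."""
    if kill_bit:
        seed &= 0xFE  # drop the lowest seed bit
    return hashlib.sha1(bytes([seed])).digest()

def find_collision(kill_bit: bool):
    """Exhaustive stand-in for the SAT query: exists s1 != s2 with
    g(s1) == g(s2)? The real tool asks this over a formula generated
    by a bounded model checker instead of enumerating seeds."""
    seen = {}
    for s in range(256):
        out = toy_g(s, kill_bit)
        if out in seen:
            return seen[out], s   # two seeds, same output: entropy loss
        seen[out] = s
    return None                   # no collision in the whole seed space
```

With the bug present, the search immediately returns a seed pair differing in exactly the killed bit — the same kind of counterexample trace the talk demonstrates; with the bug removed, no collision exists in the scope.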
Then you can run the analysis again. The whole program gets translated into a logical formula — that takes its time; it is about 30 megabytes of formula. Then we double that size, because we want two invocations of the program with different inputs resulting in the same outputs. And finally the solver shows: there is no counterexample. There is no case where this function loses random data. So how can we deal with the cryptographic functions? That is the part I skipped, and Vladimir will now take over to explain how we do that.

Yes, this is the bit that we have conveniently swept under the rug so far. I will show you an example. The thing you see at the top is one invocation of the SHA-1 hash that is part of the seeding function of the OpenSSL PRNG — the one that takes the seed from outside and transfers it into the internal state. What you see here is a concatenation of four different things, and I can tell you that on the first invocation only the third parameter — only buf, which is the parameter to the function — will contain entropy. All the other parameters have fixed values. Altogether this is, I think, 68 bytes long, and the SHA-1 hash derives a 20-byte digest from it. The cryptographers promise us that if the input ranges over at least 2^160 different values, we will see close to 2^160 different outputs — at least this is what we hope for. But as I said, we cannot really check injectivity of this call, because this function has been specifically designed to make that very expensive. If we could check injectivity here, we would basically have an attack on SHA-1, or a proof that there is no attack. So we cannot do that.
Instead, we simplify the problem, and we simplify it by offloading the reasoning about SHA-1 onto somebody else. We say: the cryptographers take care of the behavior of SHA-1, and we only check the code outside of SHA-1, because that is the code that is non-standard and error-prone. SHA-1 itself is a standard primitive with standard, well-tested implementations; it is difficult to break that, but it is very easy to break the code around it. So we look only at the code outside. We do this in the following manner. Here is a little visualization of the same call shown at the top. There are two steps to replace this call with something that is easier to reason about while still giving useful information about all the code outside. We want to replace the call by something injective in a particular sense. The first step is to identify the part of the input that carries the entropy. You could say this is difficult, but in fact it is not as difficult as one thinks. First, we let the user do this — we could automate it, but for the time being we didn't bother. The user can work from contextual clues, the names of the variables, and so on; then from the information that we provide in counterexamples, which I will talk about later; and you can also just try different things. Here you have a choice between three candidates — the fourth is not big enough to carry the 20-byte output — and you can simply brute-force it. If you choose wrong, you will get a false alarm in the end: the whole construction will not be injective, and you can fix the choice and try again. So first we find an input region of 20 bytes, because the output is 20 bytes.
Then we replace the function by something else, and there are two possibilities for the replacement. The first one is very simple: we just replace the SHA-1 function by the identity — we copy the content of the seed to the output. This is very simple, and it has many advantages: if you do this, with a little luck the whole PRNG construction becomes an identity function. The output that the PRNG produces after the replacement will look very much like the seed — maybe the seed repeated — and you can test for that, so you can immediately check that all the other code is fine. This construction is not sound, because you have replaced one function with an unrelated function. But most of the time it will work and the results will be trustworthy, because, as you can imagine, a typical PRNG is agnostic to the concrete data in its seed. It would be strange if the PRNG did different things depending on exactly what data it is operating on. So you can substitute the data — you can substitute the SHA-1 hash with a plain copy of the seed. This is very useful and great for debugging; you can already test many things with this replacement, but it is still unsound. What we can do instead — because we are not just testing, we are applying a formal reasoning tool — is to use something else in place of this identity function: a mathematical construct. We replace the function with an abstraction, an under-specified injective function. We do not say what it is, only that it is something injective. This is now a sound approximation of the behavior of the hash function, and the results you get transfer to the original code.
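The identity-replacement trick looks roughly like this in miniature. The seeding call below is schematic (the fixed prefixes/suffixes and their sizes are invented; in the real OpenSSL call only the buf parameter carries entropy): once the user has identified where the entropy-carrying bytes sit, swapping the real hash for a copy of exactly those bytes makes the surrounding code's entropy flow directly visible:

```python
import hashlib

def seed_state(seed20: bytes, H) -> bytes:
    """Schematic seeding step: hash a concatenation in which only one
    field (the 20-byte seed) carries entropy; the rest is fixed."""
    fixed_a, fixed_b = b"\x00" * 4, b"\xff" * 8   # fixed-value parameters
    return H(fixed_a + seed20 + fixed_b)[:20]

# The real primitive:
real_H = lambda data: hashlib.sha1(data).digest()

# Unsound but convenient replacement: instead of hashing, copy out the
# 20 entropy-carrying bytes (here known to sit at offsets 4..24).
identity_H = lambda data: data[4:24]
```

With `identity_H` plugged in, the state equals the seed byte for byte, so any code outside the hash that drops or overwrites seed bytes shows up immediately; for the final, sound verdict the tool instead uses an under-specified injective function in this position.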
So we would typically do the first thing, and that helps us find bugs. It will show us collisions if there are any; we can look at the trace and see how the bit pattern of the seed propagates through the implementation; we can add print statements along the way and watch the flow. We will not show that today, but we do have a nice visualization. Then, in the end, we switch over to the sound idealization, and we prove that the positive result we get — the good result — indeed holds. Right, a few words about the implementation — a little commercial break here. We could get this far because we were standing on the shoulders of giants, and the two giants here are the CBMC bounded model checker and the MiniSat SAT solver. CBMC is a static analysis tool that you can feed your implementation in C or Java, and it will happily take it even if it is something complex like OpenSSL or the Linux kernel. This is really the advantage of this tool; it is robust and fantastic. It will explore the behavior of your program to a particular depth, and it can detect predefined things like buffer overflows and so on, but it can also check user-specified assertions. Even if you have nothing to do with cryptography or PRNGs, it is warmly recommended for gaining confidence in your implementation. On the other side we are using the MiniSat SAT solver, which is also quite an amazing piece of technology. There are, of course, other verification tools and other SAT solvers; these are just the ones we are familiar with. You know that SAT is an NP-complete problem, but in practice it doesn't really matter for us: we can feed formulas that are gigabytes in size to the solver, and within seconds it will tell us whether they are satisfiable — whether there is a problem in the PRNG or not.
We combine these tools with our own code. We use CBMC to generate a description, as a formula, of what the PRNG does; then we plug this description into the definition of injectivity that I showed you previously and feed the result into MiniSat, and MiniSat checks whether it is possible to violate injectivity in this formula. If it is, we show the user the resulting traces for the two seeds and the identical output. The whole procedure, as you have seen, takes some seconds, so you can use the tool interactively.

What are the results of applying this? The tool was applied to various PRNGs, and first the good parts: for these, we did not find anything suspicious. The tool could verify that the PRNG code around the cryptographic primitives is injective and does not lose entropy. In Apache Harmony, as shown previously, there is no problem now; when we reinstate the historical bug, the tool detects it, and with the fix in place it detects no further problem. With OpenSSL it is quite similar: the tool would have detected the removal of the Debian line. We also detected other problems — in code that may or may not be used in practice — with managing the circular buffer: an index going off by one, writing one byte outside the buffer, doing strange things; all in all, not fatal. And then, finally, we found the critical bug in Libgcrypt that was subsequently fixed, and I will show you how the bug works, where the problem is, and how it could stay undetected for so long. First of all, there was a paper that proposed how to build such a PRNG, and I would first like to show you how it was designed to be implemented.
So we have the buffer with random data here, and what we're looking at is the perturbing function that mixes the buffer internally, in some intricate way, using cryptographic primitives. What was suggested is that we would take 84 bytes of the buffer, hash them using a hash function, and overwrite the middle 20 bytes that we have above here, and then this buffer would go up there again. After hashing, we would move to the next block and hash that again, move to the next block, and so on until the end, where there happens a wrap-around, and finally the last block would be calculated by hashing this block and the block here at the front. Then we would get the last block, and finally this buffer would be handed out to the user: this is your random data. GnuPG had a hash function that took smaller input sizes, not 84 but 64 bytes, and for reasons that are not that clear, they decided: we have a 64-byte hash function, so just leave the middle bit out here. We're overwriting that anyway, so we can leave it out. I don't really know what the intention was, but what happens in GnuPG with the bug is: we would hash them, write that down, and so on and so on until the end, where we would take this one and this one, hash them, and write them here, and this buffer should now be random. But is that buffer really random? When we take this block and this block, which are here and here again, and hash them, we would get this last block. So the buffer cannot be random. We can calculate the last 20 bytes: when we take these 44 bytes and these 20 bytes and hash them, we get exactly this block. Yeah, so that's what happened in Libgcrypt. That's a thinking problem, not really an off-by-one implementation problem, but that's essentially what happened there. So yeah, as for the consequences for keys, they were hopefully not that bad, but happily we spotted that problem. That problem lay dormant there for a long period of time, through several security audits.
Nobody spotted it. Yeah, we looked at the code; it said "initial import", so it was several years old. What we can take away from that is that audits are necessary, looking at the code is necessary, but the code has grown too complex to fully understand manually, so we have to use technical assurance to verify that the code is correct. So thank you for your attention. And we have questions. Thank you. If you do have questions, please line up at the microphones on stage, there on the side; there are four of them. And if you would like to leave now, please do that quietly so that other people can still listen to the Q&A. On the right-hand side, the... Yes, hello. Thank you for the presentation, very interesting. Is there some kind of implementation for mere mortals like us that can take their current stack of whatever they have, for example OpenPGP, generate, I don't know, 20 RSA keys, upload them to you, and you tell us whether those look random enough and whether my system actually generates correctly? Yeah, as we tried to explain, the problems that we're looking at here cannot be detected by just looking at the output of the PRNG. When you have one output stream of your PRNG, that would still look random, so statistical test suites won't detect anything. So the only thing that you could do is check for known issues and have the known output streams that the known issues produce. With the OpenSSL disaster, there were luckily only a few, so you can have all those 30,000 keys and check against them. But just looking at the keys without knowing what to look for, I think that's not possible. Yeah, so could your techniques beneficially be applied to other cryptographic primitives, like block ciphers or the building blocks that go into block ciphers? That's a good question. I think you could apply them to particular auxiliary code that accompanies the primitives.
So the techniques that we've shown you are purely information-theoretical, so they are not concerned with cryptography as such, but in cryptographic code you also need these things. Key expansion, for example, is supposed to have the same injectivity properties; you could apply it there. So there are some applications, but there are also very clear limitations. All right. Is there a question from the IRC? No? Okay. To the left side, one in the back. If I understood your definition of entropy loss correctly, you mean that if you have two seeds that go to the same output, you would consider that a problem. And you had several examples with SHA-1, I think from OpenSSL, and I think the Linux kernel is also using SHA-1, and that would mean, as far as I understand, that if you have a hash function with a collision, then this would generate a situation of entropy loss. And we assume that SHA-1 will have collision problems within the next couple of years. So is there any practical problem if you use a non-collision-resistant hash function here? Okay. So this is a little subtle; there are two answers to this. First, we are actually trying to look at the code that is outside of SHA-1, because this is where the mistakes often are. So we, so to speak, treat SHA-1 as someone else's problem. But also, to be more exact about the collisions: we are looking at injectivity from a 20-byte input to a 20-byte output. And of course, if you find a collision there, then this will weaken your PRNG a little bit, but hopefully there are not too many of these collisions, because if there were too many, they would be easy to find. But at some point, of course, you also probably have to upgrade the hash function in your PRNG. We can discuss that later in detail. Okay, on the left up front. In the theory of deterministic RNGs there are the notions of forward, backward, and enhanced backward secrecy.
As I understood, your theory is going forward, so detecting entropy loss. Would it be possible, in principle, to detect non-fulfillment of enhanced backward secrecy? So if you know all further output and the internal state, to go backward and guess former outputs. That's a good question; I will have to think about that. Okay. Yeah, I think we're not looking at forward secrecy or backward secrecy; we're just focusing on whether two different seeds derive different output, so that does not have any implications on whether knowing some random output means that you can calculate the next output, or the other way around. Yeah, but in a sense it's looking at the injectivity of the G function, and what we would need for backward secrecy would be the inverse function. Not only: the inverse function would be having the output stream and being able to derive the seed. But what backward secrecy would mean is that you would have the part of the output stream that comes later; so I don't need the seed but the complete internal state. The G function goes from the seed to the output. Yeah, okay. Just to clarify, the attacker model here is that the attacker observes the output and tries to brute-force the seed. This is the attacker model. If you want more things, you need to do more things. Okay, we do have a question from the internet. Yeah, basically we have two questions. First of all, are physical implementations of RNGs better, or do we need to test them the same way? I think we need to test them the same way, because they typically also have some kind of hardware source of randomness, and then there is this deterministic expansion on top of that, so you need to check that. Yeah, the answer is: you do, and you can. Thank you. And second, the IRC, or some people on the stream, were not able to see your explanation of the GnuPG bug. So could you please repeat that without using the laser pointer?
Okay, so the whole problem finally boils down to this: if you take the blocks that are not overwritten, so they are visible in the output stream, and hash them together using the hash function, which you hopefully know since you have the GnuPG code, then you can derive from that the last 20 bytes. So the first 44 bytes and the second-to-last 20 bytes, hashed together, will give you the last 20 bytes, and that's a problem, because that's not random. Maybe another way to formulate this: when you have this stream and you're looking at the last chunk in the stream, then everything that this chunk depends on comes earlier in the stream, and this is the problem. When you output the stream to the observer, the observer can observe everything that is needed to calculate the last chunk in the stream; there is no data going into the calculation of the last chunk that the observer doesn't see. First microphone on the right. One comment about hardware implementations, because usually you take 2n bits of entropy and compress them first to n bits. I assume that's a problem for the detection of entropy loss? You do that because your hardware will have a bias, and that's okay, it's just an offset of the comparator or something like that; the bias is okay. You solve the problem with the bias by compressing 2n bits to n bits, and I think your checker would detect that as entropy loss, because you really take 2n bits, compress them to n bits, and so obviously n bits are lost. We assume that the seed is uniformly distributed in whatever length it has, but with a hardware entropy source you don't have this uniform distribution, and by compressing it with a hash function you get the uniform distribution, and that's the first step; then you expand from the compressed version. So you have to be careful when using your checker, because the input is not completely uniform; it is mostly, but not perfectly, and that's just because hardware is analog
design, and you have offsets. And the other thing I did for checking... excuse me, please come to the question. When you have something like the SSL problem: I write the first 32 bytes of the first round into a file, and each time I initialize my PRNG I check if that pattern has been there before, and due to the birthday paradox, if I reduce my initial seed, I get to the square root. With the SSL problem, after 256 iterations I would have a 50% chance to see that I have a problem. Yeah, but for the problems we showed, for example, the margins are much bigger, so that would be after two gigabytes. But if every user does this check, the chance to see it is pretty good. Okay, the other microphone on the right, please. So at some point you talked about checking the generation by simplifying the hash function; for example, you talked about replacing SHA-1 with memcpy. So I was wondering: you said that some implementations could have outputs like the seed material in the outputs when you use memcpy. Do you have any examples of that in real PRNG implementations? For example, OpenSSL will do that. Okay, and to make it clear, this is just for analysis purposes? Yeah, sure. Both on the left. I'm very happy to see more formal methods and formal verification popping up and being made usable, so thank you for that. And my question is: you said that you can trace the entropy, or you can trace the changes in state or in the production function. How does this actually work? Is this a feature of the CBMC stuff you used, or did you implement it? And how can I actually go ahead and trace where my entropy gets lost? How does this work? Yeah, CBMC models the whole state of your program, so every position where the seed is used becomes a variable in the logical formula that CBMC outputs, and that way it tracks the data. Okay, but is this part of CBMC, or did you add this? I think this is a slightly different point. This is not part of CBMC; this is
actually something that we do after the CBMC analysis has terminated. What we get is the model of the formula, and we extract the actual seeds that exhibit the problem; then we run the PRNG with these two seeds and we output the intermediate states. We just write them on the console and show them side by side. So you use the variables that satisfy your... or break injectivity, and then, by having the two seeds that break injectivity, you can trace it. Okay, yeah. All right, thanks a lot to Vladimir and Felix. Please give some applause for this one. Thank you very much.