Okay, hi everyone. My name is Benny Pinkas from Bar-Ilan University in Israel, and I'm going to talk about implementations of pseudo-random number generators. Before I do that, I'll do some advertisement for another winter school. We're going to have a winter school at Bar-Ilan. It's the seventh annual winter school we're having. It's going to be in mid-February. The weather is going to be about what you experience here today, unless it rains, so it's going to be very nice. This year the school is going to be on differential privacy, from theory to practice. We have a very nice set of speakers, and if you want to learn about differential privacy, you're very welcome to register. You have the URL there, or you can search for the winter school. Some people here have been at past winter schools at Bar-Ilan, and they're usually very nice, educational and fun events.

So in this talk I'm going to talk about pseudo-random number generation as it's done by common operating systems, and all kinds of implications of that. Part of this talk is going to be about research I did about 10 years ago on the actual implementations of pseudo-random number generators in Windows and in Linux. Then I'll talk about some implications of the findings we had in Linux. And then at the end I must save some time to talk about the recent back door in the random number generator used in Juniper products. Juniper controls a lot of the firewall and router market, and a back door in the random number generator is a very serious issue.

Okay, so first I'll talk about the insecurity of the pseudo-random generators of Linux and Windows. This was done with some very good students, Zvi Gutterman, Leo Dorrendorf and Tzachy Reinman, who did a lot of systems work. So this work is a lot about systems. I won't have any theorem in this presentation, okay?
No theorem, probably almost no math, okay? I'll talk about systems. Okay, so I'll talk about (can you see that?) how pseudo-random number generators are implemented, why it is important to check how they're implemented, and specifically about the generators used by Windows and by Linux, the algorithms and the weaknesses, and security issues when you're using a generator in systems without the hardware.

Okay, so what are random number generators? Usually I have this slide, what are random number generators, but you know about that. Why are they important? In cryptography, every cryptographic construction usually says something like what we have at the bottom: pick a key at random. So security depends on using random numbers, and the question is how we pick them. A computer of course cannot flip coins, so we need to somehow pick a random number generator. For instance, in Windows they call this function, CryptGenRandom, which provides in this case 16 bytes of random bits and puts them into this variable, the key. And this is relevant for any cryptographic construction that we might use. When we design a cryptographic construction we just say, okay, let's use random numbers, but the people who implement it actually use whatever the operating system provides them. If the operating system doesn't provide secure, I mean truly random, numbers, then all the constructions based on random numbers are insecure. If someone can tamper with the generation of random numbers, then the same happens. So this is a major thing: if the source of random numbers used by computers is insecure, then all the cryptographic constructions we build on top of it are insecure.
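As a rough illustration of the "pick a key at random" step, here is a minimal Python sketch that asks the operating system for 16 random bytes, similar in spirit to the CryptGenRandom call on the slide. It uses Python's standard `secrets` and `os` modules, not the Windows API itself, and the names `KEY_LEN`, `key` and `same_source_key` are made up for this example.

```python
import os
import secrets

KEY_LEN = 16  # 16 bytes = 128 bits, the size used in the example above

# Both calls draw from the operating system's random number generator,
# so the application never implements a generator of its own.
key = secrets.token_bytes(KEY_LEN)
same_source_key = os.urandom(KEY_LEN)

print(key.hex())
```

The point of the sketch is only that the application delegates the hard part: whatever quality the operating system's generator has is the quality the key has.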
Here's another example. It's not a cryptographic construction, but here you're doing HTTP with a server. HTTP is stateless, so when I go to a site over and over again, how does the server know that I'm the same person? Usually, the first time I connect to the site I get this random session ID, and in each future connection I make to the site I have to send this session ID, and then it knows that it's actually me. And this next session could be where I'm getting my bank account information or my email or whatever, so everything here depends on this session ID. I mean, everything is encrypted, but if the session ID is not random, then someone else can guess it and try to impersonate me by sending this session ID. For instance, Zvi Gutterman and Dahlia Malkhi showed in 2005, specifically on an Apache Java implementation, that these session IDs were not random. And the guys who implemented the server had a lot of other things on their mind, right? So it's not realistic to expect them to generate random numbers properly. Someone, probably the operating system, has to provide good random numbers for them, because they haven't been in this course this week, they have a lot of other things to do, and they need to ship something very quickly. We need to provide them with random numbers.

So applications are designed to be secure if they're using truly random bits, but it's hard to get truly random bits. Therefore we are using pseudo-random number generators, and applications are only secure if the pseudo-random number generators provide output which is indistinguishable from random. The way we usually write it is that the PRNG gets a short seed, which is supposed to be random but is short, and then generates a longer output, and it's secure if this
longer output is indistinguishable from a truly random string of the same length. I hope that you all know this definition: the output of the generator should be indistinguishable from a truly random one. Here I assume that the generator gets a small seed which is truly random. In practice it gets more seeding information which is not truly random, like all kinds of timing events of what I do with my computer and so on. So it doesn't get a short seed which is random; it gets lots of somewhat random events that have some entropy, and it tries to get from them a string which is indistinguishable from random.

So how can we use random numbers in programs? One option is to use pure hardware random number generators, like the one designed by Intel. The problem is that it might be expensive, and it might not be on each device we're using. If I want to write an application, first of all I want it to run on all devices, not only those that have this hardware random number generator. Also, I don't want to mess with the API of actually talking to this random number generator, especially if on different systems it's going to be a different API. So I want someone else, higher up, to generate the random numbers for me.

Another option is to write the pseudo-random number generator myself. However, the application sits too high up in the system. It doesn't get access to a lot of the noise that happens somewhere down there, all kinds of events happening in the system, so it doesn't have enough entropy. Also, writing a good pseudo-random number generator is hard, so we don't want to leave it to the application writers.

So we end up with the operating system providing the pseudo-random number generator. On one hand it can access system events, like maybe a hardware random number generator, the time, all kinds of disk timings, user keystrokes, whatever, and it can provide an API that gives access to random numbers to any application that's
running on top of it. And since we only write it once for all those who are going to use the system, the implementer can be more educated and do his work better than some application writer who has a lot of other things to do.

So, why should we investigate the pseudo-random number generator of the operating system? Because it's important. All the cryptographic and security code that sits on top of it depends on the security of the random numbers generated by the pseudo-random number generator, so it makes sense to investigate what's happening there. At the time we were working on that, the algorithms and the code were not published. In the case of Windows we only had the executable. In the case of Linux we had the source code, but it was 2,500 lines of code and no one except the writer knew what was happening inside. Also, even if we have the code, we don't know how it's initialized, what the initial seed used by it is. And everyone uses it. So it seemed like a good task to look at, and that's what we did.

So how does this pseudo-random number generator work? It's different from what we think about in cryptography. In cryptography we have a seed and we generate an output. Here we have a generator that keeps a certain state, and there is a deterministic function which takes the state, moves to another state and provides some output. So basically this might be the initial seed, and then we provide output, move to the next state, provide output, move to the next state, and so on. Something else that happens there is that the state is being refreshed periodically. As the system runs, the operating system reads a lot of system events and somehow generates more entropy, and from time to time it inserts this entropy into the state to make it more random. Okay, so this is different than at least the
theoretical model of pseudo-random number generators that was investigated earlier, where we have a seed and this function gets an input and provides an output. Here it's a continuous operation: it provides output whenever someone calls and asks for an output, and from time to time we refresh the state of this function.

When we analyze the security we assume that everything is known. We assume that the adversary knows exactly what function is being implemented here, and he can ask for outputs. The only thing that we don't assume the adversary knows is the system entropy which is being input into the pseudo-random generator. The manufacturer of the operating system might try to hide this function that advances the random number generator, but you can do reverse engineering. We managed to do that with a couple of students, so it's possible if you're determined enough. So we must assume that the function is known.

So what security properties do we want, or what properties were stated in the practical world? One obvious thing is pseudo-randomness: the output should be indistinguishable from random. If the system entropy is a short random string, the output should be indistinguishable from random, and therefore it can be used instead of truly random bits.

Another property is backward security, which is break-in recovery. I'm always confused between what's backward and what's forward, because backward security looks like a forward thing to me, but this is how it's defined: an attacker that learns the internal state cannot learn future outputs of the generator, assuming that sufficient entropy is used to refresh the state. So we have this state, and now someone breaks into the system and learns the state. Of course he can use the deterministic function that advances the state to learn future outputs, but at some point we have new system entropy being added to the generator, and from that point on
the attacker should not be able to learn the outputs of the generator. So to me this looks like it's about the future, but it's called backward security: security from things that happened back in the past.

The inverse property is forward security. It means that given the state at time i plus one, it's hard to compute any previous state. So given the current state, it's hard to compute previous states of the generator. What we showed in our attacks is that at the time, the generators of both Linux and Windows didn't have forward security. If you could break into the system right now, you could learn all previous states and previous outputs of the generator, so all keys that were used in the past by whoever was using this machine to connect to all kinds of services on the internet, and of course also in the future, until the next refresh.

So why is forward security important? Security systems are secure as long as no one breaks into them. If you don't have forward security, then someone who broke into the system can go back in time and also learn what happened in the past. I'll skip this.

Okay, so the analysis of the Windows random number generator. This was in CCS 2007. So in Windows, yeah, we were able to go back. Yeah, you can really see it. You can go all the way back until whenever the system was restarted.

So, CryptGenRandom is the only API that Windows provides. I'm not familiar with the current state of the art; I know what the state of the art was 10 years ago. But it doesn't matter, because we're not here to talk about the current state of the art of random number generators, we're here to talk about how you should design a good random number generator. Things might be different today, but it's also interesting to learn what was happening 10 years ago, and
actually, at the time we looked, we were the first to look at how these things work as cryptographers, or at least the first in the academic world. And it paid off, because after we looked at how the generators worked, those who built them changed the constructions to be more secure.

Okay, going back to the talk. CryptGenRandom is the only API that Windows provides to get secure random numbers, so it was probably, at least at the time, the most common pseudo-random number generator used in the world. It was used by Internet Explorer to generate SSL keys. At the time Mozilla was using something else, and Chrome was not very strong at the time, so I don't know what Chrome was using. The code and the design were not known. It was a black box, and Microsoft said, okay, just trust us.

So we examined the binary code of Windows 2000, because that was what was used back then. Leo Dorrendorf, who is very talented, did the reverse engineering of the executable to learn the code of the random number generator. We identified the algorithm using static and dynamic reverse engineering, and then we verified the algorithm by writing a simulator. We thought we understood how the algorithm works, so you write code that implements your understanding of how the algorithm works, you run it side by side with the actual Windows, and you verify that both of them provide the same output. This verifies that you actually identified the code that is used by the actual system. And then we showed the attacks.

Okay, so this is the code. I mean, it's long code, but this is the heart of it. When I presented it, actually it was here in Barcelona, in the rump session of Eurocrypt, that was the first time someone presented this source code of the Windows API. A lot of people in the crowd took out their cameras and took
photos of it, and then I felt like a rock star, a celebrity, because I was giving a talk and all I saw was flashes from the cameras. No one is taking photos of the code today, but that was because no one knew what the code looked like at the time.

Okay, so what's happening here? We want to read from this generator; we want to read len bytes into a buffer. So as long as we need to read more bytes, let me see here, there's a state. The state of the generator has two registers, each 20 bytes long: R and State. R is XORed with 20 bytes that you get from some RC4 instances (RC4 is a stream cipher). Then you XOR R into State. Then T is almost SHA-1 of the State, and this SHA-1 of the State is what you output to the buffer. Then you take part of your output and you change part of R using part of this output. Then you add R and one to State, and you go back here. And since you read five bytes here, why do you reduce by 20? I'm not sure, maybe there's a bug, but that's the code.

So basically the code here is R and State, and this is how they change based on SHA-1 and RC4. I'm not a cryptanalyst, but let's try to understand what's happening here. This is the output: the output gets SHA-1 of the State. SHA-1 is a hash function. And this is how it works: we take RC4, which is a stream cipher, actually several instances of it. They somehow take all kinds of system events; the RC4 instances are initialized with system events, and then they just run to generate a longer and longer stream, and you add it here. So it's unclear what's happening.

Something that we noticed was very odd is that you XOR R into State, then you don't change R, and then you add R to State and you add one. XOR and addition are almost the same. I mean, XOR and addition are only different if you have a carry bit, so you can almost think of this as XOR instead of addition. It's not the same, so
it's like you XOR into State and then you XOR again, so you kind of remove R. It's not exactly like that, but it's almost like that. This looked very odd, so we actually used it.

Okay, so that's a good question. I think I have something about it. When the system starts running they take a lot of bytes, I think about 2500 different bytes, they initialize an RC4 state with these bytes, or several RC4 states, and from that point on they continue generating more and more bytes. And the question is, we were not able to identify exactly how the system affects the initialization. You really have to understand how Windows is initialized, and it's a complete mess, if you know what I mean. If you could tamper with that then everything is broken, but we don't know what's happening there.

Yeah, so there is some fresh entropy coming in, but not very frequently. I'll describe it. So yeah, there's no fresh entropy here. This RC4 is something you initialize, supposedly with a random key, at the beginning, and then it kind of adds entropy. Without it, everything is kind of predictable. Yes, this R, you can almost always remove it.

Okay, so what's the big picture? A different state of this random number generator is kept for every thread in the system. RC4 is in static DLL space, and R and State are in the stack, meaning that common attacks like buffer overflows, very simple attacks, can be used to get this information. They're not hidden in kernel space, which is harder for attackers to get into. They're in the application area, where it's kind of easy to get them, so they're not well kept. It's also important that there's a different copy for each thread.

For the initialization, you take 3500-something bytes of system data, most of it predictable but not all of it, like internal states, CPU queries, registry keys, whatever. You hash everything together and you generate the initial RC4 states. We don't know what's happening here; if someone can tamper
with it then that's the end.

Now, the system is reseeded with fresh entropy after it has generated 128 kilobytes of output. The problem is that we have a different instance per thread, and threads usually don't use this much output. Think about a browser that uses some randomness to do SSL, or TLS, handshakes. It's not going to use 128 kilobytes even if it runs for a month or so. So essentially the refresh never happens, except in very rare cases. You start from the initial state and just go forward.

So something which is not surprising is that if someone breaks in and gets the current state, they can compute all future states, once we've provided the algorithm. And it's not hard to get the state, because it's in the application area; it's not hidden inside the kernel. This is problematic because the refresh only happens after you generate 128 kilobytes for this specific thread, which means that it never happens for many important applications. So this is severe. I mean, you go for the coffee break, I break into your machine, I get the state, and that's the end, until you restart your computer.

Now, we don't know how to break the pseudo-randomness of the generator. We tried, but we're not great cryptanalysts. I think that other cryptanalysts looked at it and nothing was published, so it's probably secure. So we don't know how to distinguish the output from random, assuming that the initialization was fine.

We do have an attack on forward security, meaning that we can learn previous states. The main thing that enabled the attack is RC4. It's not a good stream cipher and it's being phased out, but it was good enough for these purposes. However, it was not designed to provide forward security. I won't show you the code of RC4, but something that the
designer, meaning Ron Rivest, who designed RC4, didn't care enough about was forward security. So what's possible to do with RC4 is that if I get the current state of RC4, it's very easy to go back in time and compute the previous states of RC4. For the application where you use it for encryption, it doesn't matter, because I encrypt data and it should be secure against someone who eavesdrops on it, and no one assumes that someone gets into my machine and gets my key. If they get my key or my current state, then that's the end of the game. However, if someone breaks into your machine right now and gets the state of the RC4 instances running on your machine, they'll be able to reverse the algorithm of RC4 and compute all previous states of RC4 and all previous outputs. I won't show you how to do it, but this is trivial if you see the code.

Given that, we show that given the current state of CryptGenRandom it's possible to compute the previous state with two to the 23 work, which is a sub-second computation. It's basically eight million invocations of SHA-1. Part of the attack is based on exploiting the relation between XOR and addition. I'll show you a bit about the attack in a minute.

Actually, there's an even simpler attack. Think about the system: we have State and R, these two registers, and we have RC4. RC4 is initialized with all this system information that we cannot model, so let's assume it's totally random. State and R are not initialized. When the system runs, this loop I showed you changes them with bits coming from RC4, but they start with whatever happens to be in these variables when you start running the system, and this is predictable. So if I look at the system, I know the initial states of State
and R, okay, and I don't know the initial state of RC4. That's always true. Now you go for your coffee break, I break into your machine, and I get the current values of State and R, and I also get the current state of your RC4. Since RC4 doesn't provide forward security, I can take this current state of RC4 in your system and rewind it in time all the way to the initial state. So now I have the initial state of RC4 in your system, and I have the initial states of State and R, because they're the same in all machines, they're predictable. From there I can run the algorithm forward and compute everything that your machine did until this time. So since RC4 doesn't have forward security, nothing has forward security, if you know the initial states of State and R. And even if you don't know them, I'll show you a little bit about the attack which takes two to the 23 work to go back.

Before showing the details of the attack, let's talk about the implications. We notified Microsoft and they said this is a local information disclosure vulnerability and has no possibility of code execution, meaning it doesn't let someone break into your machine; it just lets someone who already has access to your machine learn more about it. What they said at the time was that if someone can get the state of your pseudo-random number generator, it means that they already have access to your machine, and if someone has access to your machine, then we cannot protect you anymore. What we said is that if someone breaks into a machine, like during the coffee break, then okay, it's kind of legitimate for them to learn everything that is on the machine at that time. But what the attack enables is for them to go back in time and learn things I've been doing with the machine in the past. That's not cool, that's not legitimate. This is something you should protect against, and this is why I
think the attack is interesting. So basically the implication is: if someone breaks into my machine now, then obviously they can go forward in time until the next refresh. Using the attack they can also go backwards in time until the last refresh, which is usually the time the machine was restarted, and the next refresh is not going to happen anytime soon. So we can learn all the outputs of the random number generator, 128 kilobytes.

Okay, so let's look a little bit at the attack. I'll just look at basic things. I'm not a cryptanalyst, I'm not Eli Biham; we just did very basic things. So what do we have here? We have State and R. The RC4 bytes are XORed into R, and this is XORed into State. We take SHA-1 of that; this is the output. We take part of the output and put it into R, and we add it to State, plus one; this is the new value of State. So this is the output, we replace some part of R with it, add it and one to State, and this is the new value of State.

So what does the attacker know? He breaks into the system, so he knows the current RC4 state, the current R, and the current State. Since he knows this, and he knows this, he therefore knows the SHA-1 input, because State XOR R equals the new State: if you know the new State and R, you can go back in time and compute the previous State. So you can do that, and I think he also knows this part of the output. And then here we had the RC4 bytes that are XORed in. This is the same as this, except that we replaced the last five bytes with something we got from the output. So we know this value, and therefore we know this value except for the last five bytes. We know this value entirely, because that's the RC4 state. This is equal to the XOR of this and that, so we know it entirely except for these last five bytes. And we know this entirely, and we know this except for five bytes, so we know this except for five
bytes. So basically, just by arithmetic, I can compute the previous state except for 40 bits. Okay, go back. Whether I'm going forward or backward in time, basically I have everything: there are 40 bits of entropy left about the previous state. I can check all two to the 40 candidates, and for each one compare the output that I picked from the system to the output that is derived from that candidate state, and see if they match or not. You have two to the 40 options for the missing five bytes, and I think we have a single 40-bit output to check against. The right one is going to pass the check. You also expect to get some false positives, but then you compute a martingale, whatever, you compute the number of false positives that you expect to find, and it is very small. So basically this is enough to break the system. It requires some analysis.

So you start with one value. For the previous state you get the true value of the previous state, and you're also going to identify about one false positive, so you have two. You go back in time, and for this false positive, at some time in the past you're going to see that it doesn't make sense, because it's not going to have any ancestor. Some are going to have ancestors, but if you do the analysis with a martingale you're going to see that you end up with a very small set of possible states. The true one is going to stay inside, and the other ones are going to fall away after a while.

Okay, so this is the attack, and it's kind of simple. We have a more complicated attack that took only two to the 23 SHA-1 calculations, and it used the relation between addition and exclusive or. So this enabled us to
do a better attack. Even at the time we could do two to the 23 SHA-1 invocations in less than a second; now it's a fraction of a second. So it's a very practical attack.

And just to tell you, at some point while doing the analysis we thought that we had a complete break, that we could break everything about the system and distinguish it from being pseudo-random. At that time we freaked out. I remember we were sitting in a cafe and we said, okay, we found a bug that enables the NSA to eavesdrop on all Windows machines, what do we do now? It was pre-Snowden then. Luckily, after a few hours we found that we were wrong and that the pseudo-randomness is okay. There's only this attack, which is not as severe as having the pseudo-randomness completely broken.

Remember, this was Windows 2000. At the time XP was the more important system. XP had the same pseudo-random number generator, except that RC4 was replaced with something else. I'm just telling you the experiences that we had at the time, in case you're going to do something similar with other systems. We didn't want to spend months trying to reverse engineer the code of this thing, because we didn't know what was happening there and it was more complex. But if there was no forward security there, it would mean that XP was also susceptible to all the attacks that we had. So we asked Microsoft, and the initial answer was that later versions of Windows, meaning XP, contain various changes and enhancements to the random number generator. Okay, so it's better. But we continued asking, we actually had a journalist who asked this for us: okay, it's better, but is it secure against this attack? And the answer was that XP is better, but it's also vulnerable to this attack. But then they changed the code to make it secure against the attack.
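To make the earlier remark about RC4 concrete (given its current state, it is easy to step backwards), here is a small Python sketch. It implements the standard RC4 key schedule and output step, plus a hypothetical helper `rc4_step_back` that undoes one output step. The function names and the seed string are made up for this illustration; this is not the Windows code, only a sketch of why a cipher without forward security lets an attacker rewind.

```python
def rc4_init(key: bytes):
    """Standard RC4 key schedule; returns the state (S, i, j)."""
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    return S, 0, 0


def rc4_step(S, i, j):
    """One forward output step; returns (new_state, output_byte)."""
    S = S[:]
    i = (i + 1) % 256
    j = (j + S[i]) % 256
    S[i], S[j] = S[j], S[i]
    out = S[(S[i] + S[j]) % 256]
    return (S, i, j), out


def rc4_step_back(S, i, j):
    """Undo one output step; returns (previous_state, byte that step emitted)."""
    S = S[:]
    out = S[(S[i] + S[j]) % 256]  # the byte produced when this state was reached
    S[i], S[j] = S[j], S[i]       # undo the swap
    j = (j - S[i]) % 256          # S[i] is again the value that was added to j
    i = (i - 1) % 256
    return (S, i, j), out


# Run forward from a hypothetical seed, then rewind from the final state only.
state0 = rc4_init(b"hypothetical seed")
state = state0
forward_out = []
for _ in range(1000):
    state, b = rc4_step(*state)
    forward_out.append(b)

rewound = state
recovered = []
for _ in range(1000):
    rewound, b = rc4_step_back(*rewound)
    recovered.append(b)
recovered.reverse()

print(rewound == state0, recovered == forward_out)
```

Each backward step costs about as much as a forward step, which is why capturing the RC4 state at one moment exposes everything it produced since it was initialized.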
Okay, and the lesson here is that when they give you an answer and it's not clear, you should ask again.

No, as far as I know the code is closed; you don't know what's happening there. That's actually a problem with doing this research: you invest months doing reverse engineering and then you might end up seeing that everything is working fine. If you're the first one to do it, then most likely it's not going to be fine. If we were going to do it now, then I don't know.

Yeah, as far as I know. Did they analyze Windows? As far as I thought, they only wrote how you should be doing it. They also analyzed it? What is the paper called? Yeah, yeah, that's the paper, "How to ...". Okay, so I'm not surprised that Windows is actually good at this point. So that's what Windows is using there? Okay. I'm afraid I'm not following what's happening now, but I assume it's probably true that it's secure at this point.

Okay, so it's probably secure, but this is the future work that we suggested at the time: look at future versions of Windows, like Vista, Windows 7, Windows 10, which are probably secure. However, something that people did not investigate is how you initialize the state. I mean these 3500 bytes that you pick from different operating system events and use to initialize the state: what's happening there? To do that you'll have to run a lot of system copies, I guess, and try to estimate the entropy that you have there. Perhaps there's low entropy in some area, which later affects the entire state. I don't know. It's a huge project to do that. But if this is insecure, then everything is insecure.

One recommendation is to switch to a design which supports forward security. Barak and Halevi at the time had a paper on how to design random number generators, and I
skimmed the recent paper of Yevgeniy [Dodis], I have a reference to it later: Yevgeniy, Adi Shamir, Daniel Wichs and some other person, I'll say them in a few slides. It also provides a design. And also you have to rekey more often. You have to add entropy more often, because if someone breaks into your system you want to limit the effect of that attack.
Okay, now I'll describe the attack on Linux, which was actually earlier work, before the attack on Windows. The Linux random number generator was written essentially by a single person from '94 till this day, Ted Ts'o. It's implemented in the kernel, which is better than in user space because it's harder to break in there and get to the state. It's very complex and it changed on something like a weekly basis. He keeps changing it; he's the master of the random number generator. The design I'm going to describe has been changed afterwards, but I'll describe what was happening then. There are two interfaces: one is for kernel functions, another is for users. The users have two interfaces, /dev/random and /dev/urandom. /dev/random is a blocking interface which is supposed to be more secure, meaning that it doesn't provide output unless it thinks that it has enough physical entropy. /dev/urandom always provides you output. People don't like using /dev/random because it might just block and not give you output, because it doesn't think it has enough physical entropy. So they mostly use /dev/urandom. Actually, if you have good seeding you don't need /dev/random, you can just use a pseudorandom number generator. But okay, you have these two interfaces. So it was open source. However, it was 2,500 lines of unclear code, and it was constantly being patched. So we had to just get one version, analyze it using static analysis, then make changes to the kernel to see what's happening if you do this or that change, and then you write a simulator and verify that it
provides the same output as the actual implementation. I didn't do any of this work, but it's very tedious and annoying work, even though it was open source. And this actually tells us something about open source. Linus Torvalds said something like "no bug is too deep if there are many eyes looking at the code", something like that. However, here we have code, and I'm sure no one except for Ted Ts'o, and those who tried to break and actually abuse the system, understood everything that's happening in that random number generator. So it's open source, but actually very few people analyzed the code. So open source is not a synonym for security, because it's hard to understand what's happening there.
Okay, so the structure is that you take lots of entropy sources, like keyboard timings, mouse timings, interrupts, disk timings. You have a primary entropy pool, and then you have output going from there to the secondary pool and to the urandom pool. This provides the /dev/random output, this provides the /dev/urandom output. They have some counter which estimates how much entropy you get and how much entropy you take, and if you took too much entropy and didn't add enough entropy from here, then /dev/random blocks and doesn't give you more output, while /dev/urandom is happy to give you more output. Of course we know as cryptographers that if there was an initial seed which was random enough, you can provide any polynomial-sized output and it's still going to be secure. However, with /dev/random they want to get physical entropy. Entropy is always gathered and refresh happens all the time, unlike Windows, which only refreshes, you know, at infinity. Here they keep refreshing the state. An event has two words representing it. One is the event type, which is predictable: this is a mouse click, whatever. The other is the timing in milliseconds. In each event there is very limited entropy. We
measured hard disk accesses and we had something like one bit per hard disk access at the most. But the good news is there are many events, and they can kind of refresh the system. So we had an attack; however, it was less severe. To attack forward security it took us 2^64 work compared to 2^23, so the attack is less severe. Also, Linux seems to be more secure: it's in kernel space, it's hard to get to it. It's the same generator used for all processes, so on one side, one process can access the generator of another process, and that's not so good. On the other hand, you have just one copy, so it's more protected. We have this blocking interface that people don't like, because you want to get random output and it just doesn't think it can provide it to you, so it blocks and your program doesn't run. So this blocking interface is susceptible to denial of service attacks. For instance, suppose you run a super secure application that needs output from /dev/random, and I can somehow get a lot of random outputs from your system, say from another process on your system that uses /dev/urandom, say by sending pings to your machine or whatever. Then I can block the random generator on your machine. I'm going back to this slide: if you have something on your system that wants to get output from here, and I can make something else on your system generate output from here, suppose I'm sending a lot of pings to your machine, and when you send TCP answers you should put some randomness there and you get it from here, then I've kind of drained the entropy count. So when your super important application wants to get output from here, the generator says: sorry, I don't have enough physical entropy, you should wait. So this enables denial of service attacks, and that's not good. Also, there was follow-up work. In 2012 these authors had a paper, "The Linux Pseudorandom Number Generator Revisited", examining the changes that were done
between our work and that time. There are quite a few changes. They describe the state of the generator and they didn't describe any new attack, so it's kind of okay. And there's also this newer work by Yevgeniy Dodis, Adi Shamir, Noah Stephens-Davidowitz and Daniel Wichs, who designed optimal recovery strategies for compromised RNGs: how to enter new entropy into your generator. However, there's a big issue with using the real Linux random number generator with systems without a hard disk, which I'll describe in a minute. But before that I'll say something else. We found issues, the forward security issue, both with the Linux random number generator and with Windows, and basically we told both Microsoft and the Linux developers about these issues and asked them to fix their systems. The answers we got were very different. Microsoft said "that's no problem", we said there's still a problem, and they said "okay, we're going to fix the system", and they fixed the system. With the Linux people it was more like a religious war. If you look at things that were written about us at that time in Linux forums, they really trashed us, as academics who don't know anything. And the thing is that they said: it's secure, these are theoretical attacks, we don't have to fix anything. And the thing is that Microsoft is a company, right? It's a business issue, so they fix it. For someone who invests a lot of his free time working on trying to make the system, say the random number generator of Linux or other parts of Linux, this is something he cares about, he owns, and they don't like someone else coming and telling them what to do. So it was much harder to convince the Linux people to change the system than to convince Microsoft. Okay, and that's, you know,
something to think about, also with regard to the security of open source systems, because at the end it's a few people who decide what's happening there.
Okay, so what happens with diskless systems? At the time, 2006, people who had a lot of money started to replace hard disks with solid state drives, and now it's very common. The problem is that with hard disk read and write operations you have variance in the timing, in how long it takes you to read something from disk. There's actually a paper from the 1990s about this: if you have a hard disk, you have a disk that rotates, and if it rotates it generates air turbulence, and the air turbulence provides entropy that is unpredictable. This is actually the source of randomness. If you have a solid state drive, then all read operations are kind of digital and there's no randomness, there's no entropy there. Now that's a problem, because with the random number generator of Linux, hard disk timings are the major source of entropy. So if you take this RNG and just put it on a system without a hard drive, just with a solid state drive, you lose the major source of entropy. And the other sources of entropy are very limited: user input, system interrupts. They might be guessed. Or think about a system like a router: there's no user input, there's no mouse, there's nothing there, just the hard disk, and network operations if you measure them, but at the time they were not measuring network interrupts. So we speculated this might be a threat to the security of the Linux RNG in future systems. We did some preliminary examination, which I'll show you in a minute. Actually, the developers of the RNG had some recommendations. They said: okay, we don't trust the sources of entropy so much, so the system should somehow continue when you shut it down. So whenever you shut down a Linux system, it
should take the current state of the random number generator and save it, and when it starts booting up again it should take that state and use it as the base for its new operation. That's great, and this is done by the Linux distributions. However, if you have a distribution on a CD or DVD, which is not writable, then you cannot save the state, because there's nowhere to save the state. And a lot of systems, like routers, say if you're using OpenWrt routers, which used to be very popular, don't save the state. They just didn't bother doing that. So then we took a specific device, a PDA made by Nokia (no one uses it today), and we examined it. And we saw that this device always boots with one of only six possible values for the random number generator. So when you boot the device, the random number generator is in only one of six states. This happened because the device used an SSD, so it doesn't have any source of entropy. Also, the developers didn't save the state when the device was shut down, so it always started from the same state. Also, during the reboot, the RNG should be initialized with the current time of day. This should prevent the observation we had, because at least it should start with an initialization which is based on the time of day. So it takes the current time of day, and it should also read some values from a hardware-based noise generator. But we didn't see this happening, and we looked at what's happening there. What's happening is that the RNG was initialized so early in the boot process that the hardware random number generator was still kind of cold at that time and didn't provide any output, so it was always fixed. And the hardware clock indeed had the real time. However, during boot the hardware clock is being copied to the software clock
and the RNG took the software clock and copied it to the state. What happened is that the RNG was running before the software clock was updated, so it always took a clock that had not been updated. Okay, so the guys who designed this machine were not stupid, but they were busy. They said: okay, we have an RNG for Linux, let's use it. They didn't think about how it's initialized and how to make it better, and they didn't observe these issues. So one of the good things that should have come out of our work is that people should look at the systems they're using, at what they actually do and whether it makes sense, and not just take some random piece of code and use it on a new device.
So then we thought about doing something more interesting. Since this machine always started with a specific value, one of six values for the generator, we put in code so that whenever the machine started it ran an SSH connection to a remote host. And since this SSH connection took a key from this random number generator, it always used one of six keys in this connection, so we could use brute force for that. Then we thought about doing something more complicated, but this didn't work because it was too complicated. The idea was the following. We have this RNG, and it starts from a fixed state. The only thing that is going to change the state is the events that the RNG collects, and the only meaningful events that were relevant at the time were user input. So basically the RNG is going to change its state based on user input. Other applications might connect to this machine from outside, say ping the machine, and get output from the RNG which is based on the user input. So perhaps if we take this device, ask the user not to do anything, and ping the machine from outside, we'll get output which depends on the observable state, the state that we
know; it's going to be the first one. Then we ask the user to just click on one point on the machine. This is going to change the value of the state of the RNG. We ping the machine again from outside and we get an output which depends on the input that the user entered into the machine. So this should enable us to do a brute force search over all possible inputs of the user and learn what input the user gave to the machine. And if the user is doing his input slowly enough and we ping the machine fast enough, perhaps we can eavesdrop on what the user is typing, just because this affects how the RNG works. So we did some measurements. It didn't work. It worked only if we made some change to the RNG and typed very slowly, whatever. But potentially it might work, because there is a resource here, this random number generator, which is affected by user input and which provides output to whoever pings it from outside. So potentially there might be a system where this might be possible.
Okay, so then we forgot about it. And then there was this very impressive work, "Mining Your Ps and Qs", by Heninger, Durumeric, Wustrow and Halderman, which I'm sure some of you have heard of. What they did is a large scan of many machines on the Internet, and they checked the public keys that the machines were using. So think about RSA. If two hosts have different moduli and have the same e, that's okay. Everyone uses the same e, which is three; that's fine. If two hosts have the same n in their RSA public keys, that's not so good, because the private key of one host can decrypt messages sent to the other host. Now what happens if they have different n's? You know, n equals p times q. So we have n1 and n2 which are different, but p1 is equal to p2 and q1 is different from q2. So these two public keys share one prime and have the other prime being
different. So what happens there? This is totally insecure, because if we now compute the GCD of n1 and n2 we're going to get p, and with p we can factor both n1 and n2. So this is really bad. And the time to compute the GCD is 15 microseconds, which is very fast for breaking RSA. The same kind of thing happens if we're using DSA signatures or ECDSA: what happens if we repeat the same nonce? So why should different devices have the same p's and different q's? Only if somehow the pseudorandom number generators are not working properly. Otherwise, just by pure collision, it's not going to happen. Now suppose someone else is doing DSA signatures, and in these signatures you need some nonce. What happens if you have a weak source of randomness and you repeat the same nonce twice? That's, I think, a pretty popular homework question in intro to crypto courses: it's easy to show that the DSA signature in that case is completely broken. So it's important to use random numbers. In this case you can break signatures, in this case you can factor RSA keys. So we don't expect to see two RSA keys colliding in the wild. However, they collide. They checked 400 million RSA keys, and they used a variant of an algorithm of Dan Bernstein for computing the pairwise GCD of many numbers. This enabled them to compute the private keys of half a percent of all HTTPS servers and 0.03 percent of all SSH servers. So these are cases where two different servers had different n's with the same p and different q's. This is a complete catastrophe for both servers. One out of 200 HTTPS servers had this problem; that's very bad. And also a lot of devices had repeated keys, the same n happening over and over again. Most keys that had this property were generated by network devices like the one that we also looked at, and they all
have used SSDs and not hard drives, and otherwise have very weak sources of randomness. What happens there is that they gather some randomness from operating system events. Even the device that we checked started from one out of six possible states. If indeed it didn't have any entropy, it would always start from one state, but it had very limited entropy, so during the boot-up process it ended up in one out of six states. With these devices, or similar ones, when they start they are only starting to gather entropy. So they generate an RSA key: for the first prime, p, they have some entropy, and for the second, I guess for the q, they have more entropy. So the p's are very likely to be the same and the q's are going to be different. This enables this attack, which is very bad. And again, this happened because people took existing software and threw it on a new device without thinking about what's working on this device and what's not. And it's funny: the authors of that paper had to do a disclosure to all these companies. They had to disclose this to 61 companies. Only 13 had contact info telling them whom to contact. Only 28 answered. 13 said that they're going to fix the problem. Five said: we already thought about it, we already solved this in the past. This resulted in changes to the Linux kernel to use other sources of randomness. And they said that afterwards, I think a year afterwards, they had a 20% decrease in the number of hosts that have, I would say, factorable keys, and I would say it is because people deployed these changes. So this is an example of how a weak generator can affect security in a very bad way.
Okay, so the take-home message from this part is: be careful when using a generator that you don't understand, in this case specifically when designing a system without a hard disk. If you take software for a generator, you should
understand what's happening there. Don't just throw it into a new system, because then it might not work properly. Also, collect entropy more aggressively and add hardware sources. Seed devices with entropy at the factory if possible, so we don't rely on users. And run for a while before generating cryptographic keys, because as the system is running it gathers more and more entropy, and a lot of weak entropy ends up being good. It's not the end of my talk; I have another, even more fun part. The take-home for Windows is to examine other Windows systems, which are probably secure if Microsoft is doing their work properly, which they probably are. And also to understand the relation with the operating system initialization. I think this is very important, and I don't think that anyone understands it, unless someone was trying to abuse this relation. With Linux the main problem was diskless systems. Something else I wanted to do, which we didn't do, is about what happens with random number generators running inside virtual machines. With a virtual machine, you have a state of the machine, and then you can make many copies of the virtual machine, and they have the same value for the random number generator. Then you start running them here and there in different locations, and if they don't gather enough entropy from the time they start running until they do sensitive things with random numbers, then some things might behave badly. So suppose you have one virtual machine, it has a state, then you make a hundred or a thousand copies of it. They all start running from the same value of the random number generator. If they use it at the same time, it's probably not good. If they use it after getting a little bit of entropy, it might be good, it might not be good. What's the attack scenario? I don't know, but it's interesting. And also, people who use virtual machines don't think about this.
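The virtual machine concern can be sketched in a few lines. This is a toy model under stated assumptions: `ToyGenerator` is a made-up hash-based generator, not the Linux or Windows design, and `copy.deepcopy` stands in for cloning a VM snapshot. It only shows that clones sharing a snapshot emit identical "random" output until each one mixes in entropy that the others do not see.

```python
import copy
import hashlib


class ToyGenerator:
    """Toy hash-based generator (hypothetical, for illustration only)."""

    def __init__(self, seed: bytes):
        self.state = hashlib.sha256(b"init" + seed).digest()

    def add_entropy(self, event: bytes) -> None:
        # Mix an external event into the state.
        self.state = hashlib.sha256(b"mix" + self.state + event).digest()

    def output(self) -> bytes:
        # Derive an output, then move the state forward one-way.
        out = hashlib.sha256(b"out" + self.state).digest()
        self.state = hashlib.sha256(b"next" + self.state).digest()
        return out


# One machine is snapshotted, then two clones are started from the snapshot.
original = ToyGenerator(b"seed gathered before the snapshot")
clone_a = copy.deepcopy(original)
clone_b = copy.deepcopy(original)

# Used right away, both clones hand out the same "random" bytes.
same_a = clone_a.output()
same_b = clone_b.output()

# After each clone mixes in its own events, the outputs diverge.
clone_a.add_entropy(b"timing event seen only by clone A")
clone_b.add_entropy(b"timing event seen only by clone B")
diff_a = clone_a.output()
diff_b = clone_b.output()

print(same_a == same_b)  # identical before any fresh entropy
print(diff_a == diff_b)  # different once fresh entropy is mixed in
```

How much fresh entropy a real clone gathers before its first sensitive use is exactly the open question raised above; the sketch assumes the two events are truly distinct and unpredictable.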
And unless someone designs the operating system to support good security for that, there might be a problem.
Okay, and here is another discussion of implementations of random number generators. Here it's not that the developers were not educated enough. Here it seems that the developers actually wanted to enable back doors into the system using a random number generator. So this is what we call the Dual EC DRBG saga: cryptographic back doors. It's best covered in the paper from this year's CCS, "A Systematic Analysis of the Juniper Dual EC Incident", by a lot of authors. I'm not going to read all the names, but it's very impressive work that they did. So EC is based on elliptic curves. The whole thing is like working modulo p, but you work in a different group. Points on the curve satisfy this equation. Each point has an x and a y, for an x you have two values for y, and for a random point the entropy is in x. I usually think modulo p, not in elliptic curves; it's the same. The discrete log and the Diffie-Hellman problems are defined also in elliptic curves. So G is a group of order q and P is a generator. The discrete log problem here is: given Q, which is r times P for an r which we don't know and is random, find the value r. So it's like the discrete log modulo p, but here we do it for multiplications, not for exponentiations. And the DDH problem here is: given P, Q, R and S, where Q is random, determine whether there's a lowercase r so that uppercase R is r times P and S is r times Q. That is, whether these two values are random, or whether R and S have the same discrete log to the bases P and Q. This is just Diffie-Hellman in multiplication. Okay, so back in 2006 NIST suggested this generator. So the state is P, and you get this value t, which is kind of a refresh value, and
you multiply P (P is an elliptic curve point) by t. Then you take the x coordinate, and then you take some bits out of the x coordinate, you take an integer out of the x, so basically it's the x coordinate. And we do two things with it. One is to feed it back here. It's always fed back here, and then you multiply the x coordinate back into P. So you take P raised to the power of t, take the x coordinate, some bits from there, that's the new t, and raise P to that power again, multiply it again and again. You also take that s, that x coordinate, and multiply Q, that's the other part of the state, by s. You take the x coordinate of this result, extract some bits, and this is the output of the generator. So basically the state here has P and Q. You multiply them by this s here, that goes back, and then you take some bits out of the Q side and you output them. So with the internal state s_i, the state is updated by computing s_{i+1}, which is s_i times P, and you take the x coordinate. And the value that you output is s_i times Q: you take the x coordinate, you remove 16 bits, and that's the output. So first, this seems like a very silly idea. Why use public key operations to design a random number generator? We want the random number generator to be very efficient, so why use public key operations for that? It seems silly. But they said: okay, this looks secure, because it seems to be secure based on the decisional Diffie-Hellman assumption. Basically what you do here is you have P and Q, and then you multiply both of them by s, s_1 say, and then you take some bits of this thing and you output them. So if the decisional Diffie-Hellman assumption holds, then this is random. So the last component, from which you output things, looks random. So according to Diffie-Hellman this is secure. So that's kind of the hand-waving argument, and
this is secure based on the Diffie-Hellman assumption, and this is why you want to use this. Okay, so then people started talking about it. What we output is the x portion of a point on the elliptic curve, so basically it's a random field element. However, a random element in a field or in a group is not random in the sense that all of its bits are random. Take for instance working modulo p which is 23. If you look at the numbers in Z_23^*, the multiplicative group modulo 23, those whose GCD with 23 is 1, then the most significant bit here is not uniform. Actually it's zero with probability 0.7. So values in these groups are not necessarily random. And actually, if you look there, they truncate the 16 most significant bits. But people showed, when it was published, that the first bit that they publish is predictable with this probability. And if you look at 240-bit outputs, then we can distinguish them from random with this probability, which is not good. So the output is not really random. This is because the most significant bits are very non-random and you should truncate them, and they only truncated 16 bits. So what you got at the end was still not very random. This could have been easily solved by truncating more bits, but they didn't want to truncate more bits. They just said: that's the standard, we're going to use that.
Then there was a talk in the CRYPTO rump session in 2007 by Dan Shumow and Niels Ferguson from Microsoft (these are slides from the original talk), "On the Possibility of a Back Door in the NIST RNG". Okay, so here's the attack. The system uses P and Q, we have the same s, you multiply it by P and by Q, and the output is taken from s times Q. Now suppose
someone knew the discrete log of Q to the base P. So they didn't generate random P and Q: they chose Q so that they knew that Q is equal to e times P, they knew the discrete log of Q to the base P. And it's easy to generate the numbers this way if you can generate them any way you want. So assume that there was no truncation, so the output of the generator is the x coordinate of s_i times Q. Given the x coordinate you can find the y coordinate, because there are only two possibilities for it. And now if we know the discrete log of Q to the base P, we not only know this, we also know the discrete log of P to the base Q; it's just the inverse. Therefore, if we know what s_i times Q is, we can also learn what s_i times P is, and then I can compute all future states, because this is essentially the state. So I'm going back to the initial slide: the output here is some bits of s times Q, and the state is s times P. If I know this discrete log between Q and P and I know this value (in actuality I only know a few bits, but suppose I know this entire value), I can easily compute this state from this value. Then I know the state, and from that point on I can compute everything. So this is what it says, and it's based on the attacker choosing P and Q, which are constants in the standard, so that the attacker will know the discrete log of Q to the base P. But you say: the output is truncated, so you don't know the entire value. However, they only truncate 16 bits, so there are basically 2^16 options for the possible value once you get the output. So you can go over all 2^16 options and check which one of them corresponds to the future output, and you can completely break the system. This is not enough truncation. If you were truncating 64 bits, then the attacker would have to do 2^64 work to break the system, and that would have been secure. But they only
truncate 16 bits and not 64 or 80 bits, so this is kind of suspicious. And these guys verified the attack experimentally. And then they said (this is pre-Snowden, okay?): we're not saying that NIST intentionally put a back door there, but we are saying that all the security depends on solving one instance of this elliptic curve discrete log, and we don't know whether the algorithm designer actually picked P and Q so that he knows the discrete log or not. So we cannot use this. And this was pre-Snowden, so at that time you didn't accuse the NSA of anything, because people didn't think that's going to happen. And then they had suggestions. One suggestion is to truncate more than 16 bits, and the other is to generate a random point Q for each instance of the PRNG. And something else they could have done, which is very easy: how can I convince you that I don't know the discrete log of Q to the base P? I just choose Q in a way which seems to be unpredictable, for instance apply SHA-1 to the entire contents of the New York Times on a specific date, or something like this. This kind of convinces us that this value is random, and therefore I don't know the discrete log of that value. But they didn't do that. Actually, there are some cryptographic standards or constructions where this is how they define random constants, to convince you that they chose them sort of at random. Okay, so they had the rump session presentation, no one talked about it afterwards, and it didn't seem to matter, because who's going to use this RNG that's based on public-key crypto? And then Snowden came, and these are things that appeared in documents he revealed. So this is the NSA program that says: insert vulnerabilities into commercial encryption systems, networks and endpoint communication devices. And down here: influence policies, standards and specifications for commercial public
key technologies, which is exactly what could have been done here: influence a standard. This was a standard. And then this, you cannot read it, but basically RSA Security, the company, implemented this Dual EC, and they got 10 million dollars from the NSA for doing that. So they implemented this RNG, and moreover it was the default RNG used in their systems. Because, okay, as a company we work according to the standards, we use any option that the standard enables us to use, and in our, I guess, marketing material we just say that we work according to these NIST standards, which looks fine. And then, from all options in the standard, they chose to use this one, and it was also revealed that they got a substantial amount of money from the NSA for doing that. And this, I'm sure it's hard to read: John Kelsey, who's from NIST, asked Don Johnson, who was a designer of Dual EC, in an email: where does this Q come from? And what he says is: Q is essentially the public key for some random private key. It could also be generated like another canonical G, but NSA stopped this idea, and I was not allowed to publicly discuss it. And John Kelsey afterwards said that he thought it was strange, but he was too busy and didn't make too much out of it, and that's probably the case. But then you look at more Snowden documents, and what you see here, it says Juniper, and Juniper, and Juniper. This says Juniper is a major target, because they control a lot of the firewall and router market. So this seems like evidence that the NSA put a back door into the random number generator used by Juniper products, which is a lot of the firewall market. It means that they could eavesdrop on communication going through these firewalls. But this is actually where things become even more interesting. So last December Juniper had this security advisory, saying there's a vulnerability that enables
attackers who can monitor VPN traffic to decrypt the traffic, and it affects ScreenOS, that's the name of the operating system, in these and these versions. No other versions are affected, only these versions. So then people started looking at what happened there, including the authors of the paper I cited at the beginning.

Apparently Juniper was using this Dual EC random number generator, and they use a different Q, not the one specified in the standard, and they don't say how Q was generated. So they use their own Q, okay? They said something like, after Snowden, we use another Q, and they don't say why they chose that Q, just that this is the Q that they use. So we don't know: is this a Q that someone gave them? Is this a Q that they put there because they want to eavesdrop on traffic? They just chose to use a specific value of Q. Something else they say is that the output of this generator is never being output. It's input to another PRNG, this FIPS PRNG, and it's only being used as a seed to this PRNG. So even if there is this vulnerability and you know the Q, you don't see the output, because it's the seed to another PRNG. Everything should be fine.

What happened when people started examining the code? Basically Juniper said, from this version our systems are insecure, until then it was okay. What happened is that at some point, from the version where the system started to become insecure according to Juniper, the Q changed. So they shipped the system with an original value of Q, at some point this value of Q was changed, and what Juniper is saying is that they don't know who changed the value of Q. So someone broke into Juniper's systems, I guess into the development systems of Juniper, broke into the source code of the RNG, and changed the value of Q, and from that point on the systems manufactured by Juniper had someone else's back door working in them. Still, they say it shouldn't be a
problem, because, you know, we don't output the output of the random number generator, it's only being input to another random number generator. However, if you look at the implementation, and the authors of the paper did, there's a bug in the implementation, and somehow raw output from the Dual EC random number generator is being output to the network. It's like a subtle, or not so subtle, bug. So the back door can be utilized.

So, okay, we don't know what happened. Why did they use Dual EC? Why did they initially use a different Q than the one suggested in the standard? And then, even more interestingly, who changed the Q? Who knew about the possibility of this back door? Someone who listened to the rump session talk in 2007, or someone who invested a lot in investigation, and knew that the system had this bug which reveals output from the random number generator. And then this person broke into Juniper and changed the value of one constant, so that that person could eavesdrop on the communication. It seems like something that only states could do. That's very interesting.

So, okay, this affected Juniper's reputation, and moreover the reputation of NIST, which for many years tried to distinguish itself from the NSA, saying that they generate standards for the good of the entire community, not just for the NSA. And, you know, we believe that NIST did try to build good standards, but perhaps they were abused by the NSA for putting back doors into standards. This is a document, also from Snowden, on the amount of money the NSA puts into putting back doors into commercial products. It's like 275 million dollars in 2012. That's a lot of money, okay? You can do a lot with that. Ted Ts'o said, some time after that, you know, I'm so happy that I resisted pressure from Intel to rely only on hardware
random number generators, and to also generate randomness from all kinds of system events, because then we won't depend on one company. This whole back door issue also eroded the trust in SGX, the new secure execution environment from Intel, because we don't know what's happening inside.

So the conclusion is that the NSA, and probably many others, willfully weakened the security infrastructure so that they can eavesdrop. However, this backfired in the case of Dual EC, because probably someone else used the back door that they put in there. So this is, you know, quite amazing to observe, but this is something that affects the way we communicate.

With regards to random number generators: security is very important. You saw that governments are probably willing to invest hundreds of millions of dollars in the security, or insecurity, of random number generators. It's very hard to examine OS-based RNGs. Even if you're examining the Juniper code, unless you reveal this bug that outputs part of the raw output to the network, you think that it's not the best but it's still secure. So it's very hard to identify back doors. The generators of both Windows and Linux did not provide forward security. They're probably secure by this time, and they may have additional design issues.

And I think that's the last slide. I'll just say again my commercial announcement from the beginning, for those who came during the talk. We are running at Bar-Ilan the seventh annual Bar-Ilan winter school. This year it is going to be on differential privacy, from theory to practice. This is a technology that's now being used by Google and Apple to protect the privacy of users who report data back to the center, let's say. We have a very nice lineup of speakers. The weather is going to be probably as nice as it is here right now, but it's going to be in February, and it's
going to be a fun and interesting event. Okay, thanks.

Well, the spec is open, so it depends. But okay, even if the spec is open, it's complex. You have to trust Intel to implement what's in the spec, you have to trust the random number generator. It's very hard for outside observers to verify what's inside. I think it's also hard just to verify that the spec is fine. Look at us as academics: this would take a lot of work. What incentive do you have to work on that? The most probable case is that you're going to spend probably some person-years on that, and end up saying that everything's fine. That's not a good use of, you know, anyone's time. I mean, it's very important for the community, but it's kind of, yeah.

It's easy to change a chip at fab time in a way that's imperceptible. So if you think you have 64 bits of entropy coming in, I can change the chip so you only have 12, let's say, and 12 is enough that you won't notice it's constant, but small enough for somebody to break it in a 2 to the 12 attack. And this you can do even if you have all the specs, open source, everything, at fab time. And then you'd really have to look at the chip at a microscopic level to notice that this is happening, which is really not feasible given the size of a chip. So I think this is why you can't just trust hardware.

I'm sorry, could you say that again? Okay, thanks.
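
The remark above about a chip that delivers only 12 bits of entropy can be illustrated with a small sketch. This is a toy model under stated assumptions, not any real hardware or operating-system generator: the names `ENTROPY_BITS`, `weak_output`, and `recover_seed` are hypothetical, and SHA-256 stands in for whatever conditioning a real generator would apply. The point is only that outputs look random to a casual observer while the seed falls to a search over 2^12 candidates.

```python
import hashlib
from typing import Optional

# Assumption for illustration: the weakened source yields only 12 unpredictable bits.
ENTROPY_BITS = 12


def weak_output(seed: int, n_bytes: int = 16) -> bytes:
    """Toy generator: hash a small seed into an output that looks random."""
    seed &= (1 << ENTROPY_BITS) - 1          # only 12 bits of the seed survive
    return hashlib.sha256(seed.to_bytes(2, "big")).digest()[:n_bytes]


def recover_seed(observed: bytes) -> Optional[int]:
    """Try all 2^12 seeds and return the one that reproduces the observed output."""
    for candidate in range(1 << ENTROPY_BITS):
        if weak_output(candidate, len(observed)) == observed:
            return candidate
    return None


if __name__ == "__main__":
    key = weak_output(0xABC)                 # a "random" 16-byte key
    print(key.hex())                         # looks unpredictable by eye
    print(hex(recover_seed(key)))            # found after at most 4096 tries
```

With 4096 possible seeds, two observed outputs rarely repeat, so nothing looks constant, yet an attacker who knows about the weakening recovers the seed almost instantly.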