This is joint work with Daniel Genkin and with Nadia Heninger. I think that both are here in the crowd.

When we do attacks there is a life cycle that we always follow. We start by finding a vulnerability, and we say that something may be vulnerable, and then there is denial: you can't do that, we don't need to fix that, no problem. Then someone comes and issues a warning, you really need to fix that, and then someone describes an exploit. When the exploit comes, the vendor fixes the problem, and later someone will publish the paper, probably those who found the exploit. And in that paper there will be the future work section that discloses another vulnerability.

So let's follow this cycle. Round one. In 2002 Tsunoo and others published a cache attack on MISTY. No one really took notice of that, because it was MISTY and it was theoretical and didn't really work. So no one changed anything and no one issued any warnings. But in 2005 Percival published a real attack, working against OpenSSL RSA.

The attack that he did works on the cache, and we'll now describe how the attack works. Basically, the cache is a small memory that sits between the main memory and the CPU, and the purpose of the cache is to bridge the speed gap between the slow memory and the fast CPU. The cache is organized as sets, to which locations in memory map, and ways, which is the number of memory locations that can be stored in each set. The mapping is very simple: for each address there is the set that it maps to, and it will always be stored in this set, but it can be stored in any of the ways of the set. If the cache has four ways in each set, we call that a four-way set-associative cache.

So the attack that Percival published (in the same year Osvik, Shamir, and Tromer published the same attack) is very simple. The attacker picks a cache-sized buffer in memory and accesses this buffer, so the buffer will be cached: the attacker fills the cache with the contents of this buffer. Then it will let the victim execute a bit. Whenever the victim accesses a line in memory, this line will be brought into the cache, and when it is brought into the cache it evicts one of the attacker's lines from the cache. So what the attacker does now is read back the buffer and find in which cache sets the reads are a bit slower. They will be slower in the places where the victim has accessed memory and replaced some of the attacker's lines. So the attacker can find out which cache sets the victim has accessed.

What can we do with that? To understand what we can do, we first need to see the attacked algorithm. The algorithm that we show here is what's called fixed-window exponentiation. It's a way to do modular exponentiation. The algorithm is very simple. We first pre-compute two to the w values that are basically powers of the base, a to the power of i. So we calculate a to the zero, which is easy, that's one, and a to the one, which is easy, we have a, and a to the two, and so forth until a to the two to the w minus one. Then we do the exponentiation itself. We scan our exponent as groups of bits of size w, with possible values from zero to two to the w minus one. For each group we do w squares and then we multiply by the pre-computed value selected by the i-th digit, the i-th group of bits of the exponent. Now an adversary that can find which value we multiplied by here can recover the exponent, and in the case of RSA decryption or an RSA signature that would be a value that is better kept secret.

So what Percival did is run the attack that I described earlier on a very similar algorithm, and this is what he got. This is a heat map of the accesses to the cache, where we have the cache sets on the horizontal axis and time on the vertical axis, and the color, or the shade, indicates how long it took to access that cache set. We look at that over time and we see these blocks here. If we compare against the ground truth, we find that these are the square operations. And when we don't have the blocks, we have a small line in one of the cache sets, and the location of this line depends on which value we multiplied by. So this breaks the system.

So Percival published the attack, Intel rushed to help and published a fix using a technique called scatter-gather. So what does scatter-gather do? If we look at the memory allocation of the multipliers that we pre-computed, standard allocation will place the bytes of each multiplier in consecutive locations in memory. So we'll have the first multiplier in memory with byte zero, then byte one, then byte two, and so forth. And if we look at the cache lines, we'll have the first multiplier in, say, three cache lines, the second multiplier in the next three cache lines, and so forth until we cover all of the multipliers. So basically the attack identifies the multiplier by finding which cache lines are accessed; that's exactly what creates the pattern that we see in the attack. Scatter-gather changes the order in which the bytes are stored in memory: in each cache line it stores one or a few bytes of each multiplier. So in the preparation stage we scatter the multipliers over the cache lines, and when we need to use them, the software accesses each of these bytes, collects the data, recovers the multiplier, and multiplies by it, but from a fixed location.
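As a rough illustration of the scatter-gather layout just described, here is a minimal Python sketch. It only models byte positions, not a real cache, and it is not Intel's or OpenSSL's code. The function names, the table sizes (32 multipliers of 128 bytes each), and the assumption that the table starts on a cache-line boundary are all hypothetical choices made for the example.

```python
CACHE_LINE = 64  # bytes per cache line

def scatter(multipliers):
    """Interleave the multipliers: byte j of multiplier i goes to j * n + i."""
    n, size = len(multipliers), len(multipliers[0])
    table = bytearray(n * size)
    for i, m in enumerate(multipliers):
        for j, byte in enumerate(m):
            table[j * n + i] = byte
    return table

def gather(table, n, size, i):
    """Rebuild multiplier i and report which cache lines were touched."""
    out = bytearray(size)
    touched = set()
    for j in range(size):
        pos = j * n + i
        out[j] = table[pos]
        touched.add(pos // CACHE_LINE)  # assumes the table is line-aligned
    return bytes(out), touched

# Hypothetical sizes, chosen only for illustration.
N, SIZE = 32, 128
multipliers = [bytes((7 * i + j) % 256 for j in range(SIZE)) for i in range(N)]
table = scatter(multipliers)
results = [gather(table, N, SIZE, i) for i in range(N)]
```

In this sketch every call to `gather` touches the same set of cache lines regardless of which multiplier is requested, which is the property that hides the multiplier index from an attacker who can only observe accesses at cache-line granularity.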
So we have accessed all of the cache lines and we don't see the pattern anymore on the cache.

OpenSSL has gone through several stages of modifications to improve performance, and the current implementation, or the one before the current implementation, used a slightly modified approach. The way they organized the memory is that every group of four cache lines contains eight bytes of each multiplier. So if we have multiplier zero, the first eight bytes will be in what we now call bin zero, that's bytes zero to seven of the first cache line of the first group, and then bytes eight to fifteen will be in the first cache line of the second group, and so forth. Each of the other multipliers will have a different location in the cache. Now, if they only collected from the locations that the multiplier is in, that would leak information, because for multipliers that are in the first cache line we'll see activity on lines whose number is zero modulo four, and if we access locations in the second cache line we'll see that in lines that are one modulo four. So we'll see the difference. To overcome that, what OpenSSL does is access each of these and then use some bit tricks to select the one that they want. So in each access, for example for multiplier zero, they access all of the locations in bin zero and they just use the ones that they need. So this mitigates the attack.

So that's good. We have the mitigation. In 2005-2006 Bernstein, and Osvik, Shamir, and Tromer, said this is not enough. This is not enough because there is something called cache banks. So we will now look at what cache banks are. When processors started to become faster and do superscalar operations, they started doing multiple accesses to memory in parallel, and the tendency of computers to do things in parallel keeps increasing, so the cache became a bottleneck. So instead of having a single cache that has a single entry, Intel divides the cache into, currently, 16 banks. If we have two accesses to the cache that try to get or change data in different banks, they proceed in parallel. If we have multiple accesses to the same cache bank, then one of them will be served immediately and the other will wait a cycle. This creates timing variation based on which cache bank we access, and the cache banks are within the cache line, so there is a risk there.

So that's the beginning of the second cycle. We have this paper telling us that there is a problem, and in 2011 Ernie Brickell gave the rump session at CHES, saying that OpenSSL mitigates the side channels because it accesses every cache line in memory, so we don't have a secret-dependent access at a resolution lower than a cache line. In 2013 Bernstein and Schwabe stood, I think it was here, was it?, and demonstrated that there are timing variations. In 2015 Peter Schwabe asked the world to provide a real attack, and if Peter Schwabe asks, we do. So this is our work.

What we did is basically an attack that we call CacheBleed. I thank Dan for the name, Dan Bernstein. We have the code for the attack. We first take the time and store it, sorry, put it in a location, and then we just go and access memory locations that are spaced by 0x40 bytes. That's 64.
That's exactly the size of a cache line. We add them to various registers, and we just want to do as many cache accesses as we can. Register R9, the offset there, will tell us which cache bank we access. At the end of these 256 accesses we take the timer again, subtract the previous timer value, and we get how long this operation took.

And now we start drawing graphs of what we get. If we have our CacheBleed running with nothing else happening in the other hyperthread, we get a count below 250. That shows us that the processor is really doing more than one operation per cycle, because we had at least 256 additions there and all of them managed to complete in fewer than 256 cycles. Once we add another program running in the other hyperthread, a program that just computes and does nothing in memory, we already see some change in the timing. The reason is that the core shares multiple resources between the two hyperthreads, and the other hyperthread uses some of them.

If the other hyperthread runs a program that always accesses the cache bank that we are trying to access, we suddenly see the effect of these cache-bank collisions. But this is not a real scenario; our victim will never access just one cache bank. So we tried looking at a mixed load. In this case it was three computing instructions and one access to memory, where the access to memory can be, in one scenario, to the same cache bank that we are monitoring and, in another, to a different cache bank. We see that we get two distinct distributions, but there is some overlap, so we cannot use a single sample to decide whether there was an access to the cache bank or not. In real life these distributions are even much closer together, because the accesses are much sparser than one in every three and they are not always going to the same cache bank.
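To make the access pattern of the probe loop concrete, here is a small Python sketch of the addresses it touches. It does no timing, since Python cannot usefully measure single-cycle bank conflicts, and it is not the attack code shown on the slide. The bank width of 4 bytes (16 banks per 64-byte line), the base address, and the function names are assumptions made for this illustration; the stride of 0x40 and the 256 accesses come from the talk.

```python
CACHE_LINE = 0x40      # 64 bytes, the stride used by the probe loop
NUM_ACCESSES = 256     # accesses per timing sample
BANK_WIDTH = 4         # ASSUMPTION: bytes per bank, i.e. 16 banks per line
NUM_BANKS = CACHE_LINE // BANK_WIDTH

def bank_of(address):
    """Bank index of an address, under the assumed bank width."""
    return (address % CACHE_LINE) // BANK_WIDTH

def probe_addresses(base, offset):
    """Addresses read by one probe: same offset in consecutive cache lines."""
    return [base + offset + k * CACHE_LINE for k in range(NUM_ACCESSES)]

# Hypothetical line-aligned base and an offset playing the role of R9.
addrs = probe_addresses(base=0x10000, offset=0x1C)
banks = {bank_of(a) for a in addrs}
```

Because the stride equals the cache-line size, every access in one probe falls in the same bank, so the offset alone selects which bank is being monitored.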
So we tried this thing on OpenSSL. What we did is capture a thousand traces of CacheBleed running in parallel with OpenSSL decryption, and this is what we get. If we monitor bin zero or any of the even bins, we get the line that is at the top; if we monitor bin one or any of the odd bins, we get them a bit lower. We can clearly see that OpenSSL is doing something, then drops, then repeats: doing something and dropping again. This is something that we would expect, because OpenSSL uses the Chinese remainder theorem for RSA, so it does two exponentiations. As for the difference between the lines, we later found out that the reason is the algorithm that they use for doing the modular reduction. It uses 128-bit numbers, so it has different access patterns to odd and to even bins.

Anyway, if we zoom in a bit, we see a pattern like that: a group of nine peaks followed by something a bit wider. This is something that we would expect, because we have w being five, so we'll have five squares followed by a multiplication, and we'll have modular reductions in between them, so that gives us ten and our multiplication. Sorry, we have four squares, a window of four, four squares and a multiplication, and the modular reductions in between: that's a total of ten. The more important thing is that in some of the multiplications we see that the line for one of the bins is a bit higher than the others. When we look at our ground truth, we find that the number of the bin corresponds to the three least significant digits.
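The relation between the operation pattern, the window value, and the observed bin can be sketched in Python as follows. This is a plain fixed-window exponentiation written for illustration, not OpenSSL's implementation. The function name, the trace format, and the sample values are hypothetical, and the window size is a parameter (5 is used here only as an example). The only thing taken from the talk is that the bin number matches the three least significant bits of the window.

```python
def fixed_window_pow(a, e, n, w):
    """Compute a**e mod n with a fixed window of w bits, logging operations."""
    # Pre-compute a^0, a^1, ..., a^(2^w - 1) mod n.
    table = [1] * (1 << w)
    for i in range(1, 1 << w):
        table[i] = (table[i - 1] * a) % n
    # Split the exponent into w-bit digits, most significant first.
    num_digits = max(1, -(-e.bit_length() // w))
    mask = (1 << w) - 1
    digits = [(e >> (w * k)) & mask for k in reversed(range(num_digits))]
    result, trace = 1, []
    for d in digits:
        for _ in range(w):            # w squarings per window
            result = (result * result) % n
            trace.append("S")
        result = (result * table[d]) % n   # one multiplication per window
        trace.append(f"M{d}")
    return result, trace, digits

# Hypothetical small example values.
result, trace, digits = fixed_window_pow(7, 0b1011000111, 1009, 5)
# The bin number seen by the attacker: three least significant bits.
bins = [d & 0b111 for d in digits]
```

Each window contributes w squarings followed by one multiplication, and for each multiplication the sketch records only `d & 0b111`, which models the partial information about each window that the bin number reveals.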
These are the three least significant bits in the window. OK, so we have the data. I do not show the even bins here because they appear over here, they clash, and the graph doesn't look really nice.

So that's beautiful, problem almost solved. However, here we are somewhere at the beginning of the exponent. When we go to the end of the exponent, we have a bit of a problem. While OpenSSL does the exponentiation, the operating system runs and the timing starts to skew. We try to average a thousand skewed traces and we get what looks like noise. So we pass that noise through a low-pass filter, and again we start seeing some nice waves, and some of them are higher than others. If we collect all the waves we can just read the bits. So here is bin seven, a bit higher than everyone else, and bin seven again, and bin seven again, and bin four, and so forth. So we just got the three least significant bits of each of the windows.

That's good, but that's only 60% of the bits of the secret exponent, and for 2048-bit RSA that leaves about 400 bits to guess. So instead of guessing these we used the Heninger-Shacham technique, which basically spans a tree and combines the information we know on both of the exponents. Within two hours, two CPU hours, which takes about three minutes on Nadia's machine, we got the exponent.

So we're back at the cycle. We told OpenSSL, OpenSSL issued the fix for that, and we got our paper. What OpenSSL did is two things. The first is that they use 128-bit reads instead of 64-bit ones, and they use the original masking that they needed. So even if they leak, they would leak only two bits per window, and that's below the threshold that we need to do Heninger-Shacham, if we wanted. The other thing is that instead of reading the same bin at each read, they read diagonally in each line, so each four accesses cover the whole cache line. The order in which they make these four accesses depends on the secret exponent, but we don't know yet how to extract or find this order; the effect is probably too small for us to detect with the technology we have today.

So that brings us to this: we notified OpenSSL about that, and the response was, you can't do that, and you like challenges, so here's a challenge for you. So we're at the beginning of the third round, and with that we'll have to leave it until someone comes up with the exploit. So thank you everyone for listening, and that's what I have to say.