So, welcome back to our last technical discussion of this workshop. We will now talk about attacks on AES. It was widely believed for a long time that no attacks on AES were possible, but what we will discuss is a very special kind of attack called a side-channel attack. So, this is the background we talked about before. There was DES and then there was triple DES, and they came to be regarded as insecure or inadequate. So a substitute had to be found, and the substitute came in the form of AES, after a grueling competition: a big procedure to find a substitute for DES, at the end of which AES was selected. You know this already: typically 128 bits for both the block size and the key size, but AES also supports 192-bit and 256-bit keys, and the number of rounds in those cases is 12 or 14. Right now we are only talking about 128-bit AES with 10 rounds. A quick review: there are 10 rounds. First there is a sort of pre-round (just a round key addition), then you have 9 full rounds, and then the 10th round, which is slightly different from the other 9 (it has no column mixing). As usual, a round consists of byte substitution, shift rows, mix columns and round key addition. Once again, here is how you represent the plaintext to start with. The plaintext comes in over here, and it is represented as a 4 by 4 array of bytes, because the block size is 128 bits: there are 16 entries, each one a byte, and 16 multiplied by 8 is 128, the block size. You can think of this quantity as an array that keeps getting transformed as it goes from one step to the next. So keep this 4 by 4 array in mind; we will need all these little pieces of information as we see the attack in progress. Very quickly, for those of you who are familiar with fields: byte substitution uses a 16 by 16 S-box, which is a table, so we use a table lookup for it. As we will see later on, a software implementation does not actually need to use this table directly.
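To make the 4 by 4 picture concrete, here is a minimal Python sketch (the function names are illustrative, not from any library) that arranges a 16-byte block into the state and back. It assumes the FIPS-197 convention that the state is filled column by column, which is also the convention behind the x0, x1, ... numbering used later in the lecture, where x0 to x3 form the first column.

```python
from typing import List


def bytes_to_state(block: bytes) -> List[List[int]]:
    """Arrange a 16-byte block as a 4x4 state, stored as a list of rows.

    The state is filled column by column: state[row][col] = block[row + 4*col].
    """
    if len(block) != 16:
        raise ValueError("an AES block is exactly 16 bytes (128 bits)")
    return [[block[row + 4 * col] for col in range(4)] for row in range(4)]


def state_to_bytes(state: List[List[int]]) -> bytes:
    """Inverse of bytes_to_state: read the state out column by column."""
    return bytes(state[row][col] for col in range(4) for row in range(4))


if __name__ == "__main__":
    for row in bytes_to_state(bytes(range(16))):
        print(" ".join(f"{b:02x}" for b in row))
    # 00 04 08 0c
    # 01 05 09 0d
    # 02 06 0a 0e
    # 03 07 0b 0f
```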
But in principle you use an S-box, a 16 by 16 table, where the (i, j)th entry S_ij is obtained as follows. If you take the entry in row 3 and column 7, you write 3 and 7 in binary (0011 and 0111) and concatenate them to get a byte, and then you take its inverse modulo what? We are talking about a field here, and for those of you who are familiar with fields, this is the field GF(2^8), which has 2^8, that is 256, elements, each element being a string of 8 bits. That is how you represent an element of this binary field. So the entry in the ith row and jth column of the S-box is obtained like this: concatenate the two halves, take the inverse of that modulo the irreducible polynomial, and then apply an affine transformation over GF(2), the last part of which is an exclusive OR with the constant 63 in hex, that is, 0 1 1 0 0 0 1 1. The next step is row shifting. Each row i of the state array, with i between 0 and 3, undergoes a left circular shift of i positions: the first row is left as it is, the second row is rotated one position to the left, the third row two positions and the fourth row three positions. So this conforms to our template of a secret-key cipher: there are substitutions and there are permutations. Row shifting corresponds to a transposition or permutation, and the previous step corresponds to substitution; we need to have both. And there is some more of this mixing to make things even harder for the hacker, and that is column mixing. What you do is take the state matrix, which is 4 by 4, and pre-multiply it by another 4 by 4 matrix. Each entry of that matrix is to be interpreted as 2 hex characters, that is, as an element of the field GF(2^8).
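The three steps just described can be sketched in a few lines of Python. This is an illustrative, unoptimized sketch, and names such as `gmul` and `sbox_entry` are ours: field multiplication reduces by the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), the inverse of 0 is taken to be 0 as AES specifies, and the affine step is written in its common rotate-and-XOR form. The state is represented as a list of four rows of byte values.

```python
from typing import List

AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1


def gmul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) (shift-and-add, reducing by AES_POLY)."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY    # reduce modulo the irreducible polynomial
        b >>= 1
    return result


def ginv(a: int) -> int:
    """Multiplicative inverse in GF(2^8); AES maps 0 to 0 by convention."""
    if a == 0:
        return 0
    result, base, exp = 1, a, 254   # a^254 = a^-1, since the group has order 255
    while exp:
        if exp & 1:
            result = gmul(result, base)
        base = gmul(base, base)
        exp >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n positions."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_entry(i: int, j: int) -> int:
    """S-box entry in row i, column j: the input byte is nibble i followed by nibble j."""
    b = ginv((i << 4) | j)
    # Affine transformation over GF(2); its final step is XOR with 0x63 = 0110 0011.
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [sbox_entry(x >> 4, x & 0xF) for x in range(256)]


def sub_bytes(state: List[List[int]]) -> List[List[int]]:
    return [[SBOX[b] for b in row] for row in state]


def shift_rows(state: List[List[int]]) -> List[List[int]]:
    """Row r (0..3) undergoes a left circular shift of r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


MIX = [[2, 3, 1, 1],
       [1, 2, 3, 1],
       [1, 1, 2, 3],
       [3, 1, 1, 2]]


def mix_columns(state: List[List[int]]) -> List[List[int]]:
    """Pre-multiply the state by MIX; '+' is XOR and '*' is gmul."""
    out = [[0] * 4 for _ in range(4)]
    for c in range(4):
        for r in range(4):
            acc = 0
            for k in range(4):
                acc ^= gmul(MIX[r][k], state[k][c])
            out[r][c] = acc
    return out
```

For example, `SBOX[0x53]` evaluates to `0xed`, which matches the worked example in the AES specification (FIPS-197), and mixing the column db 13 53 45 gives 8e 4d a1 bc, a commonly used test vector.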
So each element is a string of 8 bits, and 2 hex characters make 8 bits: each of these entries is a field element. Very interestingly, and as you would want, after a few rounds every output bit depends on all the input bits, so the input is thoroughly diffused as it goes along. The last operation within a round is round key addition. Each round has a separate key, obtained from the original key using a key expansion algorithm. The algorithm details are in the book. That is also a security issue: obtaining the round keys from the original key cannot be an arbitrary procedure; it has to be carefully designed. So the S-box, the round key generation, all of these little things are important from the point of view of security. Basically, each round key is again 128 bits, represented as a 4 by 4 array, and it is simply XORed with the current state to obtain the next state. Now, you cannot make something secure if you do not know how to break it, so our goal was to break it. For a while we thought you could not break it, because nobody had reported an attack; then we saw some papers that talked about cache-based side-channel attacks. So what is a side-channel attack? It is an attack on an implementation of a cryptographic algorithm, for example to extract passwords or keys from a cell phone or a smart card, using side channels like power and timing information: how much time does each RSA encryption take, how much power does it consume, and so on. These are all different side channels. One side channel that should be familiar to most of us in the computer science community is the cache. So we focused our attention on cache-based side-channel attacks, because we have been teaching computer architecture and we know pretty well how caches work. But of course, one of the problems is that cache systems keep getting more and more sophisticated over time.
So, in our attempts at designing and implementing side-channel attacks, we implemented a cache-based side-channel attack on various machines, starting with the Intel dual core and then the i3, i5 and so on. Now, nice as those four steps sound, that is not how AES is actually implemented in software libraries such as OpenSSL, which applications like your browser may rely on. What actually goes on is a heavy amount of table lookups. Let us see how big those tables are. The fact that they are stored in the cache means that it may be possible for a spy to spy on you in a very indirect way. By the way, cache systems are by and large very secure, but this is a very clever and indirect way of trying to learn what key you might have, through cache accesses. So let us see what the idea is. First and foremost, the tables are organized as four tables of 1 kilobyte each: each has 256 entries of 4 bytes. Now let us try to make some sense of this. The four tables I mentioned, each having 256 entries, are T0, T1, T2 and T3. This is a very clever implementation, so let me try to explain what exactly all this means. I will let you look at it for one or two minutes and then I will explain it. There are these variables x0, x1 and so on. These are the inputs to round r; the superscript you see is the round number. So these are the values of that 4 by 4 matrix before you enter the rth round, and this is what you have when you finish the rth round; it gets transformed by those four steps. Now, very interestingly, you do not actually need to do any field operations, which are costly in terms of time; you simply do table lookups. The cleverness is really, really amazing. We will now see how this is done.
So once again, there are four tables. This is the input. I am not putting the round number here, just to avoid confusion; actually this should be round r, and the output will be here. I am going to multiply these two matrices, and I am not going to put r + 1 everywhere, because it would look too crowded. So this is the input to a round and this is the output of the round, again a 4 by 4 matrix, each element of which is a byte, that is, 2 hex characters. And I am pre-multiplying this by that. Do you remember what this was? In one step I had column mixing, in which I pre-multiply this by this. I like to visualize the multiplication of two matrices like this: this multiplied by this, plus this multiplied by this, plus this multiplied by this, plus this multiplied by this, gives this. If I take this element, in the second row and second column, it is the second row of this matrix against the second column of that one: this times this plus this times this plus this times this plus this times this. So it is a nice pictorial way of seeing things. This is what is happening, matrix multiplication, but we are going to make it very, very efficient in our software implementation. The first row of the matrix is 02 03 01 01, and each subsequent row is the same thing rotated one position to the right. Now don't be confused or fooled by this: 02 does not mean taking this number and multiplying it by 2 in the ordinary sense, and likewise for 03. These are field elements, so this is actually field multiplication. I am not getting into how it is done, since that would take too much time, but it really is field multiplication. And you get this. Now here is the trick. This is the original matrix before it comes to round r. What will happen to it? First it is substituted: each of these entries is substituted by something using the S-box. Then there is going to be a shift, so this x1 will move there and so on.
Sorry, it moves in the other direction: x5 will move here, because it is a left shift, and this one over here will move there, and so on. So don't forget the steps: substitution, shifting, then the column mixing, which is this multiplication, and then the exclusive OR with the round key. Now, what is the beautiful idea in the implementation? You take this x0 and use it as an index into table T0, and out pop all these things: x0 multiplied by 02, x0 multiplied by 01, x0 multiplied by 01, and x0 multiplied by 03. I want to be as efficient as I can, so I just send this index x0. Suppose this number is 05; then I look at entry number 05 in table T0, and I get this multiplied by each of these. How many bits is that? It is one word; think about writing this in assembly language. How many bits? Not 16: it is 32 bits, because each of these products is ultimately a field element, and a field element is eight bits in this case. So eight bits, eight bits, eight bits, eight bits: this multiplied by this, this by this, this by this and this by this. The x0 just serves as an index into the table. So right now these are really column operations; I am just talking about this column. I get this, this, this and this. Then I look at x1 multiplied by all of these in turn, and for that I look in table T1. Again I am using this as an index into the table. It could be anything between 00 and FF, so it is one of 256 possible values. I look at that entry, x1 multiplied by 03 and so on, so with just one assembly-language instruction I get those 32 bits on the bus. The only catch here is that I should not be taking x1.
I should be taking x5. Why? Because of the shift. So even though the order is substitution followed by shift, I am doing it differently, and there is no harm in that: I do the shifting first and then the substitution, which gives the same result because substitution acts on each byte independently. The substitution and the column mixing are then both subsumed in the table; the table takes care of both. The substitution matters because each byte has to be substituted before I do the multiplication. So x0 is the index, but what is inside the table is not x0 multiplied by 02, x0 multiplied by 01 and so on; it is the substituted value of x0, multiplied by 02, by 01, by 01 and by 03. Likewise, the next index is x5, and what table T1 contains is the substituted value of x5 multiplied by 03, 02, 01 and 01. Then I look at table T2 for the next byte and table T3 for the one after that. This is the standard way in which AES is implemented in software, with some minor variations which I won't talk about right now. If you understand this, then this equation should be clear. Notice T0[x0^r]: this is standard notation for an array access, so I am looking at the x0th entry of array T0. Then notice I am not looking at x1, I am looking at x5. Why x5 and not x1? Because of the shift. What should I look at after that? x10, and then x15; just check it. So I take x0, then x5, then x10, then x15, and those are indices into the tables T0, T1, T2 and T3 in turn. So to get this column, I make one access to table T0, one access to table T1, one access to table T2 and one access to table T3. That is for the first column.
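Here is a small sketch of the table-driven round, assuming the column-major numbering x0 to x15 used in the lecture (x0 to x3 form the first column) and packing each table entry as a big-endian 32-bit word whose four bytes are the four rows of an output column. It builds T0 from the S-box, derives T1, T2 and T3 as byte rotations of T0 (each column of the mixing matrix is a rotation of the previous one), and then checks that the four lookups per column, T0[x0], T1[x5], T2[x10], T3[x15] for the first column, agree with doing SubBytes, ShiftRows, MixColumns and AddRoundKey step by step. The names are ours and this is a teaching sketch, not OpenSSL's code; it covers the nine full rounds only, since the final round has no column mixing.

```python
import random

AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1
MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]


def gmul(a, b):
    """Multiply two elements of GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return result


def rotl8(v, n):
    return ((v << n) | (v >> (8 - n))) & 0xFF


def sbox_value(x):
    """AES S-box: inverse in GF(2^8) (0 maps to 0), then the affine map."""
    inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


SBOX = [sbox_value(x) for x in range(256)]


def ror32(w):
    """Rotate a 32-bit word right by one byte."""
    return ((w >> 8) | (w << 24)) & 0xFFFFFFFF


# T0[x] packs the column (02*S[x], 01*S[x], 01*S[x], 03*S[x]) into one word.
T0 = [(gmul(SBOX[x], 2) << 24) | (SBOX[x] << 16) | (SBOX[x] << 8) | gmul(SBOX[x], 3)
      for x in range(256)]
T1 = [ror32(w) for w in T0]   # (03, 02, 01, 01) * S[x]
T2 = [ror32(w) for w in T1]   # (01, 03, 02, 01) * S[x]
T3 = [ror32(w) for w in T2]   # (01, 01, 03, 02) * S[x]


def table_round(x, round_key):
    """One full (non-final) round with 16 lookups, four per table."""
    out = bytearray()
    for c in range(4):
        word = (T0[x[4 * c]]
                ^ T1[x[(4 * (c + 1) + 1) % 16]]
                ^ T2[x[(4 * (c + 2) + 2) % 16]]
                ^ T3[x[(4 * (c + 3) + 3) % 16]])
        word ^= int.from_bytes(round_key[4 * c:4 * c + 4], "big")
        out += word.to_bytes(4, "big")
    return bytes(out)


def reference_round(x, round_key):
    """The same round done step by step on a 4x4 state (list of rows)."""
    state = [[SBOX[x[r + 4 * c]] for c in range(4)] for r in range(4)]  # SubBytes
    state = [row[r:] + row[:r] for r, row in enumerate(state)]          # ShiftRows
    out = bytearray(16)
    for c in range(4):
        for r in range(4):
            acc = 0
            for k in range(4):                                           # MixColumns
                acc ^= gmul(MIX[r][k], state[k][c])
            out[r + 4 * c] = acc ^ round_key[r + 4 * c]                  # AddRoundKey
    return bytes(out)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(100):
        x = bytes(rng.randrange(256) for _ in range(16))
        k = bytes(rng.randrange(256) for _ in range(16))
        assert table_round(x, k) == reference_round(x, k)
    print(hex(T0[0]))  # 0xc66363a5
```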
To get the second column, once again there is one access to each of T0, T1, T2 and T3, and so on. So you can see how many accesses there are per round: 16 table accesses, four to T0, four to T1, four to T2 and four to T3. Now, these tables will typically be in the cache: you will have T0, T1, T2 and T3 there. Let us again verify the size of each. Quickly, what is an entry in this table? How many bytes is each element? Four bytes. How many entries per table? If you understand these fundamentals, then the attack will become simple. Sir, there is one simple question. How is this particular matrix designed? We can remember that it is 02 03 01 01, but how was it designed? What is the concept behind its design? Okay, so there was a lot of research into how secure this was going to be. It turns out you do not need complicated values for these entries: having just 02 and 03, and even two 01s, is perfectly okay. There was a lot of investigation of its security properties, which we can't get into here, before they decided that it would be 02, 03, 01, 01 and so on: the matrix gives the best possible diffusion for a 4 by 4 matrix (it is what is called an MDS matrix), while keeping the multiplications cheap. What is the reason behind the choice of the Galois field polynomial, x^8 + x^4 + x^3 + x + 1? That is the irreducible polynomial, so let me tell you something about this field. There are two kinds of fields in cryptography that we are most interested in: prime fields and binary fields. In prime fields, and indeed in the group case also, when we talk about multiplication, we mean multiplication modulo p, a prime number. In this case the analogue is multiplication modulo an irreducible polynomial, so we have to choose an irreducible polynomial. Now, how do we find one, and so on?
We discussed that in a mathematics course. There are some books; for example, Alfred Menezes has a book on elliptic curve cryptography which lists irreducible polynomials for different fields. Thank you. So they just chose one of those irreducible polynomials. Okay, coming back to this table T0: each entry is four bytes. We need to understand these things pretty carefully. Each entry is four bytes, and how many entries are there? I am using this as an index into the table. Only four? No, not four. This index can be anything, right? What is each of these? A byte. So 256, because a byte has 256 combinations. Depending on what this is, if this number is 254 in decimal, which is FE in hexadecimal, then I have to look at entry 254, and so on. So there are 256 entries in each table. What is the size of the table? 256 entries of four bytes, so one K. One K, one K, one K, one K: a total of four K, which very easily fits into most data caches. And that is what I am going to do as the attacker: try to spy on this data cache. Before I continue, any questions about these equations? I think it is a really beautiful, very elegant and efficient implementation. Sir, can you please elaborate on strong collision resistance and weak collision resistance? That's in the context of cryptographic hashes, right? No, no; here in AES also they speak about strong and weak collision resistance; William Stallings in particular discusses that. Okay, okay. Can we keep that for a little later? Yes, sure. Are there any questions about the implementation? Unless we understand this implementation, we can't understand the attack. So everything is here: you do four table lookups to get this, which is really the first column of the output of a round.
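To make "irreducible polynomial" concrete, here is a small brute-force sketch (the helper names are ours). It represents a polynomial over GF(2) as an integer whose bit i is the coefficient of x^i, and tests irreducibility by trial division by every polynomial of degree at most half the degree. It confirms that x^8 + x^4 + x^3 + x + 1 (0x11B) is irreducible and counts the irreducible polynomials of degree 8; there are 30, so the list the designers chose from is short.

```python
def poly_mod(a: int, m: int) -> int:
    """Remainder of a modulo m, for polynomials over GF(2) stored as bit masks
    (bit i holds the coefficient of x^i)."""
    while a and a.bit_length() >= m.bit_length():
        a ^= m << (a.bit_length() - m.bit_length())
    return a


def is_irreducible(p: int) -> bool:
    """Trial division by every polynomial of degree 1 .. deg(p) // 2."""
    degree = p.bit_length() - 1
    for d in range(1, degree // 2 + 1):
        for q in range(1 << d, 1 << (d + 1)):   # all polynomials of degree exactly d
            if poly_mod(p, q) == 0:
                return False
    return True


if __name__ == "__main__":
    print(is_irreducible(0x11B))   # True: x^8 + x^4 + x^3 + x + 1
    print(is_irreducible(0x105))   # False: x^8 + x^2 + 1 = (x^4 + x + 1)^2
    print(sum(is_irreducible(p) for p in range(0x100, 0x200)))   # 30 of degree 8
```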
You take the first element in that column, then the second, the third and the fourth. Each access returns a 32-bit quantity. So you get four of these, which are XORed together, and then you XOR the result with the corresponding 32-bit column of the 128-bit round key, which is again represented as a 4 by 4 matrix. So what did we do? We decided to actually run the OpenSSL implementation on different processors. We started with the Intel dual core. You need to understand the statistics of all these machines, the different numbers here. This one uses a unified L1 cache of 32 kilobytes and only two levels of cache, with two megabytes at L2, and both are eight-way set associative. Then the Intel i3, which has four logical cores, P0 to P3: separate instruction and data caches at L1, each of 32 kilobytes; an L2 cache of 256 kilobytes shared between P0 and P2, and another one shared between P1 and P3, each eight-way; and an L3 cache of three megabytes, shared between all four. The block size is 64 bytes. If anything is confusing, please raise your hand and ask. Everybody knows what the block size is: the block is the granularity of transfer between levels of the memory hierarchy. For example, whenever there is a miss in L1 and you get the data from L2, assuming it is in L2, you don't get just the one byte or one word that you wanted; you get the entire block. Most Intel processors have a block size of 64 bytes. The four AES tables, which occupy a total of four kilobytes, are placed back to back in virtual memory. They occupy a total of 64 cache blocks: 16 blocks per table and 16 elements per block. Let's very quickly look at that. For each block of plaintext to be encrypted, all four tables need to be accessed; remember we said T0, T1, T2, T3 and so on. But not every block of a table is accessed, and that is the key thing.
Even if I spy on the cache, I cannot see what is inside those tables; I cannot see anything about the victim's data. The attacker cannot see that, but he can indirectly deduce which blocks were accessed and which were not. So look at this organization once again. Each of these tables is... sorry? T0's number of rows is 256? Yes, I'm sorry: 256 rows, each element four bytes, so each table is 1,024 bytes long. Now I am going to place them in the cache. A cache block is much wider, 64 bytes, and the tables will be mapped somewhere in the cache. Let's suppose the tables are contiguous. Since a block is 64 bytes and an element is four bytes, I can have 16 elements sitting in one block. So this is one block, and so on. How many blocks for this table? 1,024 divided by 64, that is, 16 blocks for T0, and typically they'll be in the cache. So 16 blocks, another 16 blocks for the next table, and the next, and so on. That's why I said 16 elements per block and 16 blocks per table. Is there any question about the slide? 16 elements per block. How much is each element? 4 bytes. The typical block size on Intel machines is 64 bytes, so that's why 16 elements sit in each block. Four bytes is okay, but how are we getting 256? Why are there 256 entries in the table? Yes, let's see; this is the whole thing, and I want people to appreciate it. You're using each of these bytes as an index into the table. You just say this is the address into the table, and out pops this value after being substituted and then multiplied by this, by that, by that and by that.
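The block arithmetic above fits in a few constants. The sketch below (names are ours) also maps a table index to the block that holds it, assuming, as the lecture does, that the four tables are contiguous and that T0 starts exactly on a 64-byte boundary. It makes one consequence visible: only the high four bits of an index decide which block is touched, so the low four bits are invisible at cache-block granularity.

```python
ENTRY_BYTES = 4      # each T-table entry is one 32-bit word
ENTRIES = 256        # one entry per possible byte value of the index
BLOCK_BYTES = 64     # cache block size on the Intel machines discussed
NUM_TABLES = 4       # T0, T1, T2, T3

ENTRIES_PER_BLOCK = BLOCK_BYTES // ENTRY_BYTES            # 16
BLOCKS_PER_TABLE = ENTRIES * ENTRY_BYTES // BLOCK_BYTES   # 16
TOTAL_BLOCKS = NUM_TABLES * BLOCKS_PER_TABLE              # 64


def block_of(table: int, index: int) -> int:
    """Block number (0..63) holding T<table>[index], assuming the tables are
    back to back and T0 starts on a 64-byte boundary."""
    if not (0 <= table < NUM_TABLES and 0 <= index < ENTRIES):
        raise ValueError("table must be 0..3 and index 0..255")
    return table * BLOCKS_PER_TABLE + index // ENTRIES_PER_BLOCK


if __name__ == "__main__":
    print(ENTRIES_PER_BLOCK, BLOCKS_PER_TABLE, TOTAL_BLOCKS)   # 16 16 64
    # Indices 0x30..0x3F of T1 all share one block: only the high nibble matters.
    print({block_of(1, i) for i in range(0x30, 0x40)})          # {19}
```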
Everything is made ready for you; the substitution is already done. So these tables are not arbitrary. What is inside T0, say? Suppose this number is 27. Pass 27 through the S-box and you get some value, which is again a byte; let's say you get 105 in decimal. Multiply 105 by 02 (field multiplication), by 01, by 01 and by 03, and that is what is inside the entry corresponding to 27. So this really is an index into the table. And because it is an index into the table, and it is a byte, how many values can it have? Any value between 0 and 255. I should be prepared for any of them, and therefore I need 256 entries in the table. So what is the main idea in the attack? By identifying which blocks of the table are not accessed, valuable information about certain bits of the AES key may be obtained. This involves extensive mathematics, and it goes back to a paper by Shamir and colleagues (Osvik, Shamir and Tromer). There were two versions of the paper: I believe the first appeared in 2006 and the longer journal version in 2010. Basically the question is: can I figure out which blocks of the cache are accessed and which are not? Coming back to this figure, these are the 16 blocks of T0, the 16 blocks of T1 and so on. It turns out that for a single encryption of a single block, most of these blocks will be accessed. These are 16, 16, 16, 16 blocks, and each table is accessed many times. How many times is this table accessed? Four times per round, and there are 10 rounds, so about 40 times in total.
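Why would any block escape some 40 lookups? A back-of-the-envelope model helps. Assume, purely for illustration, that every lookup into a 16-block table lands in a uniformly random block, independently of the others. A given block is then missed by all n lookups with probability (15/16)^n, so the expected number of untouched blocks is 16 · (15/16)^n. With 36 to 40 lookups per table, this idealized model gives between one and two untouched blocks on average, and more when fewer lookups hit a table (16 lookups, for example a table consulted only in a final round, gives between five and six). Real counts depend on the implementation and on the actual index values, so treat these numbers as an illustration of the effect rather than as a prediction.

```python
def expected_untouched(lookups: int, blocks: int = 16) -> float:
    """Expected number of a table's blocks that none of `lookups` independent,
    uniformly random indices fall into (idealized model)."""
    return blocks * (1 - 1 / blocks) ** lookups


if __name__ == "__main__":
    for n in (16, 36, 40):
        print(n, round(expected_untouched(n), 2))
    # 16 5.7
    # 36 1.57
    # 40 1.21
```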
So this table will be accessed many times, and so will this one, but the point is that even though the table is accessed, there is no guarantee that every single block inside it will be accessed. If you work out the mathematics, the statistics, you will find that out of these 16 blocks, maybe two or three per encryption, on average, will not be accessed. That is the key point, and that is the brain behind this whole attack: which of these blocks are not accessed? If I know those, I can get hints about certain bits of the round key. I'll repeat: the table itself is definitely going to be accessed because of the way this implementation works, but not every single block inside it will be accessed, and from my knowledge of which blocks are not accessed, I will be able to get valuable hints about the key. What exactly those hints are and how we get them is a complicated story; it is about three pages of mathematics, which we won't get into here. But that is the basic idea. So now comes my attack. I would like to figure out which blocks are accessed, or more importantly, which are not. How do I do that? The trick is for the attacker to come up with an array and populate it so that it fills the entire cache; in this case we are talking about the L2 cache. He fills the entire cache with his own data, writing into every single line of the cache. Then he allows the victim to start his encryption, and the victim starts encrypting and accessing the tables. Now, you know how a cache works. The attacker has populated the entire cache, and the victim starts accessing whichever lines of T0, T1, T2 and T3 he needs, in the first round, the second round, the third round and so on. Then the attacker comes back, and guess what he does? He starts to access his array again.
He accesses an element here, here, here, here. What happens? He finds the access very fast here (ignoring the operating system and its data structures). He accesses this, this and this: fast. He comes here and suddenly finds it slow. Why? The victim has put his own data there. What data? He has been accessing these tables, and the way a cache works, whatever he needs is brought into it. So the victim's line is now there and the attacker's line has gone away, and when the attacker tries to access that element of his array, it takes longer. In some cases he finds the access slow and in others fast. A line whose access is not slow corresponds to a table block that has not been accessed by the victim. So at the end, the attacker knows, for example, that three of these 16 lines were not accessed; that over here two lines were not accessed; over here four lines were not accessed; and over here everything was accessed. From that he starts getting valuable clues about some of the bits of the different round keys and of the original key. That is the main goal and the main idea: by identifying which blocks of the table are not accessed, valuable information about certain bits of the AES key may be obtained. Once again, I can't tell you right now which bits and so on; it would take too much time. It is very intense mathematics that takes almost an hour and a half to go through. The result is that the attacker is able to bring down the complexity of an attack. If I am doing a brute-force attack, say a known-plaintext attack, how many guesses do I have to make? The key is 128 bits, so there are 2^128 possible values.
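The prime, run victim, probe sequence can be sketched with a toy cache model. In this sketch all names and addresses are hypothetical: a set-associative cache with LRU replacement stands in for the L2, a hit or miss flag stands in for a fast or slow access time, there is no prefetching and no other process, and the victim simply makes 40 random lookups into a single 1 KB table. The attacker's probe reports exactly the sets that hold table blocks the victim touched; the table's remaining sets correspond to the untouched blocks.

```python
import random
from collections import OrderedDict


class ToyCache:
    """Set-associative cache with LRU replacement; access() returns True on a hit."""

    def __init__(self, num_sets: int, ways: int, block_bytes: int = 64):
        self.num_sets, self.ways, self.block_bytes = num_sets, ways, block_bytes
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr: int) -> bool:
        block = addr // self.block_bytes
        lines = self.sets[block % self.num_sets]
        if block in lines:
            lines.move_to_end(block)        # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)       # evict the least recently used line
        lines[block] = None
        return False


def attacker_addr(cache: ToyCache, base: int, s: int, w: int) -> int:
    """Address of the attacker's w-th block in set s (base must be set-aligned)."""
    return base + (w * cache.num_sets + s) * cache.block_bytes


def prime(cache: ToyCache, base: int) -> None:
    """Fill every line of every set with the attacker's own data (S x W blocks)."""
    for s in range(cache.num_sets):
        for w in range(cache.ways):
            cache.access(attacker_addr(cache, base, s, w))


def victim(cache: ToyCache, table_base: int, indices) -> None:
    """The victim's table lookups: 4-byte entries starting at table_base."""
    for i in indices:
        cache.access(table_base + 4 * i)


def probe(cache: ToyCache, base: int) -> set:
    """Sets in which at least one attacker block was evicted (a 'slow' access)."""
    slow = set()
    for s in range(cache.num_sets):
        for w in range(cache.ways):
            if not cache.access(attacker_addr(cache, base, s, w)):
                slow.add(s)
    return slow


if __name__ == "__main__":
    rng = random.Random(1)
    cache = ToyCache(num_sets=64, ways=8)
    ATTACKER_BASE, TABLE_BASE = 0, 1 << 20      # hypothetical, set-aligned addresses
    lookups = [rng.randrange(256) for _ in range(40)]
    prime(cache, ATTACKER_BASE)
    victim(cache, TABLE_BASE, lookups)
    slow_sets = probe(cache, ATTACKER_BASE)
    first_set = (TABLE_BASE // cache.block_bytes) % cache.num_sets
    untouched = [b for b in range(16) if first_set + b not in slow_sets]
    print("table blocks the victim never touched:", untouched)
```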
However, because of this cleverness, I can bring down the complexity of the attack from 2^128 to just 2^48, which is within the realm of practicality. Once I have shrunk the space from 2^128 to something much, much smaller, 2^48, I can figure out the actual key using just my laptop. So that is the main trick. Can't the reference-bit concept of page replacement algorithms be used to decide which blocks are accessed and which are not? Well, there is no page replacement here; there is block replacement. It is similar to what is done in operating systems, where you have page replacement. The only difference is that in the operating system you have enough time to handle page replacement, so it is all done in software and you can be very accurate about it. In the case of block replacement in a cache, it has to be done very, very fast, basically in one pipeline clock cycle, so there are specific algorithms for that, including approximate replacement algorithms. But in our attack, we are very sure that if an access takes more time, then the victim must have brought his data in. That is the fact I am exploiting, correct? I have populated every single line of the cache. For the time being, forget about other processes: turn off your browser and all the other applications, so that they don't interfere by bringing their data into the cache. I am talking about the data cache, okay? So I have populated the entire cache and then the victim is allowed to start. Obviously he is running AES, so he will bring in the AES tables, and as I said before, most of the lines of those tables will be accessed, except maybe two or three in each of T0, T1, T2 and T3. Then I start to read my entire array again.
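One simple way to see how "not accessed" turns into key bits is the first round. Its first lookup into T0 uses the index x0 = p0 XOR k0 (plaintext byte XOR key byte, from the initial round key addition), so if block j of T0 was never touched during an encryption, every key guess k0 with (p0 XOR k0) >> 4 equal to j is impossible. The sketch below applies only this first-round rule. The observations are simulated rather than measured (the helper `simulated_observation` is hypothetical), and the true key byte 0x3A is an arbitrary choice. The rule pins down the high four bits of the key byte but says nothing about the low four, because they do not change which 64-byte block is touched. Applied to all 16 key bytes, first-round information alone would therefore still leave 2^64 candidates; getting further, toward the reduction quoted above, needs the more involved analysis the lecture skips.

```python
import random


def eliminate(candidates: set, p: int, untouched_blocks: set) -> set:
    """Drop key-byte guesses k whose first-round lookup T0[p ^ k] would have
    fallen in a block observed as untouched (block = high nibble of the index)."""
    return {k for k in candidates if ((p ^ k) >> 4) not in untouched_blocks}


def simulated_observation(rng: random.Random, p: int, true_k: int, n: int = 2) -> set:
    """Hypothetical observation: n untouched blocks, never including the block
    that the real first-round lookup (p ^ true_k) >> 4 must have touched."""
    touched = (p ^ true_k) >> 4
    others = [b for b in range(16) if b != touched]
    return set(rng.sample(others, n))


if __name__ == "__main__":
    rng = random.Random(3)
    TRUE_K = 0x3A                      # arbitrary key byte for the demo
    candidates = set(range(256))
    for _ in range(200):
        p = rng.randrange(256)         # known plaintext byte
        candidates = eliminate(candidates, p, simulated_observation(rng, p, TRUE_K))
    print(len(candidates), sorted({k >> 4 for k in candidates}))   # 16 [3]
```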
Wherever I find that it takes more time to access my element, I know it must be because my element was kicked out and the system brought the victim's line into its place. So you see the indirect way in which I am doing this? I cannot read his elements or anything like that. It is a very indirect, subtle way, and actually very difficult to implement in practice. So I populate the entire cache with an array and then I allow the victim to start. The victim is assumed to be a storage provider. He takes different people's transcripts and so on: your sale deed, your school-leaving certificate, whatever you want to store securely. You give it to this provider and he stores it safely in his vault. What does he do to store it securely? He encrypts it, and I am making a reasonable assumption that he encrypts everything with a common key. So suppose you want to store your school-leaving certificate with me, a storage provider. I promise that I will store it securely. You give it to me, and I encrypt it before I store it. If you ask for it back, I ask you to authenticate yourself and then I give it back to you. All this for a small fee, and all of you use my services. I use the same key because I see no problem with that: it's not as if anyone can just walk in and take a document; you have to authenticate yourself. So those are the underlying assumptions. What is the victim doing? He is a storage provider, encrypting all your certificates and so on and storing them safely, with the same key. In other words, AES is run again and again, indefinitely, with the same key. If he does not use the same key, I cannot use this attack. So here is the attack. Access S times W blocks using the attacker's list of pointers, to get the cache completely occupied. That is what the attacker does first.
Here S is the number of sets and W is the associativity of the cache; everybody knows these terms. The Intel caches are set associative. So access S times W blocks, basically all the blocks; right now we are targeting L2. Access all the blocks of the L2 cache using the attacker's list of pointers, to get the cache completely occupied. Then control passes to the victim, your storage provider. He performs AES encryption, and in so doing he brings into the cache precisely those blocks of T0, T1, T2 and T3 that he needs to compute the encryption. Then control passes back to the attacker, who now does the following for each set in the cache: access the W blocks that map to it (W is the associativity, say eight, so access all eight blocks in the set), record the maximum access time among the W blocks, and plot this. That's what we did, starting with the Intel i3. And we were very unhappy, because for many, many weeks we struggled with this problem. We didn't find any higher access time: you can't see any plateau in this graph, so we can't figure out where the tables are. The number 512 here is the number of sets. Let's quickly check that. The number of sets is the size of the cache divided by the product of the block size and the associativity. For the Intel dual core, the L2 cache is two megabytes, so you get 2^21 divided by 2^6 times 2^3, which is 2^12, that is, 4096 sets. For the i3, the L2 is 256 kilobytes, and the same arithmetic gives 2^18 divided by 2^9, that is, 512 sets.
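The set arithmetic above, written out as a short sketch (function names are ours). It also shows which set an address falls in, namely the block number modulo the number of sets, and how much data the priming step of S times W blocks touches on the i3's L2.

```python
def num_sets(cache_bytes: int, block_bytes: int, ways: int) -> int:
    """Number of sets = cache size / (block size x associativity)."""
    return cache_bytes // (block_bytes * ways)


def set_index(addr: int, block_bytes: int = 64, sets: int = 512) -> int:
    """Set an address maps to: drop the block-offset bits, keep the low set bits."""
    return (addr // block_bytes) % sets


if __name__ == "__main__":
    dual_core_l2 = num_sets(2 * 1024 * 1024, 64, 8)
    i3_l2 = num_sets(256 * 1024, 64, 8)
    print(dual_core_l2, i3_l2)              # 4096 512
    # Priming touches S x W blocks, i.e. the whole cache:
    print(i3_l2 * 8 * 64 // 1024, "KB")     # 256 KB
```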
So we plotted this graph, and to our disappointment we found that we could not identify where exactly those tables are in the cache. As I'm saying, the problem is extremely difficult and frustrating. The problem is that modern processors use something called prefetching. This is hardware prefetching; you also have compiler-based prefetching. If a block B is requested from main memory, block B + 1 is also prefetched in anticipation of its future use. This greatly complicates the attack. So what we did was to modify the attack. It is more time-consuming, but this is the attack. Now there are two loops; the previous attack had only one. For each cache set S′, do the following. Access the W blocks that map to cache set S′, using the attacker's list of pointers. Perform AES encryption, which brings into the cache those of the 64 table blocks that are used for the encryption. Then control passes back to the attacker, who again accesses the W blocks that map to the same set S′ and records the maximum access time among them. So you're doing this for every single set. And finally we saw some hope: you can see a little bit of a plateau in the case of this Intel i3, which is the more complicated machine to attack. This is where the actual AES tables are located. Then the second part of the attack takes over. As I just mentioned, most of these 64 blocks are actually accessed except for a few of them, and knowing those few, you can find out certain bits of the key. So the attack has two steps. Step one: where are the AES tables? Step two: which specific blocks, and there are only a few of them, are not accessed during a specific encryption?
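The two-loop version of the attack can be sketched on a toy cache. This is an illustration under stated assumptions, not the authors' implementation: the cache geometry, the table location `TABLE_BASE`, the table size and the `UNTOUCHED` lines are all hypothetical, the victim repeats the same access pattern every time, and the model has no prefetcher, so it only shows the loop structure and what the attacker gets out of it.

```python
from collections import OrderedDict

class ToyCache:
    """Toy set-associative cache: block b maps to set b % sets, LRU within each set."""
    def __init__(self, sets, ways):
        self.sets, self.ways = sets, ways
        self.contents = [OrderedDict() for _ in range(sets)]

    def access(self, block):
        """Return True on a hit, False on a miss; a miss installs the block."""
        lines = self.contents[block % self.sets]
        if block in lines:
            lines.move_to_end(block)
            return True
        if len(lines) == self.ways:
            lines.popitem(last=False)   # evict the least recently used block
        lines[block] = None
        return False

SETS, WAYS = 64, 8            # small toy geometry, not a real CPU
TABLE_BASE = 64 * 100 + 20    # hypothetical first block of the victim's tables (set 20)
TABLE_LINES = 16              # toy table size; the real T-tables span 64 lines
UNTOUCHED = (3, 9)            # hypothetical table lines this encryption does not use

def victim_encrypt(cache):
    """Stand-in for one AES encryption: touch every table line except UNTOUCHED."""
    for line in range(TABLE_LINES):
        if line not in UNTOUCHED:
            cache.access(TABLE_BASE + line)

def per_set_attack():
    """Two loops: for each set, prime only that set, run the victim, probe only that set."""
    cache = ToyCache(SETS, WAYS)
    slow = []
    for s in range(SETS):
        mine = [s + i * SETS for i in range(WAYS)]   # attacker blocks that map to set s
        for b in mine:                                # prime set s
            cache.access(b)
        victim_encrypt(cache)                         # control passes to the victim
        probe = [cache.access(b) for b in mine]       # probe set s (True = fast hit)
        if not all(probe):                            # any miss: the victim used set s
            slow.append(s)
    return slow

print(per_set_attack())
```

The flagged sets form the plateau that marks the tables, and the gaps inside it (sets 23 and 29 here) are the table lines this encryption did not use, which is what the second step of the attack works with.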
So you do a complete encryption and figure out which of these blocks are not accessed, and from that you get valuable clues about certain bits of the AES key. The same thing repeated on the Intel dual core showed something else. Security can be a very frustrating area, especially if you're doing research in it, and systems-level work can be even more frustrating. As you can see over here, suddenly we found two of these plateaus. My students came to me and said, "Sir, I'm finding something strange. I'm getting two plateaus. What could be the problem?" Can anybody guess? Suppose your student comes to you and says, "I'm trying to find the location of the AES tables, but they're not all contiguous." On the i3 they seemed to be contiguous: a little plateau covering that 4 KB area, those 64 blocks. Over here it doesn't look like that; there are two plateaus. What is going on? What is the page size on Intel machines? The typical default page size on Intel machines is 4 KB, right? So we tried to understand this, and finally the light dawned on us. This is my virtual memory. Very often those 4 KB of tables sit inside a single page, but there may be times when they are split across two pages, so some of the tables are in one page and some in the other. Because we use paging in virtual memory, virtual pages get mapped to physical memory, and the way this mapping is done is completely arbitrary. So one page can get mapped over here and the other somewhere else. Some piece is here and some piece is over there, and that's exactly why we see the tables in two pieces. Sometimes they're in one piece, but sometimes they're in two. So I've talked about the first part of the attack.
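The two-plateau effect can be reproduced with a toy page table. This sketch assumes 4 KB pages, 64-byte lines and the 512-set L2 from the arithmetic above; the virtual address `0x5800` and the frame numbers are hypothetical. With 512 sets times 64 bytes = 32 KB per way, the set index uses physical address bits 6 to 14, which go beyond the 12-bit page offset, so the physical frame the OS picks decides which sets a page lands in.

```python
PAGE, LINE, SETS = 4096, 64, 512   # 4 KB pages, 64-byte lines, 512-set L2 (i3 figures)

def translate(vaddr, page_table):
    """Toy MMU: page_table maps virtual page number -> physical frame number."""
    vpage, offset = divmod(vaddr, PAGE)
    return page_table[vpage] * PAGE + offset

def table_sets(vstart, size, page_table):
    """Sorted cache-set indices covered by a table of `size` bytes at virtual address vstart."""
    return sorted({(translate(v, page_table) // LINE) % SETS
                   for v in range(vstart, vstart + size, LINE)})

def runs(indices):
    """Group sorted set indices into contiguous [first, last] runs (the 'plateaus')."""
    out = []
    for i in indices:
        if out and i == out[-1][1] + 1:
            out[-1][1] = i
        else:
            out.append([i, i])
    return out

# Hypothetical 4 KB table at virtual 0x5800: half in virtual page 5, half in page 6.
split = {5: 0x1A3, 6: 0x0F7}      # the OS picked unrelated frames
adjacent = {5: 0x1A3, 6: 0x1A4}   # the OS happened to pick consecutive frames
print(runs(table_sets(0x5800, 4096, split)))      # [[224, 255], [448, 479]]
print(runs(table_sets(0x5800, 4096, adjacent)))   # [[224, 287]]
```

A 4 KB table that straddles a page boundary lands in two separate runs of sets when the two frames are unrelated, and in a single run when they happen to be consecutive.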
The second part is a little simpler. It enables you to get quite a few bits of the key, and then you have to guess the rest of it by doing a brute-force search. As I said, this takes you from 2 raised to 128 down to around 2 raised to 40 or so. And the attack was finally successful. Any questions before we continue? Is the assumption that we are inside the victim's machine? Yes, you're inside. And not only the victim's PC: imagine cloud computing. Typically you'll have a virtual machine; the victim and the attacker will have two virtual machines, and they could even be on the same core. But we haven't yet gotten into two virtual machines; that's going to be much more complicated. Right now we are considering them to be two separate processes. For the attack to work, many things have to cooperate with you. For one thing, you have to turn off as many other applications as possible. Otherwise they will all start accessing the cache and your results will be completely messed up. Which programming language are your students using to implement this? We basically use C, and part of it is written in assembly language, because speed matters, and so does the way you measure time. There is an assembly-language instruction called RDTSC (read time-stamp counter) that is used for the timing. Now, for those of you who are interested in research and would like to get started, or would like your students to get started, on something: if you've got very bright students, you might want to give them parts of this. There was a very interesting paper that appeared just in 2011. I'll just write down the paper over here. If you do a Google search on AES attacks, this paper will be cited, so I'm just going to mention it. Those of you who are interested in research can look it up.
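To see why 2 raised to 40 is practical while 2 raised to 128 is not, a back-of-the-envelope estimate helps. The rate of one billion trial keys per second is a hypothetical figure chosen only for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def brute_force_seconds(key_bits, keys_per_second):
    """Worst-case time to try all 2**key_bits candidate keys at a fixed rate."""
    return 2 ** key_bits / keys_per_second

RATE = 1e9   # hypothetical: one billion trial keys per second

print(brute_force_seconds(40, RATE) / 60)                 # roughly 18.3 minutes
print(brute_force_seconds(128, RATE) / SECONDS_PER_YEAR)  # roughly 1.1e22 years
```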
It's a marvelous paper that draws on operating systems, neural networks and fundamentals from different areas. I have the paper in front of me here. These are hardcore practitioners: they come from an academic setting, but everything in there has actually been done. They went to the conference, the IEEE Symposium on Security and Privacy, and demonstrated this attack in front of everybody on their laptop. They are from a university in Switzerland called the Bern University of Applied Sciences. It's a really marvelous paper, I think. What kind of data will be in the array the attacker populates? You just take any array you want. The attacker populates an array, and you can put any values you want in it. It's not the value that matters; it's whether you accessed it, and how much time it took you to access it. Suppose the attacker's data is equivalent to the data on the victim's side; then won't the time difference fail to show up? What do you mean by the same amount of data? The attacker's data is a huge array that fills the cache completely: the full cache size, 256 kilobytes or two megabytes depending on the machine. The victim's data is how much? Four kilobytes, just those little tables. So I am populating the entire cache. The first part of my attack is to find out where this guy's tables are. To find out, I start reading from every element of my array; not every element, but every line. If it's an integer array, each integer is four bytes and each line is 64 bytes, so I read every 16th element: this, this, this, this, and I time every single element that I've read. Then I allow the victim to execute his AES. He will kick out my lines wherever he has placed his array, correct? Then I read again.
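The sizes in this answer can be checked directly. The sketch assumes 4-byte `int`s, 64-byte lines, an attacker array that starts on a line boundary, and the common T-table layout of 256 four-byte entries per table; `prime_indices` is a hypothetical helper, not code from the lecture.

```python
INT_BYTES = 4
LINE_BYTES = 64
INTS_PER_LINE = LINE_BYTES // INT_BYTES   # 16: reading every 16th int touches each line once

def prime_indices(array_len):
    """Indices the attacker reads so that each 64-byte line of a line-aligned int array is touched once."""
    return list(range(0, array_len, INTS_PER_LINE))

# Victim side: four T-tables (T0..T3), 256 entries each, 4 bytes per entry.
TABLE_BYTES = 4 * 256 * 4
TABLE_LINES = TABLE_BYTES // LINE_BYTES
print(INTS_PER_LINE, TABLE_BYTES, TABLE_LINES)   # 16 4096 64

# Attacker side: an int array as large as a 256 KB L2 cache.
l2_ints = 256 * 1024 // INT_BYTES
print(len(prime_indices(l2_ints)))   # 4096 lines = 512 sets * 8 ways
```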
Wherever he's put his stuff, it will take me more time to read. Why? Because there'll be a cache miss: that particular element of the attacker's array will have to be fetched from L3 and put back into L2. That's how I know he must have accessed this line. But while I know he's accessed this line, I don't know which element in the line. We saw there are 16 elements per line; it's all very complicated, actually. So I know he's accessed one of the 16, but I don't know which one exactly. That's why I have to do this again and again. So the attacker can only make some presumptions about the data, but can't take it from the victim? Right. I cannot see the victim's data or anything else of his. I can only deduce. So the key space is reduced from 2 raised to 128 to roughly 2 raised to 40? Yes. Then he can use a brute-force attack? Exactly. I'm not telling you how this is done, but once I know the access patterns, I use a fancy algorithm to find out certain bits of the key. So from 2 raised to 128, I've reduced my search space to 2 raised to 40 or so. Now I can do a brute force over 2 raised to 40, which is not so bad: 2 raised to 10 is a kilo, 2 raised to 20 is a mega, 2 raised to 30 is a giga, and 2 raised to 40 is a tera. That's within the realm of practicality. When the attacker writes into the cache, won't the AES algorithm end up using different data? Does the victim's AES not check what it is using? It might produce a wrong encryption, with wrong data values. The victim just accesses its arrays, right? Yes. The victim doesn't know anything about the cache, and doesn't care; he's just accessing his arrays. But if he accesses an array that now has the wrong data? No, no, how can that be? The cache takes care of that, right?
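The "one of 16 entries" ambiguity has a concrete meaning in the classic first-round view of a T-table implementation, where each lookup index is a plaintext byte XORed with a key byte. The sketch below illustrates that textbook idea only; it is not the "fancy algorithm" from the paper, and the key byte `0x3C` is hypothetical. Observing which 16-entry line was touched fixes only the top 4 bits of the key byte.

```python
ENTRIES_PER_LINE = 16   # 64-byte line / 4-byte table entry

def observed_line(p, k):
    """Table line touched by a first-round lookup T[p ^ k] (toy leak model)."""
    return (p ^ k) // ENTRIES_PER_LINE

def key_candidates(observations):
    """Key-byte values consistent with every (plaintext byte, observed line) pair."""
    return [k for k in range(256)
            if all(observed_line(p, k) == line for p, line in observations)]

secret_k = 0x3C   # hypothetical key byte
obs = [(p, observed_line(p, secret_k)) for p in (0x00, 0x5A, 0xF1)]
cands = key_candidates(obs)
print(len(cands), hex(cands[0]), hex(cands[-1]))   # 16 0x30 0x3f
```

In this model the low 4 bits stay hidden no matter how many plaintexts are observed, which is why more elaborate analysis is needed to bring the search space down to roughly 2 raised to 40.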
Just because you are one process and I'm another, and you've got some data and I've got some data and there isn't enough space, something of yours has to be thrown out and something from my data structure has to be put in. That doesn't mean you'll read wrong data. There are tens of processes running. Do you think there's ever a problem with somebody reading or messing around with somebody else's data? Computer architects take great care to make sure that none of that happens. That is why I cannot see anything that he's doing; in general I cannot see any of his data structures. I only deduce that he must have accessed this line, and this line. I deduce it, and then, very cleverly, from knowing which lines he has accessed and which he hasn't, I can deduce some bits of the key. I cannot see anything that he's doing. There is a hard firewall between processes in general. Otherwise, what would access control mean? If any process could see another process's data or mess with its data structures, heaven help us. So the processor may be moving pages out and bringing them back, and that is why the access is slow? It's like this. The way a cache works is, I put some stuff inside the cache, correct? Suppose I read something. When I read it a second time, if nobody has evicted that line in the meantime, meaning there was enough space for everybody else's lines, it will still be there, so the second access has no extra delay. But the cache is a shared resource between multiple processes. If many processes want to access their own data structures, there isn't enough space for all of them, so somebody has to be thrown out to make way for the next one. So I'm the attacker, okay, fine.
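The answer to the wrong-data question can be illustrated with a toy cache set that keeps copies of memory tagged by address. It is a sketch under simplifying assumptions (a single set, LRU replacement, a dictionary standing in for main memory; `TaggedSet` is a hypothetical name): eviction only drops a copy, and a later read misses and refetches the correct value from memory, so the victim's reads get slower but never wrong.

```python
from collections import OrderedDict

class TaggedSet:
    """Toy cache set: holds copies of memory words tagged by address, LRU replacement."""
    def __init__(self, ways, memory):
        self.ways, self.memory = ways, memory
        self.lines = OrderedDict()   # address -> cached copy of memory[address]

    def read(self, addr):
        """Return (value, hit). A miss always refills from memory, never from another line."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return self.lines[addr], True
        if len(self.lines) == self.ways:
            self.lines.popitem(last=False)   # eviction drops a copy; memory still holds the data
        self.lines[addr] = self.memory[addr]
        return self.lines[addr], False

# Hypothetical contents: one victim table word and eight attacker words, all in one set.
memory = {"victim": 1234, **{f"attacker{i}": -i for i in range(8)}}
cache_set = TaggedSet(8, memory)

print(cache_set.read("victim"))     # (1234, False): first access misses
print(cache_set.read("victim"))     # (1234, True): now it hits
for i in range(8):
    cache_set.read(f"attacker{i}")  # attacker fills the set; the victim's copy is evicted
print(cache_set.read("victim"))     # (1234, False): slower, but the same value
```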
I bring all my stuff in. The next time I go there, I may not find all of my stuff. Why? Because this guy brought his stuff in, and in order to do that he had to throw somebody else's stuff out, and my stuff got thrown out. So when I, the attacker, read my stuff again, I find that some reads take more time. That's how I deduce that the victim brought his stuff into exactly those places. I'm just thinking: this assumes a very idealized situation where nothing is happening except the AES operation? Yes. That situation is unlikely in practice. Exactly. A lot can go wrong when you're launching this attack; there's a lot of interference from other processes. That is why I said before that you want to turn almost everything else off. Of course, the operating system still has to run. Turn off as many processes as possible. And you can actually try it; the students did. They performed different experiments: turn the browser on, open more browser tabs, and as you do this there is more and more disturbance in your measurements. So in reality it should take more time? What should take more time? Forget the time: I may not be able to deduce the key at all if there is this noise. Really, if you do have the time, those of you interested in cryptography, and in particular in systems issues and the combination of the two, I would very strongly suggest you read this paper. You will be surprised at the level of sophistication. I really cannot go into it; we don't have time here. But it's amazing what they have done. If you really sit down and think about it, and maybe involve some of your bright students, you will see how phenomenal this kind of attack is, and how they actually gamed the operating system.
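One simple way to cope with sporadic noise from other processes is to repeat the measurement and keep only the sets that look slow in most trials. This is a hypothetical illustration, not the method used by the students or in the paper, and the noise pattern below is made up so that the example is deterministic.

```python
from collections import Counter

def majority_slow_sets(trials, min_fraction=0.5):
    """Sets flagged as slow in more than min_fraction of the measurement trials."""
    counts = Counter(s for trial in trials for s in set(trial))
    return sorted(s for s, c in counts.items() if c > min_fraction * len(trials))

# Hypothetical data: the tables occupy sets 20-35; each noisy trial also flags one
# unrelated set, standing in for a cache access by some other process.
true_sets = set(range(20, 36))
trials = [true_sets | {(37 * t) % 512} for t in range(9)]
print(majority_slow_sets(trials) == sorted(true_sets))   # True
```

Voting like this only helps when the noise is sporadic; if other processes keep hammering the same sets, as with many open browser tabs, the measurements can become useless, which matches the lecture's point that the key may not be recoverable at all.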
The attacker spawned multiple threads and so on. Why? So that after each access by the victim, the victim was preempted. There are lots of such tricks; I've not mentioned all of them here, but all of those things have actually been done in this attack. Thank you, sir.