Hello everyone, I'm Shivam Bhasin, and I will be presenting this talk, See-In-The-Middle, or SITM. What we'll be presenting is a middle-round differential cryptanalysis attack, assisted by side channels, that is applicable to a range of SPN block ciphers. This is joint work with Jakub Breier, Xiaolu Hou, Dirmanto Jap, Romain Poussier and Siang Meng Sim.

Before the talk, just to introduce myself: I'm Dr. Shivam Bhasin, a senior research scientist at Temasek Labs, NTU Singapore. Before NTU, I was working at Télécom ParisTech in France, where I also did my PhD. My research interests are focused on physical attacks, that is, side-channel attacks, fault attacks and their combinations, as well as countermeasures, certification and, more lately, hardware security of AI.

In this talk, I'll first give some basic context, then describe the SITM attack and how we actually perform it. Then we'll extend this attack to deep-round shuffling, and finally conclude.

So let's start with some context. Side-channel attacks have now been known for almost 25 years, and there are several taxonomies and ways to classify them. The simplest one distinguishes simple and differential side-channel attacks. In simple side-channel attacks, the idea is to visually inspect power or EM measurements to look for secret information, for example the square-and-multiply pattern in an exponentiation. Differential attacks, typically a CPA on AES, use known inputs or outputs to recover a secret key with statistical methods. They rely either on a generic leakage model, like Hamming weight, or on a profiled one. There are several variants of this attack, but in the most general sense it is one of the most widely used side-channel attacks. And most, if not all, types of differential side-channel attacks are limited to the corner rounds, because they manipulate information related to the known
plaintext or ciphertext, and this is the main application setting.

A couple of years ago, another variant of such attacks was proposed, known as SCADPA, or Side-Channel Assisted Differential-Plaintext Attack. So how does SCADPA operate? This attack is limited to bit-permutation-based ciphers like PRESENT and GIFT. The idea is that you take a sequential, for example microcontroller-based, implementation of PRESENT and measure the power consumption for a given plaintext. Then you change only one nibble of the plaintext, keep everything else the same, and measure the power consumption again. Just by looking at the difference in power consumption, you can track the propagation of the difference into the deeper rounds, like round one and round two. That reveals information on the key: different differences lead to different kinds of propagation and can therefore reveal different information on the key.

In this paper, we generalize SCADPA-like attacks, in the form of SITM, to a generic middle- or deep-round attack on a wider class of SPN block ciphers. We validated these attacks on 8-bit AVR and 32-bit ARM microcontrollers. We also claim to be the first to demonstrate this attack on middle rounds protected with shuffling countermeasures, and to attack complex ciphers like AES-128 as deep as four rounds. There are other contributions of this work which are not the main part of this presentation: there are results on SKINNY and PRESENT; the attack can also be extended to other ciphers like GIFT, RECTANGLE and Midori; and we propose a methodology to compute the number of rounds to mask to protect against SITM. These details are not in this presentation, so interested people can take a look at the paper.

The last slide of the context is the attacker model. Our attacker setting is as follows: we assume a sequential software implementation running, for example, on a microcontroller. It's a
chosen-plaintext attack, because we are inserting chosen differences. We observe side-channel leakage in a middle round, not in the first or the last round but somewhere in between, as will become clear. What exactly we look for in the side-channel trace is whether, from one encryption to another, a particular intermediate value has changed.

Of course, this attack can be applied to different targets, but the target we have in mind for the rest of the presentation is heterogeneous countermeasures. Since the corner rounds are more vulnerable, or let's say most side-channel attacks target only the corner rounds, there are implementations where the corner rounds are well protected, for example with masking, while the middle rounds are either unprotected or have lightweight countermeasures like shuffling.

Coming to the attack, that is, See-In-The-Middle or SITM, we can start by understanding it with the example of AES-128. AES has very good diffusion functions, ShiftRows and MixColumns, which ensure that a single-byte difference inserted in the plaintext propagates through the whole state very fast. Within a couple of rounds, a single-byte plaintext difference spreads over the whole state, and the deeper you go, the closer the probability of that propagation gets to one. But there are exceptions: if an attacker chooses a particular set of differences, some of them lead to the opposite behavior. Before diffusing, they first converge, and only then does diffusion occur. This is exactly the case we are interested in. We insert a diagonal difference into the plaintext such that it leads to a convergence of the difference to a single byte, which in the following rounds follows the same pattern of
spreading to a column and then to the full state. But in the initial rounds we see a convergence, which is a very rare event and differs from all the other executions. This convergence is what we intend to detect through the side channel.

We did an experimental validation on an 8-bit ATmega, where a single-byte difference can be detected in round two, which then propagates to four differences in round three. Similar behavior can be seen on a 32-bit ARM Cortex-M3, where a single difference in round two can be produced by choosing the input difference appropriately, and it then propagates in round three to a one-column difference.

SITM exploits such cases, and the methodology is as follows. First, SITM inserts plaintext differences and observes the differential pattern in the middle rounds. Since we don't know the key, it is not possible to predict which input difference will actually cause a convergence, so we have to test many combinations until we find a convergence in the differential pattern of the middle round. Once a convergence is found, it lets us recover part of the key from the input pair. For example, in AES a convergence in this case lets us recover one column, or 32 bits, of the key, and we repeat this for the other columns independently. Finally, for ciphers such as AES-192 or AES-256, where multiple round keys are needed to obtain the master key, we repeat the attack on the different parts.

The key recovery for AES-128 goes as follows. In the first step, we insert differences. Take an example where we insert differences in bytes 0, 5, 10 and 15 of the state. The probability of finding a convergence for this particular case, or such cases, is 2^-22.
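To make the convergence condition concrete, here is a minimal Python sketch; it is our illustration, not the code from the paper, and the helper names are ours. After ShiftRows, bytes 0, 5, 10 and 15 all land in column 0, so their S-box output difference enters a single MixColumns. Because MixColumns is linear, a one-byte output difference δ in row r requires the column input difference InvMixColumns(δ·e_r), which has all four bytes active. Counting these targets, under our assumption that the column difference behaves like a uniformly random 32-bit value, gives one reading of the quoted probability of about 2^-22.

```python
import math

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

# AES MixColumns matrix and its inverse (InvMixColumns), as in FIPS 197.
MC = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
INV_MC = [[14, 11, 13, 9], [9, 14, 11, 13], [13, 9, 14, 11], [11, 13, 9, 14]]

def mat_vec(matrix, column):
    """Apply a 4x4 GF(2^8) matrix to a 4-byte column (XOR is field addition)."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gf_mul(coeff, byte)
        out.append(acc)
    return out

def converging_input_differences():
    """All 4-byte column differences that MixColumns maps to exactly one active byte."""
    targets = set()
    for row in range(4):
        for delta in range(1, 256):
            single = [0, 0, 0, 0]
            single[row] = delta
            targets.add(tuple(mat_vec(INV_MC, single)))
    return targets

needed = mat_vec(INV_MC, [0x01, 0, 0, 0])
print([hex(b) for b in needed])   # every byte is non-zero: all four must be active
print(mat_vec(MC, needed))        # [1, 0, 0, 0]: the difference converges to one byte

good = converging_input_differences()
# Assumption (ours): the column difference is modeled as uniform over 2^32 values.
p = len(good) / 2 ** 32
print(len(good), math.log2(p))
```

Since the S-box output differences depend on the key, the attacker cannot pick a converging input pair directly, which is why many plaintexts must be tried.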
I will not go into the details of this number due to lack of time, but they can be found in the paper. So, with a probability of 2^-22 for a convergence to occur, we need approximately 2^11.5 plaintexts to observe it. Basically, we generate plaintexts with differences only in these bytes, while the other bytes remain fixed; we keep querying the AES, observe the middle round, and check whether the particular differential pattern of interest appears. This pattern is one active byte in round two, or one active column, that is, four active bytes, in round three. So you can observe it at either place; for example, if the first two rounds are protected, you can observe it in the third round.

Once you are able to detect this active byte, the key recovery goes as follows. You hypothesize the value of the difference in the active byte and propagate it backwards towards the plaintext. The plaintext difference is known, which lets you derive equations between the observed difference and the master key. These equations are satisfied not by a single key but by a set of keys, so you repeat the experiment with new inputs to filter the key candidates. This has to be repeated for all the columns. First you need about 2^11.5 plaintexts to generate a convergence, and the filtering costs an additional 2^9 plaintexts; together these give the number of plaintexts required to derive one column of the key. Repeating this separately for all four columns comes, on average, to 2^13.73 plaintexts.

The attack can very well be extended to round four.
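The data complexity quoted above can be reproduced with simple arithmetic under one plausible reading, which is our assumption rather than the paper's stated derivation: N plaintexts that differ only in the four diagonal bytes give about N^2/2 pairs, each converging with probability 2^-22, so roughly 2^11.5 plaintexts yield one converging pair. Adding the 2^9 filtering plaintexts and repeating for four columns gives the total. The function name is ours.

```python
import math

P_CONVERGE = 2.0 ** -22   # per-pair convergence probability quoted in the talk
EXTRA_FILTER = 2.0 ** 9   # additional plaintexts to filter key candidates (from the talk)
COLUMNS = 4               # AES-128: one convergence per key column

def plaintexts_for_one_pair(p):
    """Assumption (ours): N plaintexts give about N^2 / 2 pairs, so N ~ sqrt(2 / p)."""
    return math.sqrt(2.0 / p)

n_conv = plaintexts_for_one_pair(P_CONVERGE)    # about 2^11.5
total = COLUMNS * (n_conv + EXTRA_FILTER)       # all four columns
print(f"{math.log2(n_conv):.2f} {math.log2(total):.2f}")  # → 11.50 13.73
```

The match with the quoted 2^13.73 supports this pair-counting reading, but the paper remains the authority on how its figures were obtained.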
Instead of observing one column converging to a single-byte difference, we can imagine the case shown in the figure, where two columns are active. That leads to two active bytes at the end of the first round, and to a very special pattern in the fourth round. Since the observation is in the fourth round and going back makes the equations more complex, the number of required plaintexts naturally increases: in this case, 2^27.5. The details of how these numbers are derived can be found in the paper.

To summarize the results, we look at different ciphers, in particular AES, SKINNY and PRESENT, and these results are of course applicable to other SPN ciphers that share a common structure. The main result is that with approximately 2^12 to 2^28 plaintexts, all these implementations were vulnerable to SITM, and the attack can go as deep as the fifth round in AES and up to 12 rounds in SKINNY. I think this is the key feature of this attack: it can go really deep if you are able to find differential patterns that can be exploited.

Also, without going into how they are derived (the details are in the paper), we provide a methodology to calculate the minimum number of rounds that should be masked. Across the studied ciphers, the minimum number of rounds to mask was around 70% of the rounds, and this is only for this particular attack; with future optimizations of the attack, the number of rounds to mask might increase. Therefore, the bottom line, or the main conclusion we can derive, is that it is important to protect all the rounds, and not only the corner rounds, which has also been indicated by several works in the past.

Now, we extend this attack to deep-round shuffling.
Again, I will start with a very quick introduction to shuffling. Normally, an algorithm like AES executes in a very regular sequence: from one execution to another, one operation is followed by another and the sequence remains the same, which is represented with different colors in this example. In a shuffled implementation, the defender randomizes the order of these operations so that they don't execute at the same time sample. This makes attacks like correlation attacks harder by distributing the leakage over a period of time rather than concentrating it at a single time sample: the sequence is spread over n! possibilities. Shuffling is basically a countermeasure that adds noise. It is already difficult to attack the first round, but the deeper you go, the more complex the equations become, and with the added noise from shuffling this becomes even harder. Therefore, not a lot of work has concentrated on attacking deep-round countermeasures.

What we demonstrate in the following is this attack setting. We again take a heterogeneous countermeasure where the corner rounds are well protected with masking plus shuffling, but in the middle rounds masking is disabled and only shuffling is enabled, so that the designer can still gain some performance over a fully masked implementation. We have 16 S-boxes, so 16! execution sequences are possible, and since the execution is shuffled, averaging is not possible. This leads to a low SNR.

The attack procedure is as follows. We are interested in a plaintext difference, so we insert a plaintext difference.
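A minimal sketch of the shuffled execution order, assuming one uniformly random permutation of the 16 S-box computations per encryption, produced here with Python's random.shuffle; the helper names are ours. It counts the 16! possible orders and estimates how often a given S-box lands in the same time slot in two independent executions, which is about 1/16 and is the event the deep-round attack exploits.

```python
import math
import random

N_SBOXES = 16

def shuffled_order(rng):
    """Random execution order of the 16 S-box computations (uniform permutation)."""
    order = list(range(N_SBOXES))
    rng.shuffle(order)
    return order

def same_slot_rate(sbox, trials, rng):
    """Fraction of trace pairs in which `sbox` runs in the same time slot in both traces."""
    hits = 0
    for _ in range(trials):
        hits += shuffled_order(rng).index(sbox) == shuffled_order(rng).index(sbox)
    return hits / trials

rng = random.Random(7)
print(math.factorial(N_SBOXES))        # 20922789888000 possible execution orders
print(same_slot_rate(0, 20000, rng))   # close to 1/16 = 0.0625
```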
So there are two traces, one for each plaintext, represented by TR0 and TR1. Both traces contain 16 S-boxes, and for simplicity let's say we are able to find one POI for each S-box, so we have 16 POIs in each of the two traces. What an attacker does is compute a pairwise difference for each POI. In an unprotected implementation this would be easy, because the executions are not randomized: the POI of each S-box in TR0 is subtracted from the POI of the same S-box in TR1, and whether a particular byte is active or not can easily be detected. With shuffling, the order in TR0 and TR1 differs, so this is not easy to detect. But if we repeat enough times, there is a chance that for some sequences an S-box has the same position in the execution in both traces of a pair. For example, looking at S-box number zero, if we test enough samples, there will be some cases where S-box zero is executed in position zero in both TR0 and TR1, and these are the cases we are interested in. After taking the pairwise differences, that is, TR0 minus TR1 for the first position and for all the other 15 positions, we compute the sum of the differences, denoted D. If there is no convergence, D behaves like a random value whose distribution can be easily determined. But if a convergence has occurred, the value of D goes down and deviates from that distribution. This is exactly what we use to detect the convergence. Once a convergence is detected, the key recovery follows exactly the same process as before.

Here are simulated experiments and experiments on real traces, computed at different SNR values.
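The D statistic can be illustrated with a toy simulation. This is a minimal sketch under assumptions that are ours, not the paper's: one POI per execution slot, leakage equal to the Hamming weight of the processed byte plus Gaussian noise, D taken as the sum of absolute pairwise differences, and a converged pair modeled as middle-round states differing in one byte while a non-converged pair has independent random states. All function names are hypothetical.

```python
import random

def hamming_weight(x):
    return bin(x).count("1")

def shuffled_trace(state, rng, sigma):
    """One POI per execution slot: slot j leaks HW of the byte processed there, plus noise."""
    order = list(range(len(state)))
    rng.shuffle(order)  # shuffling countermeasure: random S-box execution order
    return [hamming_weight(state[i]) + rng.gauss(0.0, sigma) for i in order]

def d_statistic(tr0, tr1):
    """Sum over the 16 POIs of the absolute pairwise difference TR0 - TR1."""
    return sum(abs(a - b) for a, b in zip(tr0, tr1))

def simulate_pair(converged, rng, sigma=1.0):
    """D for one trace pair; 'converged' means the middle-round states differ in one byte."""
    s0 = [rng.randrange(256) for _ in range(16)]
    if converged:
        s1 = list(s0)
        s1[rng.randrange(16)] ^= rng.randrange(1, 256)
    else:
        s1 = [rng.randrange(256) for _ in range(16)]
    return d_statistic(shuffled_trace(s0, rng, sigma), shuffled_trace(s1, rng, sigma))

rng = random.Random(2020)
n = 10000
mean_converged = sum(simulate_pair(True, rng) for _ in range(n)) / n
mean_random = sum(simulate_pair(False, rng) for _ in range(n)) / n
print(f"mean D, converged pairs: {mean_converged:.2f}")
print(f"mean D, random pairs:    {mean_random:.2f}")
```

In this toy model the gap comes only from slots where an S-box happens to sit at the same position in both traces (on average about one of the 16), which is why the two distributions differ only slightly.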
You can see that when a convergence is occurring and when it is not, the distributions are slightly different and can be distinguished from each other. However, these two figures are only for demonstration, because unless you are profiling, you won't be able to obtain these distributions easily. Instead, the idea we follow is simply to enumerate, starting from the minimum value of D, until we find a few convergences. In our experiment, we found up to four convergences among the top 20 values of D, and that allowed us to recover the key.

So finally, coming to the conclusion. There are several attacks this one can be compared to. Differential side-channel attacks and SITM are different, because one targets the corner rounds and the other targets middle rounds. But there are other known attacks, like collision-based and algebraic side-channel attacks, or soft analytical side-channel attacks, which can target middle rounds. SITM differs from these attacks, first of all, in its low sensitivity to SNR: as we showed, even at low SNR, including with noise-generating countermeasures, it was able to recover the key, and no profiling is needed. We do need to know the points of interest, but there are several non-profiled techniques to find them. Compared with SASCA, for example, we don't need detailed profiling.

To conclude, we presented SITM, a side-channel assisted middle-round differential cryptanalysis attack. It is a generalized deep-round attack on SPN ciphers, with results presented on a set of ciphers including AES, SKINNY and PRESENT, and it is extendable to others. We showed that we can target AES up to five rounds and SKINNY up to 12 rounds. This was the first attack on middle-round shuffling.
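The enumeration step can be sketched as a simple ranking: sort trace pairs by D and try the smallest values first as candidate convergences for key recovery. This is a minimal sketch; the function name and the D values below are hypothetical, and the cutoff of 20 mirrors the "top 20 values of D" mentioned in the talk.

```python
import heapq

def top_candidates(d_values, k=20):
    """Indices of the k trace pairs with the smallest D, most promising first."""
    ranked = heapq.nsmallest(k, enumerate(d_values), key=lambda item: item[1])
    return [index for index, _ in ranked]

# Hypothetical D values for six trace pairs; pairs 4 and 1 would be tried first.
d_values = [31.2, 24.8, 30.5, 29.9, 22.1, 33.0]
print(top_candidates(d_values, k=2))   # [4, 1]
```

Ranking avoids having to estimate the two D distributions explicitly, which would otherwise require profiling.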
And again, the results reiterate that all rounds need to be protected; protecting only the corner rounds is not enough. Thank you.