Hi, my name is Fernando and I will be presenting Implementing Grover Oracles for Quantum Key Search on AES and LowMC. This is joint work with Samuel Jaques, Michael Naehrig and Martin Roetteler.

As many of us know, in 2016 the National Institute of Standards and Technology (NIST) put out a call to standardize post-quantum cryptography. In this call, they defined five categories specifying the security of a scheme. Categories 1, 3 and 5 are defined based on the hardness of running key search against AES-128, AES-192 and AES-256. A fundamental question is then: how hard is it to break AES with a quantum computer? The assumption is that AES is an ideal block cipher and that there is no structural weakness that could be attacked, hence the only known strategy is to use Grover's algorithm to speed up key search. Grover's algorithm works by loading the key space of size N on its input register and running about square root of N iterations that check whether a key is the correct one for a set of fixed plaintext-ciphertext pairs. Terminating Grover's algorithm early greatly reduces the success probability of the attack.

Since we are going to be talking about quantum circuits, first we should talk about how to cost them. We work with three possible cost metrics in this work. One is the D-cost, or the depth of the circuit. The idea here is that each gate in a circuit takes some amount of time to be executed, hence the depth in terms of gates would be proportional to the time it takes to evaluate it. Another metric is the G-cost, or the total number of gates in the circuit. This means that if a particular qubit is idle at some point, it will not have a cost. Finally, we have the depth times width of the circuit, or DW-cost. This captures the idea that even idle qubits that are not being operated upon have to be error corrected, and this has a cost. If this error correction has a classical cost, this can be used to partially compare the cost of a quantum algorithm with that of a classical one.
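As a rough illustration of the three metrics just described, here is a minimal Python sketch. It assumes a toy circuit model that is not from the talk: every gate takes one unit of time, a gate is represented only by the tuple of qubit indices it touches, and each gate is scheduled as early as possible. The function name `circuit_costs` and the example circuit are hypothetical.

```python
def circuit_costs(num_qubits, gates):
    """Return gate count, depth, width and depth*width of a toy circuit.

    gates: list of tuples of qubit indices, in program order.
    Assumption: every gate has unit cost and unit duration.
    """
    # busy_until[q] is the last time step in which qubit q is used.
    busy_until = [0] * num_qubits
    for qubits in gates:
        # A gate can start only after all of its qubits are free.
        layer = max(busy_until[q] for q in qubits) + 1
        for q in qubits:
            busy_until[q] = layer
    depth = max(busy_until, default=0)
    return {
        "G": len(gates),            # idle qubits add nothing here
        "D": depth,                 # proxy for running time
        "W": num_qubits,
        "DW": depth * num_qubits,   # idle qubits are charged too
    }


# Hypothetical 3-qubit circuit: two single-qubit gates, then two 2-qubit gates.
example = [(0,), (1,), (0, 1), (2,), (1, 2)]
costs = circuit_costs(3, example)
print(costs)
```

In this example the third qubit sits idle for most of the circuit, which the gate count ignores but the depth times width does not.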
In other cases, different weights can be assigned to different gates. In particular, if we look at the Clifford plus T universal set of quantum gates, T gates are considered to be more expensive to prepare than Clifford gates.

In the case of Grover search, Zalka showed that, optimally, if we use S machines to parallelize the algorithm, at most we can divide the total depth by square root of S. This differs from exhaustive classical search, where the depth is divided by S. This means that trying to run Grover's algorithm in less time increases its cost. To halve the time, one requires four times as many machines, and the cost doubles. To capture this, NIST suggested introducing a quantity called max depth, specifying how much total quantum computation can be realistically run as part of an attack. This quantity does not relate to the coherence time of the qubits, but rather tries to capture the total amount of computation. NIST then uses the trade-off result by Zalka to estimate the cost of breaking AES depending on max depth, based on the total depth D and the gate count G of running an unbounded Grover search against AES. They pick the values D and G from Grassl et al. and obtain the table presented in this slide, which represents the gate cost of running parallel Grover search against AES.

Our idea was the following. NIST cares about reducing the depth of the attacks, but picks D and G from the analysis by Grassl et al., which was instead aimed at reducing the width, or the number of qubits used for the attack, assuming no parallelization. What would happen if we designed the circuit to reduce depth instead? In hindsight, parallelization is so bad that minimizing depth is greatly beneficial. To do this analysis, we implemented the AES circuit in the Q# programming language.
This allows us to write unit tests for our code, reassuring us that we did correctly implement AES. It's friendly to modify and work on, it lets us automatically estimate the circuit size, and it allows us to port already existing implementations of AES into the language.

We also make the following assumptions. We only work with logical qubits. We don't assume a particular framework such as error correction using the surface code, and this has two effects. On one hand, we ignore the cost of error correction, or the need to apply gates to physically nearby qubits. On the other hand, we don't get speedups such as free CNOT fanouts. Finally, we assume that swapping values between qubits can be done for free, by not swapping and rather keeping track of the swaps that should have happened. This is not necessarily a realistic model, but it's what the previous literature on Grover against AES had used, and hence also what NIST was implicitly using.

So, to minimize the circuit depth, we looked at the various components of AES. We started by looking at the S-box. We ported many linear programs to Q#, and we ended up choosing one of Boyar and Peralta's designs that minimizes depth. Also, we've been partially scooped. In partially concurrent and independent work, Langenberg et al. proposed a similar S-box change.

We also looked at the fundamental logic gates. The most costly gates seem to be AND gates. Previous work implemented them by using a Toffoli gate with a specific design. We replaced this Toffoli gate with a smaller design for an AND gate. This does not work as a Toffoli gate, but it's sufficient for AES and it's much smaller. Furthermore, its adjoint operator uses measurement-based uncomputation, resulting in an adjoint operator free of T gates.

For key expansion, Grassl et al. cache some bytes of the round key that are costly to compute, but this is tricky to keep track of while writing the implementation. Instead, we look at in-place round key expansion.
We load the key on the input wires, and in place we expand the bytes of the round keys. We can do this on demand, so that we only expand as many bytes as are necessary at any point in time. This results in saving depth with respect to pre-computing the full expanded key, because the computation of the round key can be done in parallel to the round.

We also look at MixColumns, and instead of doing a PLU decomposition as Grassl et al. do, we chose a recent design by Maximov that results in a shallower but wider circuit. Finally, we also fix a mistake that Grassl et al. make in computing the number of plaintext-ciphertext pairs required to uniquely determine an AES key. We also consider unbounded, lower-probability attacks that use fewer pairs.

We're now looking at the costs reported by Grassl et al. in 2016. No parallelization is being considered. This column indicates the number of plaintext-ciphertext pairs that we have. This one indicates the number of qubits that we have. These indicate the number of gates used, where M indicates measurements. Grassl et al. do not use measurements as part of their Grover oracle, whereas we do as part of our AND gate. Then we have the depth of the circuit, and here we can look at only T gates, or we can consider all the gates. Then we have the total gate count and the total depth times width of the circuit. Finally, we have the probability that Grover's search is successful.

We now include the numbers from Langenberg et al. and from our work, and we'll point out the main differences. First of all, we have lots of measurements in our circuit. Yet we have no reason to believe that a measurement is more expensive than applying a quantum gate. And even though 2 to the 77 is a lot of gates, it's still less than the number of Clifford gates or T gates applied, so we don't believe that this should be a problem. Also, we reduce the number of pairs used, and this has an effect on the success probability of the attack.
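To see why the number of pairs affects the success probability, here is a back-of-the-envelope sketch in Python. It relies on the ideal-cipher assumption mentioned earlier in the talk: each wrong key matches r pairs with probability 2^(-r·n) for block size n, so the number of spurious keys among roughly 2^k candidates is approximately Poisson with mean 2^(k - r·n), and the correct key is the only match with probability about exp(-2^(k - r·n)). The function name is hypothetical, and the formula is this approximation, not a figure read off the slides.

```python
import math


def prob_unique_key(key_bits, block_bits, pairs):
    """Approximate probability that no spurious key matches all pairs.

    Assumption: ideal cipher, so the count of spurious keys is roughly
    Poisson with mean 2^(key_bits - pairs * block_bits).
    """
    expected_spurious = 2.0 ** (key_bits - pairs * block_bits)
    return math.exp(-expected_spurious)


# AES has a 128-bit block; key sizes are 128, 192 and 256 bits.
for key_bits in (128, 192, 256):
    for pairs in (1, 2, 3):
        p = prob_unique_key(key_bits, 128, pairs)
        print(f"AES-{key_bits}, {pairs} pair(s): P(unique) ~ {p:.3f}")
```

Under this approximation, a single pair for AES-128 leaves a unique key only with probability about 1/e, while one extra pair pushes it essentially to one.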
Finally, when looking at the cost of the attacks, we gain a factor of 2 to the 10 in the T-depth with respect to Grassl et al., a 2 to the 5 gain in gate cost, and a 2 to the 7 gain in depth times width cost.

Now, in the previous table, the only attack that could be carried out within depth constraints was attacking AES-128 within max depth 2 to the 96. For all the other combinations, we'll need parallelization. There are two ways of parallelizing Grover. One is called outer parallelization. Here the idea is that we have S machines that are running Grover independently on the whole key space, and we stop them early to save a factor of square root of S in depth. We notice that the success probability in this case can only approach about 91%, and so it's not possible to run a probability-one attack by stopping early.

An alternative strategy is inner parallelization. Here the key space is partitioned into S disjoint subsets. Of course, only one of these subsets contains the key that we're trying to recover. Then, we run Grover search on each of these subsets of the key space. Interestingly, cutting the depth by square root of S will still give us probability one of finding the right key within its own partition of the key space. So this allows us to save a factor of square root of S in depth and still have a probability-one attack. Also, a side effect of inner parallelization is that, since we are looking for a key in a smaller subset of the key space, it's unlikely that other spurious keys mapping the same plaintext to the same ciphertext will be present in the same subset. This means that we can afford running the quantum part of the attack using fewer plaintext-ciphertext pairs. This results in fewer qubits being required.

Now these are the costs for the parallelized Grover attack. These two columns show that, except for AES-128 with max depth 2 to the 96, in all other cases we use all of the depth budget available. S is the number of machines that are required to run the attack, and W is the total number of qubits required.
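The difference between the two parallelization strategies, and where a machine count like S comes from, can be sketched numerically. The following Python sketch is an illustration under stated assumptions, not the authors' code: it uses the textbook Grover success probability sin²((2j+1)θ) with θ = arcsin(1/√N) for a single marked key, a toy key space of 2^20 keys, and hypothetical function names. The last function applies the square-root-of-S trade-off in log2 form, and its example inputs are made up.

```python
import math


def grover_success(num_keys, iterations):
    """Textbook success probability with exactly one marked key."""
    theta = math.asin(1 / math.sqrt(num_keys))
    return math.sin((2 * iterations + 1) * theta) ** 2


def outer_parallel_success(num_keys, machines):
    """Each machine searches the full key space but stops early."""
    iterations = math.floor(math.pi / 4 * math.sqrt(num_keys / machines))
    p_single = grover_success(num_keys, iterations)
    return 1 - (1 - p_single) ** machines  # at least one machine succeeds


def inner_parallel_success(num_keys, machines):
    """Each machine runs a full Grover search on its own subset."""
    subset = num_keys // machines
    iterations = math.floor(math.pi / 4 * math.sqrt(subset))
    return grover_success(subset, iterations)  # machine holding the key


def parallel_cost_log2(log2_depth, log2_gates, log2_max_depth):
    """Return (log2 S, log2 total gates) needed to fit a depth limit.

    Assumption: depth shrinks by sqrt(S) and total gates grow by sqrt(S).
    """
    excess = max(0, log2_depth - log2_max_depth)
    return 2 * excess, log2_gates + excess


N, S = 2 ** 20, 16  # toy sizes, far smaller than any AES key space
print("outer:", outer_parallel_success(N, S))  # stays noticeably below 1
print("inner:", inner_parallel_success(N, S))  # very close to 1
# Hypothetical oracle: depth 2^80, 2^85 gates, depth limit 2^64.
print(parallel_cost_log2(80, 85, 64))
```

The S and cost columns on the slide come from applying this kind of trade-off to the actual AES oracle costs.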
Here we can notice that these numbers are very big, and that in the most extreme case, which is AES-256 with max depth 2 to the 40, a quantum attack might require about 2 to the 200 quantum computers, which of course sounds implausible. And then, if we look at the gate cost and depth times width cost, again in the most extreme case we might need to execute 2 to the 245 quantum gates. So overall it might end up being more expensive than running classical key search instead.

Some observations should be made here. One is that some of these costs are so big that it might mean that no attack exists. Say, for example, that we have a candidate for a category 5 scheme that does a similar analysis, and it results in an attack with gate cost 2 to the 230 with max depth 2 to the 40. Now this is, strictly speaking, less than the hypothetical gate cost for attacking AES-256. But it is still huge and it might not be practical, just like building 2 to the 197 quantum computers will likely not be practical. So is this scheme as secure as AES-256? In some sense, no. But then maybe what these numbers are telling us is that there exists no quantum attack against AES-256, and maybe the numbers from this hypothetical category 5 candidate also say that no attack exists, hence maybe it is in category 5.

Also, we should observe that so far we have assumed that there is no limit to the number of qubits available, but clearly qubits are not free, and so maybe we should consider having a quantity max width. The problem is then that, if we have both a max depth and a max width, we might end up in a situation where no probability-one attack against AES exists. What happens then is that maybe we should be talking about low-probability quantum attacks against these schemes.

Finally, we have compared the table from the NIST call for proposals with the numbers from our analysis. This is the resulting table.
By looking at the approximation column, which tries to interpolate the various attacks, we can see that we knocked off about 13 bits, in some sense, of quantum security against AES. Of course, these remain highly impractical black-box attacks. But at the same time, in some sense it means that the proposed post-quantum schemes are somewhat more secure, because it's easier to be as secure as AES.

We also did a cryptanalysis of LowMC. LowMC is a block cipher family designed for FHE and MPC, and it's designed to have low multiplicative complexity. It is used as part of the Picnic digital signature submission, and since key recovery against LowMC could lead to an attack against Picnic, we costed Grover-based key recovery. We use the same techniques and end up with costs that are higher than for AES, due mostly to the bigger key expansion circuit, and so, to the best of our knowledge, LowMC is harder to break quantumly than AES in some metric.

Talking now about future directions, it would still be interesting to minimize the depth of the AES S-box and other components. There are some approaches that use a quantum computer to trade off width for depth of unitary operators. The problem with these is that the resulting unitary cannot be classically simulated, and hence it cannot be unit tested unless a big enough quantum computer is built. Otherwise, we could try using classical circuit minimizers. We could also improve the implementation of LowMC. There was a new technique presented at Eurocrypt last year about shallower circuits, but we didn't try implementing it. Also, we could redo the analysis in the surface code setting. This max depth analysis could also be applied to some of the quantum attacks against the post-quantum candidates, and the same could be done with max width. Finally, we should note that our analysis does not cover multi-target attacks in any way.

That's all I wanted to say. Remember to like, comment and subscribe, and see you at the panel discussion. Thank you.