And thank you to the other speakers in the session for adding about 10 minutes to my time. Now I can get to the talk. Also I want to mention that I'm continuing the tradition of the most junior person on the paper giving the talk. Alright, so let's have a closer look at the algorithm for computing elliptic curve discrete logarithms. I only put this slide up because I could mention the word Bitcoin. Always good these days. Okay, so all of these curves will be broken. But how broken? What are the exact resources we need for actually computing the discrete logarithms? When we started to work on this, we were really interested in implementing it: to really write down exact circuits, simulate them, test them to find bugs, and then precisely count all qubits, gates, and the depth by reading it off the implementation. There's previous work on this by Proos and Zalka from 2003, and since then almost nothing has happened. They describe the whole algorithm in their paper, but it's all text. It's very nicely written, but there's no concrete circuit, and for some of the methods there it wasn't clear to us whether they would actually work. So we really wanted to test their algorithm. Alright, so what is the discrete logarithm problem? As you all know, we have an elliptic curve, and its rational points form an abelian group. We're going to restrict right away to the large-characteristic case that is common in crypto: a large prime p, and we look at the rational points over that field. This group usually has prime order, or a small cofactor times a prime. The exponentiation in this group we write in additive notation, so it becomes a scalar multiplication, n times P for example. And then the ECDLP is: given two points P and Q of prime order r, such that Q is a multiple of P, we need to find that multiple. Alright, so here's a very rough description of Shor's algorithm for discrete logarithms. It has the typical elements of a quantum algorithm.
First, n is the bit size of the prime. The first step in this algorithm is to produce a superposition over two (n+1)-bit registers. We chose n+1 bits because if the prime is n bits, then you can reach all the scalars with n+1 bits. So the first step is producing this superposition. And you see we have a third register here, which will hold an elliptic curve point. The second step is essentially just the evaluation of a classical function: we're computing this multi-scalar multiplication over there. After that comes another typical quantum step, a quantum Fourier transform. This is done for phase estimation; in this case it's to find the period of the function that maps (k, l) to kP + lQ. Once we have that, we can do some classical post-processing to get the discrete log as a ratio of those two values k and l. Alright, so how does this look as a circuit picture? On the left here we have those two registers, which start out with zeros. We do the superposition with Hadamard gates. And then this is how we do the multi-scalar multiplication. All these points: we're given the problem in classical form, right? We know P and Q, so we can precompute all those power-of-2 multiples; that's nothing we need to do on a quantum computer. So all these multiples we already have, and then we just build circuits that do this point addition in a controlled way. On the right here we have the Fourier transform and measurements at the end. But this is actually quite wasteful on qubits. There's a smarter way to do it, which is using the semi-classical Fourier transform. That takes this parallel superposition with a Fourier transform at the end and instead does it serially on one qubit. This looks very convenient; you can really write something like that on the slides as well. So the standard phase estimation picture looks something like this, right?
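The structure of step 2 — accumulating controlled additions of classically precomputed power-of-2 multiples — can be sketched classically. This is a minimal illustration, not the talk's actual circuits: it uses the additive group of integers mod n as a stand-in for the curve group, so "points" are just integers and point addition is addition mod n. All names are mine.

```python
# Classical sketch of the multi-scalar multiplication f(k, l) = k*P + l*Q,
# with integers mod n standing in for curve points (illustration only).

def precompute_multiples(P, bits, n):
    """Classically precompute 2^0*P, 2^1*P, ..., 2^(bits-1)*P."""
    return [(P << i) % n for i in range(bits)]

def multi_scalar(k, l, P, Q, bits, n):
    """One controlled 'point addition' per scalar bit, exactly the
    structure of the circuit: the precomputed multiple is a constant,
    the scalar bit is the control."""
    acc = 0
    Ps = precompute_multiples(P, bits, n)
    Qs = precompute_multiples(Q, bits, n)
    for i in range(bits):
        if (k >> i) & 1:        # controlled on the i-th bit of k
            acc = (acc + Ps[i]) % n
        if (l >> i) & 1:        # controlled on the i-th bit of l
            acc = (acc + Qs[i]) % n
    return acc
```

With Q = d·P this computes (k + l·d)·P, which is the periodicity that the phase estimation step exploits; e.g. with n = 101, P = 3, d = 17 and thus Q = 51, `multi_scalar(5, 9, 3, 51, 7, 101)` equals `((5 + 9*17) * 3) % 101`.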
The difference now is that we measure this one qubit after each round, and then the next of these phase shift matrices depends on all the previous measurements. Beauregard had a factoring algorithm that uses this; that's more or less the best algorithm for factoring, which uses 2n+2 qubits if you do it with the Häner et al. implementation. Okay, so we're going to do this instead so we can save all these qubits. And if you look at the details, these are just very simple single-qubit gates. They don't contribute a lot and they're really not a big part of the computation. So we can just focus on the point addition here. What we're going to do is express everything as a Toffoli gate network. The Toffoli gate is this doubly controlled NOT gate, and it's known to be universal for reversible computing. It's very convenient: it's a classical gate, so we can easily simulate it classically, and that has the advantage that you can do debugging more easily. And even if you later want to run this on a real quantum computer, and this quantum computer has, for example, the Clifford+T gate set implemented, then you can exactly implement the Toffoli gate; I think it takes 7 T gates plus a few Clifford gates. Okay, so this is the setting. Let's revisit our motivation slide. The plan is as follows. We're not going to do the whole Shor algorithm; we're just going to focus on the point addition. We have a choice here. We can't really simulate the whole algorithm for proper parameters, of course; that would be fun. So we can either do small parts for real-world parameters, or we do small parameters and do the whole thing. We decided to have a go at real-world parameters and just focus on the curve addition, counting everything in terms of Toffoli gates, because they are the most costly. And then, for getting the cost of the whole algorithm, we just multiply those gate numbers by 2n.
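Since the Toffoli gate is classical, a Toffoli network really can be tested without any quantum simulator, as the talk says. A two-line sketch of the gate on classical bits (purely illustrative):

```python
# Classical simulation of a Toffoli (doubly controlled NOT) gate on a
# register of bits -- enough to debug a whole Toffoli network classically.

def toffoli(bits, c1, c2, t):
    """Flip bit t iff both control bits c1 and c2 are 1. In place."""
    if bits[c1] == 1 and bits[c2] == 1:
        bits[t] ^= 1
    return bits

# With the target initialized to 0, Toffoli computes AND reversibly:
reg = [1, 1, 0]
toffoli(reg, 0, 1, 2)      # now reg[2] = reg[0] AND reg[1]

# Applying the same gate again undoes it (Toffoli is its own inverse):
toffoli(reg, 0, 1, 2)
```

This reversibility is exactly what the uncomputation tricks later in the talk rely on: running a Toffoli network backwards cleans up whatever it computed.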
Because that's how many steps we do. So what do we have to do? And I really like that we can go back to the good old affine short Weierstrass curve. Down there is the good old point addition formula, for the case where we don't have any special cases: neither of the points is the point at infinity, they're not equal, and they're not negatives of each other. Then this is the formula we have; most people have seen this. Now this needs to be implemented in a reversible way. We have an accumulator point here, some auxiliary qubits, and then we add this constant point into it. And we would like to do that using as few qubits as possible. There's a generic way to take any computation and make it reversible: you basically store a lot of the intermediate values, and later you run it backwards, or do the computation again, to clean things up. But this usually blows up the number of qubits a lot, so we're trying to do something smarter. Also, we will use the fact that the modulus is classically known, and so are all these constant points, as I mentioned. So we're not using any qubits for storing these; this can all be baked into the circuits. Our objective here is to first optimize the number of qubits and then the number of gates. Okay. So here's a very simple observation. This is again computing the point P3, which is P1 plus P2; these are the formulas. And it's a group law, so we can just swap P1 and P2; it doesn't matter. Now, if we add P2 to P1, how do we go back? We subtract P2 again, right? So P1 is P3 minus P2, and we can write down the formulas for those coordinates. And if we look at that, we get a different slope. This number really is the slope: if you draw the chord-and-tangent rule for adding two points on an elliptic curve, you get exactly this number. And then you get the other one over there.
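For reference, the generic-case affine addition formula can be checked on a tiny textbook curve, y² = x³ + x + 1 over F₂₃ (parameters chosen purely for illustration; the talk's targets are the NIST curves):

```python
# Generic-case affine short Weierstrass addition: distinct points that
# are not negatives of each other, neither the point at infinity.

p = 23   # toy modulus; real parameters are 256+ bit primes

def add_generic(P1, P2):
    (x1, y1), (x2, y2) = P1, P2
    lam = (y2 - y1) * pow(x2 - x1, -1, p) % p   # the slope lambda
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

P = (3, 10)   # on the curve: 10^2 = 3^3 + 3 + 1 = 8 (mod 23)
Q = (9, 7)
print(add_generic(P, Q))   # the textbook answer is (17, 20)
```

Every operation in the formula — the subtraction, the inversion, the multiplication, the squaring — is one of the modular-arithmetic building blocks discussed later.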
And if you take either of these formulas and solve it for lambda, you will see that this lambda is actually the negative of the one going back. This is not very surprising, and many people have probably seen it before, but it's also what Proos and Zalka used in their paper, and it's a smarter way to actually clean up the garbage. So don't read this. This is just a straight-line program of the elliptic curve group law in terms of functions operating on qubit registers. If you look at these brackets: if it's a bracket with a one, the control bit is set and we're actually doing the operation, and if it's a zero, then you can check that in the end we have done nothing. But here the slope gets computed, and then you can see here we're computing it again and cleaning it up this way, reusing the things we have computed along the way. This is how it looks as a circuit picture. You can also see how many qubits we need: 2n for the curve coordinates, and then n for the slope. Then we need another temporary register here. Oh yes, and then we have this control; that's also interesting to note. We could write this down without the control, but if you want to control the whole point addition, you don't have to control every single operation. We just need to control these five here. If you take the others out, you can go through the circuit and see that in the end it does nothing: it does some computations and reverses them at the end. That's a principle we've also applied to several of the modular arithmetic algorithms, which we're going to talk about now, because now we need to look at all these functions. There are subtractions of constants, an inversion in the finite field, multiplication, and so on and so forth. Alright, modular arithmetic.
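The slope observation can be verified numerically on the same toy curve, y² = x³ + x + 1 over F₂₃ (a worked example I added for illustration): the slope used to compute P3 = P1 + P2 is the negative of the slope used going back via P1 = P3 + (−P2).

```python
# Numeric check of the slope identity behind the uncomputation trick.

p = 23

def slope(P, Q):
    """Slope of the chord through two distinct points."""
    (x1, y1), (x2, y2) = P, Q
    return (y2 - y1) * pow(x2 - x1, -1, p) % p

P1, P2, P3 = (3, 10), (9, 7), (17, 20)   # P3 = P1 + P2 on this curve
neg_P2 = (P2[0], (-P2[1]) % p)           # negation: flip the y-coordinate

lam_forward = slope(P1, P2)   # slope used when computing P3
lam_back = slope(P3, neg_P2)  # slope used when adding -P2 back on

assert lam_back == (-lam_forward) % p
```

This is why the slope register can be uncomputed "for free" on the way back: the reverse addition recomputes the same value up to sign, using intermediates that are already sitting in the registers.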
We started from some previous work: we took the integer addition and subtraction circuits from the Takahashi et al. paper. And, as I mentioned, because we have a constant modulus, we regularly need additions and subtractions of constants, and we took those from Häner et al., where they did very similar work for the factoring algorithm. Here you can see how modular addition looks. It starts with an integer addition. Then we subtract p anyway, look whether it went negative, and if so, we add it back. And then we do a comparison to clean up this bit. For the doubling, you can do some optimization: you don't need the addition, it's just a shift here to get the times 2. And for an odd modulus, you can just look at the last bit at the end to tell whether you subtracted p or not, because the result is supposed to be even if you didn't subtract p, and odd otherwise. Alright, now it becomes a bit more interesting: modular multiplication. This is the approach by Proos and Zalka; that's how they describe it in their paper. They say, okay, we take one of the integers, the x here, in its bit decomposition, and then you can write the multiplication like that. You go from the inside out: you add, controlled on the bit x_{n-1}, the value y to an accumulator register, then you double, and so on. It computes exactly this expression up there. And here these are all modular operations, so we do reductions in every step: the addition, the doubling, and so on and so forth. That looks pretty compact; you only need these 2n plus a little bit more qubits. But it looks pretty costly in terms of depth. So as an alternative we can also look at Montgomery multiplication. The difference now is that these blocks are not modular operations; these are just integer additions. It has a similar structure: we go through the bits of x and we again add y to the accumulator.
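The two building blocks just described — modular addition with its clean-up comparison, and the Proos–Zalka double-and-add multiplication built on top of it — can be sketched classically. This mirrors the reversible recipe on plain integers (illustration only; the real circuits act on qubit registers):

```python
# Classical mirror of the reversible modular addition and the
# Proos-Zalka style double-and-add modular multiplication.

def mod_add(x, y, P):
    """x + y mod P for 0 <= x, y < P: add, subtract P, conditionally
    add P back, then clean up the flag bit with a comparison."""
    s = x + y - P                 # subtract P anyway
    flag = 1 if s < 0 else 0      # "went negative" ancilla bit
    if flag:
        s += P                    # add it back
    # Clean-up comparison: a reduction happened (flag = 0) exactly when
    # the result is smaller than x, so the ancilla can be uncomputed.
    assert flag == (s >= x)
    return s

def mod_mul_double_add(x, y, P, n):
    """Horner's rule over the n bits of x, reducing mod P every step."""
    acc = 0
    for i in reversed(range(n)):      # from the top bit down
        acc = mod_add(acc, acc, P)    # modular doubling
        if (x >> i) & 1:              # controlled on bit x_i
            acc = mod_add(acc, y, P)  # modular addition of y
    return acc
```

Each loop iteration is one modular doubling plus one controlled modular addition, which is where the depth cost the talk mentions comes from: every single step carries a full reduction.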
Then the standard Montgomery trick is to look at whether the intermediate result is even. If not, we add p to make it even, and then we can divide by 2. And it's just those three operations. So if you expand the previous modular ones, those are way more complicated, and here it's really a lot simpler. There's a problem here, though: we need some more qubits. You can see those down here. These are the bits that tell you whether you added p or not, and we sort of have to keep them around; we didn't find a way to get rid of them. So the problem is that we have to later run this algorithm backwards to clean them up, which means we actually have to do it twice to get one multiplication. But still, in the end, and I'll show you the numbers in a few slides, it was better to do Montgomery. So we have multiplication, addition, subtraction. Now we also need an inversion, and for this we deviated from the Proos and Zalka paper. What they suggested was to use the regular Euclidean algorithm, but we didn't really see how to do that easily. This one is a lot easier: the binary GCD. You have essentially four cases here. You look at the numbers: if the first one is even, you divide it by two; if the second one is even, you divide it by two; and so on. Otherwise you can always subtract things so that you can divide by two. And then, as in the regular Euclidean algorithm, you have these variables that are carried along, and at the end one of them will be the inverse. Well, it doesn't exactly compute the inverse; you get a power of two in here. But when you're working in Montgomery representation, you can just multiply by a certain power of two to correct that. Now, this already looks more complicated than all the multiplications. And there's this while loop, with an upper bound of 2n on the number of iterations, so we have to run it 2n times. So that's what one round looks like. First of all, start maybe with this little box here.
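The bit-serial Montgomery multiplication just described can be sketched classically as follows. Note how the "did we add p?" decisions pile up in a list: these correspond exactly to the ancilla qubits that have to be kept around and later uncomputed in the reversible version (sketch only, names mine):

```python
# Classical sketch of bit-serial Montgomery multiplication: per bit of x,
# one conditional add of y, one conditional add of P to make the
# intermediate even, then a halving. Only integer additions, no modular
# reductions inside the loop.

def mont_mul(x, y, P, n):
    """Returns (x * y * 2^(-n) mod P, leftover bits), for odd P of n bits."""
    acc = 0
    added_p = []                      # the leftover "ancilla" bits
    for i in range(n):                # bits of x, least significant first
        if (x >> i) & 1:
            acc += y                  # plain integer addition
        if acc & 1:                   # make the intermediate even
            acc += P
            added_p.append(1)
        else:
            added_p.append(0)
        acc >>= 1                     # exact division by 2
    if acc >= P:                      # one final conditional subtraction
        acc -= P
    return acc, added_p
```

The result carries the usual Montgomery factor 2^(−n); e.g. with P = 23 and n = 5, `mont_mul(5, 7, 23, 5)` returns 5·7·(2⁵)⁻¹ mod 23. Working throughout in Montgomery representation absorbs this factor, which is also why the power-of-two slack in the inversion below is harmless.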
You can see this is actually the first case: if u is even, you divide it by two and multiply the corresponding coefficient here by two. And that's the second one, third one and fourth, right? And all this stuff here just collects information about u and v; it encodes it down here in these qubits, and that's used here to select which case it is. Now we have to run this 2n times, but we don't know when it's done, and that's what is handled here. We check whether v is zero. This is a flag qubit that essentially starts at one; it stays one until v becomes zero, and then it flips. What it does then is that it no longer executes this whole block; it just increments a counter down here. I apologize, this is not the same k as before; this is basically n minus k. Okay, so this runs 2n times, and it looks complicated, but on the other hand these are just integer additions, so it's not a lot more than a multiplication. But you can see here: we really wrote all this stuff in the LIQUi|> framework, which is an F#-based quantum computing language toolkit, and it has functions that can count gates; the qubits we can just count ourselves. Okay, so you can see this double-and-add multiplication uses fewer qubits than Montgomery, but then twice as many gates. And since we're using this inversion, we have a lot of qubits lying around anyway; we thought we could just use those. So we really can write it down using only the qubits of the inversion and then save on the depth here. Overall, we just went for the Montgomery multiplication. Okay, and these are real simulation results for the NIST-standardized elliptic curves. You can see, for example, for P-256 we need about 2,000 qubits and about 10^11 Toffoli gates. The depth we also computed automatically from the circuit generated by the program, and it's usually only a little bit smaller than the overall number of gates, so it's a long serial computation overall.
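The four-case round structure of the binary-GCD inversion can be sketched classically in the style of Kaliski's "almost inverse" (my reconstruction for illustration, not the paper's circuit): a counter k tracks the number of rounds, at most 2n of them, and the output is off by a power of two, which Montgomery representation absorbs as noted above.

```python
# Classical sketch of binary-GCD inversion with a round counter:
# four cases per round, each ending in an exact division by two.

def almost_inverse(a, P):
    """Return (r, k) with r = a^(-1) * 2^k mod P, for gcd(a, P) = 1."""
    u, v, r, s, k = P, a, 0, 1, 0
    while v > 0:                       # at most 2n rounds
        if u % 2 == 0:                 # case 1: u even -> halve u
            u, s = u // 2, 2 * s
        elif v % 2 == 0:               # case 2: v even -> halve v
            v, r = v // 2, 2 * r
        elif u > v:                    # case 3: both odd, u > v
            u, r, s = (u - v) // 2, r + s, 2 * s
        else:                          # case 4: both odd, v >= u
            v, s, r = (v - u) // 2, s + r, 2 * r
        k += 1
    if r >= P:                         # bring r into [0, P)
        r -= P
    return (P - r) % P, k              # a^(-1) * 2^k mod P
```

Each round is only comparisons, additions, and shifts — which is the talk's point: despite the complicated case selection, a full inversion costs not much more than one multiplication per round, times the 2n rounds.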
So we got away with 9n plus something qubits, and that differs from the roughly 6n predicted by Proos and Zalka. There are several reasons for that. One is that they use the regular Euclidean algorithm, and they claim you can do register sharing: these variables u, v, r and s occur in the regular version as well, and if you look, they start out with p, the number you want to invert, 0 and 1. You can always pair two of them together, and such a pair will never use more than n qubits, so they can reuse that. But it's very complicated to implement, so we didn't do it. And then there's a register which I think they forgot in the original paper, which they just dropped somewhere; we couldn't get rid of it. As a comparison to the factoring algorithm: on the same line we've put equivalent classical security levels according to the NIST recommendations, so the 256-bit curve we pair with a 3072-bit RSA modulus. And then, as you can see, it seems to be harder to factor. These two algorithms have, I think, a similar level of optimization, so you can actually compare them. There are other algorithms you could use on the factoring side, but at the level we're comparing here, it seems to be fair. So that would suggest that elliptic curves break earlier. That's it, thank you. What are the questions? [Question] The number of qubits reported there, is that before error correction? [Answer] Yes, those are logical qubits, if you will. [Question] And the second one is, well, my knowledge about quantum engineering is negligible, so maybe it's nonsense, but does the topology of the circuit matter? I see that very often wires jump from one qubit at the top maybe to one at the bottom. I wonder if that's some sort of extra hardness. [Answer] Definitely it is. We just started from the logical level, but definitely it is a problem. The layout of the processor is important, and then you cannot just...
For example, I split up some of these functions, where part was at the top and part at the bottom, and it's probably not that easy in the end. So if you really want to implement this on a concrete computer with a two-dimensional layout, it matters whether things are close together or not. [Question] So looking at these numbers, the number of qubits is actually quite reasonable, but the number of Toffoli gates is surprisingly high. How difficult is it to build a quantum piece of hardware that runs 10^12 Toffoli gates? I have no idea whether that is an insurmountable difficulty or whether 10^12 Toffoli gates is easy to do. Can you say a few words? [Answer] I must say I don't really know; I would ask Martin about that. But I know that some people are talking about certain other applications where similar numbers occur, and they claim it can be done. At this point I doubt it is feasible within a reasonable amount of time, but I don't really know. [Question] Because everybody discusses the number of qubits you need, but you rarely hear about the number of Toffoli gates, so I'm curious. [Answer] As a comparison, there is this paper by Martin and others on AES, where the number of Toffoli gates for breaking AES with Grover is about 2^86. So in that comparison this is rather small, but we also don't claim that AES is broken. Any others? Thank you very much.