questions after the lectures, and some things from the answers that I think are worth sharing, so I would like to go through them. We have five lectures, so there is time; it is more important to make sure that everyone hears the answers to these interesting questions than to rush ahead. So, first, on the topic of the first lecture, which was about the hidden subgroup problem, I was asked: what happens in the non-Abelian case? Here is a brief summary. The problem is not really with the Fourier transform. The Fourier transform itself is somewhat harder to define and implement for a non-Abelian group, but that is not the crucial reason why the non-Abelian case cannot be solved efficiently, or at least why we don't know how to solve it. The main problem is that when you run the standard algorithm that I showed you — prepare a uniform superposition, query the function, measure — you don't know how to recover the subgroup from whatever you see there. If you remember, the most challenging part was to understand what information we get out after measuring the state; it was implicitly encoding the subgroup and giving constraints on it. This gives rise to an efficient procedure in the Abelian case, but for non-Abelian groups we really don't know how to recover the subgroup from this state. However, in principle it should be possible to recover the subgroup, meaning that this algorithm is actually very query-efficient. If you prepare the uniform superposition over group elements and apply the hiding function, the information is there: there is some fancy measurement that you could do to learn which is the hidden subgroup. So in terms of query complexity you are fine, but we just have no idea how to implement these measurements efficiently.
So, the idea is that there are only so many possible subgroups, and each subgroup results in a slightly different state. If these states are in principle distinguishable by some measurement, you could just devise that measurement and implement it — we simply don't know how to implement those measurements efficiently. But some special cases of non-Abelian groups can nevertheless be solved efficiently, and here I just listed some of them: normal subgroups, solvable groups, nil-2 groups, and certain semidirect product p-groups of constant nilpotency class. These are examples where efficient algorithms are known. A notable example is also Kuperberg's algorithm, which is somewhere in between. Strictly speaking it is not efficient, because it still has exponential complexity, but it is much better than the naive approach. Kuperberg's algorithm solves the hidden subgroup problem in the dihedral group, the symmetry group of a regular n-gon. The complexity here is exponential, but in the square root of the logarithm of the group size. This is nice, because the naive algorithm would take time exponential in the logarithm of the group size — that would be 2 to the log |G| — whereas here there is a square root in the exponent, which is much nicer. And of course this non-Abelian hidden subgroup problem is very important, so people have looked at it a lot: solving the dihedral case efficiently would break lattice-based cryptography, which is currently only weakened by Kuperberg's algorithm. And another important example is graph isomorphism. It is a cute example showing that you could solve the graph isomorphism problem as well, using the non-Abelian hidden subgroup problem.
Here, the group is the symmetric group of all permutations on 2n elements, where each of your graphs has n vertices. The function permutes the vertices of the disjoint union of your graph and the other graph — the two graphs which you want to decide are isomorphic or not. The subgroup hidden by this function is just the automorphism group of this disjoint union of G and G prime. And this automorphism group contains an element which exchanges vertices between the two graphs if and only if they are isomorphic. Here I just assumed that G and G prime are connected: if they are connected, the only way an automorphism can exchange vertices between the two graphs is if they are isomorphic to each other. So once you can get generators of this automorphism group, you just check whether there is any generator which interchanges vertices between the two graphs. Okay, so this is just a cute example. All right, so hopefully this covers the questions about what works and what does not work in the non-Abelian case. All right. Then I will return to questions related to the lecture yesterday. I realized that I got some interesting questions, and also that I could have given a better answer to some of them. All right. This was all regarding continuous versus discrete Fourier transforms. If you remember, I spent the second half of the lecture explaining the connection between the continuous and the discrete Fourier transform. The continuous Fourier transform is here, and we defined the periodic wrapping of the function and of its Fourier transform, and some connection between them. And this was the main result — I think some scalar factors were slightly off, but now I have put them right. So, if you take your function, wrap it around with period T, and discretize it at n points, then you get a vector.
Now, you take this vector and take its discrete Fourier transform. That is the same thing as if you first Fourier transform your function continuously, then periodically wrap it with period 2 pi n over T, and again discretize at n points. And if you multiply with the right scalar value, then the wrapping of the Fourier transform is the same as the Fourier transform of the wrapping. This is the commutative diagram that I showed you: here we have the continuous functions f and f hat, which the continuous Fourier transform maps to each other; we can wrap each of these periodically; and then, for the resulting vectors, we have the discrete Fourier transform. It is the same whether we go one way or the other around this diagram. And it's nice because the messier part is the discrete Fourier transform, so you want to avoid that arrow: instead, you use the continuous version and then wrap. Basically, this is the part that we want to avoid computing explicitly, because that is lots of discrete sums and things that can go wrong. Okay. And here I wanted to show you the version where you choose your function to be a Gaussian. But maybe, for explaining this picture, let me first go back to vanilla phase estimation, because that is probably more familiar to all of us. So, here I represented 16 mesh points on this circle. My zero is at this mesh point, then one, two, three, and so on; this is the 15th mesh point, because we start with zero. Initially, in vanilla phase estimation, we prepare a uniform superposition, and the distance from the circle at a given mesh point represents the absolute value of the amplitude. So these green dots just represent the uniform superposition over all these mesh points, which are my binary strings. So I start with a uniform superposition.
And if I don't apply anything — if the phase that I try to estimate is zero — then the phase is an integer multiple of my mesh spacing. Then the Fourier transform, which is now denoted by these orange values, will with certainty give the value zero, because the phase was zero. Okay? So this is the case when there is no phase shift. And what happens when you do apply a phase shift? I put here some phase, 6 over 13, which is certainly not an integer multiple of one over 16, and try to see what happens. So I still prepare the uniform superposition, these green dots, I apply this phase as in phase estimation, and then Fourier transform. The red dots correspond to the output distribution — sorry, the absolute values of the amplitudes, which, by squaring, give the probability distribution of the Fourier-transformed vector. And you can see that 6 over 13 is roughly pointing in this direction, so with high probability we get some estimate like this, and there is something happening around it. But even these far-away, almost opposite phases still have some non-negligible amplitude. They don't lie nicely on the circle like they did in the case when we had the exact multiple of one over 16. So this is what happens in vanilla phase estimation. It is really annoying that we get these far-away points. And as you shift your phase, you move from being perfectly aligned and getting a deterministic outcome to this mess where everything happens with non-negligible probability. This is what you want to avoid, because you don't want far-away estimates. Okay, and now, this is what happens if you use Gaussians instead. So, what I did here is I just chose some Gaussian function f of t, and I shifted it so that its peak is not at zero — I will get back to why.
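The vanilla picture can be checked with a small numpy sketch. This is just an illustration, not the lecture's code: n = 16 and the 6/13 phase match the example above, and with numpy's FFT sign convention the outcome label k directly corresponds to the estimate phi ≈ k/n.

```python
import numpy as np

n = 16           # number of mesh points, as in the picture
phi = 6 / 13     # a phase that is not an integer multiple of 1/16

# Uniform superposition; phase kickback gives label t the phase e^{2*pi*i*phi*t}.
t = np.arange(n)
state = np.exp(2j * np.pi * phi * t) / np.sqrt(n)

# Fourier transform; outcome k corresponds to the estimate phi ~ k/n.
probs = np.abs(np.fft.fft(state) / np.sqrt(n)) ** 2

best = int(np.argmax(probs))
print(best / n, probs[best])
# The most likely outcome is the grid point nearest to phi, but outcomes far
# from phi keep non-negligible probability -- the tails decay only polynomially.
```

Running this shows exactly the annoyance described above: the nearest grid point wins, but even the point opposite the true phase retains probability well above zero.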
But so, I am just wrapping this function around the unit circle, choosing the period length equal to the number of mesh points, 64. So this is a Gaussian which is peaked at this point, decaying slowly at first, and by the time you get to the opposite side of the circle it has already decayed; the standard deviation is roughly the arc length of the upper half of the circle. This is the initial amplitude distribution that you start with. And now, if I don't apply any phase — so, again, the phase is zero — then I just get the discrete Fourier transform of these amplitudes, which is this: again a sub-sampled Gaussian, but now concentrated around zero. So even in the case when my phase is an integer multiple of 1 over 64 — zero certainly is — I get something which is not a deterministic outcome, but it is highly peaked around it, with this nice symmetric Gaussian shape. Okay. So now let's see what happens if I apply a phase after preparing this initial superposition, and then Fourier transform. I get something like this. Again, I apply the same funky 6 over 37 phase, which is not an integer multiple of 1 over 64. My Gaussian, because it is sub-sampled, is now not exactly symmetric, but it is still quickly decaying, so I don't get far-away estimates with high probability. And it actually follows a Gaussian bell curve, just sub-sampled at certain points. Okay. So, previously I was asked whether this works for arbitrary phases, and it does. The reason why I shifted this Gaussian is that it has then already decayed around zero, or around n if you wish, and that is where your phases wrap: you have your labels t from 0 to n minus 1, you apply U to the power t accordingly, and label t picks up the phase to the power t. And so, when you wrap around from n minus 1 back to zero, well, U to the power zero gives a different phase than U to the power n would.
So, there is a discontinuity in the phase, but it doesn't matter, because there the amplitudes have almost decayed to zero. The discontinuity happens where the amplitude is already negligible, so it doesn't matter whether it is discontinuous or not. That is the reason why I chose the Gaussian peaked not around zero but somewhere else. Okay. And now I would like to explain once more why this works. The idea is the following. This is my Gaussian bell curve, the function f. If I restrict it to the interval from 0 to T and wrap it around, that is almost the same as if I did not truncate it at all but wrapped it around infinitely, because it has already decayed; this is an approximate equality due to the decay of the Gaussian. Now, the second equality — the connection between discrete wrapping and the continuous Fourier transform — is an exact relationship. It holds exactly for the wrapped-around functions; here I removed the truncation, because it holds for the infinitely wrapped versions. And now look at the other side of this equation. I have the wrapped-around version of the Fourier transform of this Gaussian, which is another Gaussian, but now with a smaller standard deviation. And once again it holds that this Gaussian is almost the same as if you first truncated it to the interval from minus pi to pi and then wrapped around: truncation and wrapping around are almost the same, because the tails you either cut off or fold in are exponentially small. So the middle step is an exact equality, and these two are exponentially close approximations. And if you chain them together, the approximate equality carries over through the Fourier transform, and ultimately we get that the discrete Fourier transform of the truncated, discretized function is almost the same as the truncation and discretization of the continuous Fourier transform.
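This commutative diagram can be verified numerically. The sketch below is my own illustration, with an assumed mesh size n = 64 and width sigma = 8 grid units, so the tails at the interval endpoints are already tiny; it checks that the DFT of a wrapped, discretized Gaussian matches the discretization of the inverted-standard-deviation Gaussian that the continuous Fourier transform produces.

```python
import numpy as np

n, sigma = 64, 8.0          # mesh size and Gaussian width (in grid units)
t = np.arange(n)

# A Gaussian centred mid-interval; its tails at t=0 and t=n are already tiny,
# so truncating to [0, n) and wrapping with period n are nearly the same thing.
g = np.exp(-((t - n / 2) ** 2) / (2 * sigma ** 2))

# Route 1 of the diagram: discretize the wrapped Gaussian, then take the DFT.
dft = np.abs(np.fft.fft(g))

# Route 2: the continuous Fourier transform of a Gaussian is again a Gaussian
# with inverted standard deviation; discretize that directly.
sigma_hat = n / (2 * np.pi * sigma)
k = np.minimum(t, n - t)    # frequency indices, taken mod n
ft = sigma * np.sqrt(2 * np.pi) * np.exp(-(k ** 2) / (2 * sigma_hat ** 2))

print(np.max(np.abs(dft - ft)))   # the two routes agree up to tiny error
```

The maximum discrepancy comes only from the cut-off tails, exactly as the argument above predicts.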
So, we have the exact equality when you consider the infinitely wrapped versions of your Gaussian, but that is almost the same as just truncating. And after truncation you can discretize, which means the truncated, discretized functions are basically each other's discrete Fourier transforms, up to exponentially small error. Okay, and maybe I should explain why I chose my initial Gaussian shifted away. The reason is that the discontinuity in the phases then happens where the amplitudes are already extremely small. When you shift a function, in the Fourier transform picture you just get a pointwise phase multiplication. So if it were a non-shifted Gaussian, its Fourier transform would be another Gaussian with inverted standard deviation. Now, I shifted it, so I still get this inverted-standard-deviation Gaussian, but now with some pointwise phases applied. But that is okay, because I only care about the absolute values squared of the amplitudes — the probability distribution — so it doesn't matter that some phases are added. Okay, but it also works the other way around. I prepare my initial amplitudes as Gaussians and apply the phase: remember that for a label t we applied U to the power t, which by phase kickback multiplies by the t-th power of this unknown phase e to the 2 pi i phi. So my initial amplitudes, which were Gaussians, got multiplied by this pointwise phase, and this is the function which I denote by f sub phi. And so I can tell the same story for this phase-multiplied version. Once again, truncating and then discretizing is almost the same as wrapping and then discretizing, and I have the exact equality between the infinitely wrapped-around versions and their discretizations under the discrete Fourier transform. And so now I need to understand what this phase multiplication does in the continuous Fourier transform picture.
So remember, I applied some pointwise phase multiplication here, so the continuous Fourier transform will be shifted by the value 2 pi phi. And so my new Gaussian will not be peaked at zero, but it will be peaked at the actual phase that I want to estimate — around that point. So in particular, I need to truncate this function around that point, so it will be truncated around 2 pi phi, plus or minus pi. And here you might be puzzled: what is happening here? My phase phi is only defined up to integer addition. If I add one to my phi, that adds a 2 pi phase and doesn't change anything; it is the same quantum state as before. So what would happen if I replaced phi by phi plus 1? That would not change anything in my quantum states. However, if I replace phi by phi plus 1 in the continuous case, then it really matters: the Fourier transform will be shifted not by phi but by phi plus 1. But that is okay, because I am wrapping periodically, and I shifted by exactly the periodicity of my wrapping. So it doesn't matter if I shift by plus or minus 1, or by any integer — I get the same thing, and the argument holds. And in particular, this argument shows that it doesn't matter what your phase is: this estimation procedure works for any phase. You don't need to assume that it lies in some subinterval of the potential phases; it works for everything. That was a question I got last time, and I realized that you actually don't need any promise on that. Okay. Any questions on this? Yes? So your question was whether we always apply the same shift. Yeah — the shift in the initial amplitudes, that is something that we engineer, something we have control over.
So that is a fixed shift, and after the Fourier transform it becomes only a phase multiplication, which doesn't matter from the perspective of the probability distribution of the outcomes. So the shift is something that we choose, and the unitary, which is unknown to us, defines the phases. But we have absolutely no knowledge about these phases, so we couldn't adjust the shift according to what phases we get — that is not possible. This phase multiplication is just what the unitary does for us. Okay. So here is the image. This green thing is the amplitudes that you prepare, shifted so that in this image the peak is opposite of zero. Okay, and this is what you prepare: just a superposition with these amplitudes. And then you apply your unitaries, and the phase kickback will multiply each of these points with a phase of the form e to the 2 pi i phi t. Which is the Fourier domain? This is the beginning of the story — here you are in this picture. In standard phase estimation we would have the Hadamards, but instead of the Hadamards you prepare these shifted Gaussian amplitudes. That is what you have here: no longer the uniform superposition, but the new amplitudes. Then you apply your controlled unitaries, which apply the t-th power of the unitary given the label t. And by phase kickback, that multiplies the amplitude of label t with e to the 2 pi i phi t — this is phase kickback, because psi was an eigenstate with eigenvalue e to the 2 pi i phi. Okay. So basically all the amplitudes that you prepared are now also multiplied by a phase depending on the eigenvalue of your unitary. And therefore you no longer have just the amplitudes defined by the initial Gaussian, but those amplitudes multiplied by the phase. And this is in the time domain, the original domain, and then you Fourier transform.
And after the Fourier transform — because it was a multiplication by a phase, which at least in the continuous picture corresponds exactly to a shift in the Fourier domain — your Gaussian, the more sharply peaked one, will now be peaked around the point shifted by 2 pi phi. So it will be peaked around the phase that you want to estimate, and that is the good thing, that is what you want. You prepare your initial state, apply the phase, and after the Fourier transform your probability distribution will be peaked around the true phase that you have. And this can be understood through this property of the continuous Fourier transform: shifting in one domain corresponds to pointwise phase multiplication in the other domain. Yes. Any more questions on this? Okay. Well, I promised you Gaussians, right? So I told you that this is a nice way of doing phase estimation, because it is kind of boosted: with high probability you will not get far-away outcomes. But I also told you that for statistical purposes you often want your distribution to look like Gaussian noise. And this — okay, you can recognize that this is something of a Gaussian, but it still doesn't look like a true Gaussian; it is just a few points. Okay, so you might want a better approximation of your continuous bell curve. How can you do that? So now remember that we always chose the periodicity to be the number of points in the Fourier mesh, and the other domain is always going to be minus pi to pi. And so the more initial points I have, the finer the mesh in the estimation will be. We have seen that basically how many points you have initially determines how precise the representations of your estimates are. So somehow the fineness of the mesh depends on how many points you have.
That's very natural. But the funny thing is that you can actually increase the number of points in the mesh without changing your query complexity. Okay, so previously I placed my Gaussian opposite of zero, because that was the most economical way of putting it. But I could just increase the number of mesh points enormously and ensure that the Gaussian's peak is somewhere close to zero, so that it doesn't pick up very large labels t. I can do that, and I only need to apply the phase where the amplitudes are non-negligible. So basically my query complexity will be determined by the standard deviation of this Gaussian, but I can place it anywhere on the circle — I will place it close to zero and just increase my number of mesh points. Okay, so maybe I can go to formulas; I don't know if it helps or not. So the main idea is that we will choose a Gaussian whose standard deviation is something like the square root of the logarithm of one over the failure probability we are willing to tolerate, divided by epsilon, the precision that we want. So there are two parameters: how precise we want the estimate, and how certain we want to be about it. And you just start with a Gaussian state of this standard deviation. Then its Fourier transform will have the inverted standard deviation, so about epsilon over the square root of the log of one over the failure probability. This is just the continuous picture: we know that the standard deviation gets exactly inverted by the Fourier transform. Okay. And well, there are Gaussian tail bounds, and these tell you that if you choose your parameters this way, then you are indeed that close to the exact picture you have in mind. But this only tells you how big your standard deviation is; it doesn't tell you how big n is. It is completely independent of how you discretize things.
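The tail-bound bookkeeping can be made concrete. The explicit constants below — taking the Fourier-domain standard deviation as epsilon over the square root of 2 ln(2/delta), and using the standard bound erfc(z) ≤ exp(−z²) — are my choice of normalization for the sketch, not the lecture's.

```python
import math

eps, delta = 0.01, 1e-6      # target precision and tolerated failure probability

# Fourier-domain standard deviation: eps shrunk by a sqrt(log(1/delta)) factor,
# matching the "sigma ~ sqrt(log(1/delta)) / eps" time-domain Gaussian above.
sigma_hat = eps / math.sqrt(2 * math.log(2 / delta))

# For a Gaussian estimator N(phi, sigma_hat^2), the chance of missing by more
# than eps is erfc(eps / (sigma_hat * sqrt(2))), and erfc(z) <= exp(-z*z).
p_far = math.erfc(eps / (sigma_hat * math.sqrt(2)))
print(p_far)   # comfortably below delta
```

So a standard deviation only a logarithmic factor below epsilon already pushes the failure probability under delta.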
So you can just take this Gaussian and add a lot of dummy points with zero — or exponentially small — amplitude, and completely ignore them. If a point has zero amplitude, then you don't need to apply the phase there, because nothing happens. So you only need to apply the phases where the amplitudes are non-negligible, and those are the points roughly within the standard deviation. But what happens is that the resolution of your Fourier domain will increase. You didn't change your query complexity at all, you also didn't change the accuracy — the standard deviation of your estimator — but you changed how many sample points you get. So just by increasing the number of points, without changing the number of queries, you can make sure that these Gaussian curves, which are now hard to recognize because so few points lie on them, become very fine: make, I don't know, 24 new points between any two of them, and it will look very, very continuous, and you didn't change your complexity. The only thing that changes is that the quantum Fourier transform you are using becomes bigger — 64 is 2 to the 6, so a 64-fold finer mesh means your quantum Fourier transform uses six more qubits, but it is super efficient, so you don't care. And this way, just by making your Fourier transform slightly bigger, you can make sure that you are sampling this Gaussian curve extremely tightly, and then for all practical purposes you have Gaussian noise for your estimator. Okay, so hopefully this convinces you that this approach is basically superior in every respect to the previous one, where even though you could ensure some fixed distribution, it had this ugly shape; now you replace that distribution with a beautiful Gaussian bell curve. And of course, I told you that this would be a continuous curve in the unbiased case, if you chose a uniformly random phase — but that you cannot do.
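The padding trick is easy to see numerically. In the sketch below (my own illustration: peak position t0 = 32, width sigma = 8, and mesh sizes 64 and 1024 are arbitrary choices), the same Gaussian of amplitudes is embedded in two meshes of very different size; the estimate's accuracy is unchanged, but the larger mesh samples many more points under the bell curve.

```python
import numpy as np

sigma, phi, t0 = 8.0, 6 / 37, 32   # Gaussian width, unknown phase, peak position

def gaussian_pe_probs(n):
    t = np.arange(n)
    # The Gaussian sits near zero (peak at t0), so only labels up to roughly
    # t0 plus a few sigma carry weight: the largest power of U applied with
    # noticeable amplitude -- the query cost -- is the same for every n.
    amps = np.exp(-((t - t0) ** 2) / (2 * sigma ** 2))
    amps /= np.linalg.norm(amps)
    state = amps * np.exp(2j * np.pi * phi * t)
    return np.abs(np.fft.fft(state) / np.sqrt(n)) ** 2

for n in (64, 1024):
    probs = gaussian_pe_probs(n)
    k = int(np.argmax(probs))
    wide = int((probs > probs.max() / 2).sum())   # points above half-maximum
    print(n, k / n, wide)
# Both meshes peak near phi with the same accuracy; the big mesh just
# samples the same Gaussian bell curve much more finely.
```

Only the Fourier transform grows (four more qubits here for the 16-fold finer mesh); the number of controlled-U applications with non-negligible weight stays fixed.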
You would anyway pick some finite-precision representation and proceed that way. So in practice that would also be a sub-sampled version of the continuous curve in the previous case, and now you can achieve the same thing, but get a nice bell curve for the distribution of your estimates. Okay, and all of this you get from this fundamental relation between the discrete and the continuous Fourier transform. So basically, you understand that picture, and you come up with the best possible phase estimation just by understanding this nice abstract picture. Yes? Right — that is a good thing about it: you don't need any median trick. Well, you just want to ensure that, up to exponentially small failure probability that you can safely ignore, you get an estimate that is epsilon-precise. This is your goal, and this epsilon will just be the standard deviation of this Gaussian, which you control. And so the whole point of this is that you want to avoid the median trick and repetition and so on, because that introduces new ancilla qubits. They are going to lie around; they will, you know, hinder your interference, produce garbage, and so on. There are ways to solve that with other advanced quantum algorithmic techniques, but that is still kind of ugly. And here you just do phase estimation once, but with the proper amplitudes to begin with, and you get Gaussian noise — the best that you can hope for. And you cannot hope for more: there are fundamental energy-time uncertainty principles saying that if you use your unitary at most one over epsilon times, then you will never be able to get an estimate more precise than epsilon.
There is this fundamental limitation, and this method achieves that uncertainty bound, but also makes sure that far-away estimates are exponentially unlikely — just like with the median-based computation, but you don't need to take any medians or do any extra computation: just prepare your amplitudes, apply your controlled unitaries, Fourier transform, and you are done. Your estimator will already be perfectly fine for any purpose that requires this Gaussian noise. Okay, all right, so this was about continuous versus discrete Fourier transforms. Any questions on this? Because now I will move to the topic that I intended for today and tomorrow. Okay, then let's move to the new material. So this is the new stuff for days three and four, which is the Bernstein-Vazirani algorithm and quantum gradient computation — a new flavor of application of the quantum Fourier transform. So let's start with the Bernstein-Vazirani algorithm. The problem is the following. You are given some function which is a linear function over Z_2 to the d. More precisely, there is some unknown s, which is a d-bit string, and your function is promised to be f of x equals s times x modulo 2. What that means is: take the products of s and x coordinate-wise and sum them up modulo 2. And the goal is to find s while making as few queries to this function f as possible. We assume that the function is given by an oracle which, given input x and some bit written in the output register, writes the output of f into this output register in the usual way. So this is the input model we need. It is completely analogous to the classical way of computing the function, so it is perfectly fine to assume it. But for the purposes of understanding this algorithm, it will be more useful to convert it to a phase oracle, which just applies the phase minus 1 to the f of x to the bit string x. And this is how you do it, with Hadamards.
You start with zero strings here and one bit here, you Hadamard everything, apply the oracle, and then Hadamard again. This is the algorithm. But I wish to explain why this Hadamard on the last qubit, then oracle, then Hadamard on the last qubit again is actually a phase query. (And I'm sorry, this bottom line should be a single-qubit wire; that is a mistake.) Okay, so I start with 1 and apply the Hadamard transform, and since I don't measure it afterwards, I can just imagine that it stays in the minus state, the superposition of 0 and 1 with a minus sign. So what happens when the oracle applies its bit flip there? If your function value is 0, then you have 0 minus 1 divided by square root 2, and XORing 0 into the bit changes nothing — you get back the same state. So if the function value is 0, a zero phase is applied. But if your function value is 1, then you flip those two bits, and you get 1 minus 0 divided by square root 2, which is exactly minus one times 0 minus 1. So indeed your state picks up a minus sign, and by phase kickback this sign gets attached to the bit string x. This is the standard way of converting a binary function oracle to a phase query, so from now on we can imagine that we have this phase query. And the reason the algorithm works is simply that applying the phase according to s times x produces exactly the same state as applying the Hadamard transform to the bit string s, and therefore, after Hadamarding again, you get the string s with certainty. I will put an exercise about this on the exercise sheet; if someone hasn't seen this algorithm, they can work through the computation — it is a really short computation. So once again, we use the kickback and the power of the Fourier transform. And here, well, it is a very nice property of this algorithm that it seems to do something impossible.
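The algorithm just described can be simulated directly on the full state vector. The sketch below is my own illustration (d = 6 and the particular hidden string are arbitrary): Hadamards on the all-zero state, a single phase query, Hadamards again, and the measurement outcome is s with certainty.

```python
import numpy as np
from itertools import product

d = 6
s = np.array([1, 0, 1, 1, 0, 1])   # the hidden string (the algorithm can't see it)

# All x in {0,1}^d, in lexicographic order.
xs = np.array(list(product([0, 1], repeat=d)))

state = np.full(2 ** d, 2 ** (-d / 2))   # Hadamards applied to |0...0>
state = state * (-1.0) ** (xs @ s)        # one phase query: (-1)^(s.x)

# Hadamard transform again: entry (y, x) of H^{tensor d} is (-1)^(y.x) / 2^(d/2).
H = (-1.0) ** (xs @ xs.T) / 2 ** (d / 2)
out = H @ state

probs = np.abs(out) ** 2
print(xs[int(np.argmax(probs))])   # recovers s with certainty: [1 0 1 1 0 1]
```

All the probability mass ends up on the single basis state labelled by s, after just one query.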
Because if you think about this classically, what you have to begin with is a classical function which only gives you one bit of information per query: if you query it on a single string x, you get the function value f of x, which is a single bit. So with one query, you seem to learn only a single bit of information about your s — but s has d bits of information. So classically, it is trivial that you need at least d queries to this function to learn the whole bit string. And the key thing here is that in the quantum case this oracle acts on all the d input bits as well, so it is actually an operation on d bits. So in principle it can transmit d bits of information to you, and it indeed does, as this algorithm shows. So this is surprising from a classical perspective. And we would like to turn this nice observation into something more related to real-life problems, because this problem rarely occurs in practice. As a warm-up to Jordan's algorithm, which is a version of basically multivariate phase estimation, I wanted to generalize the Bernstein-Vazirani algorithm from Z_2 to Z_n to the power d. So now you have a Z_n-linear function, which as before is f of x equals s times x modulo n, where s is some unknown string in Z_n to the power d. And once again, we assume that this function is given as a phase oracle, which for a given input x puts this phase on your state. The function value f of x, which is now a number between 0 and n minus 1, is divided by n to make this consistent. And since my function was s times x, I can just write it this way: my phase oracle applies the phase e to the 2 pi i s times x divided by n. And the algorithm is the same as before. You start with all-zero strings, and you apply — I just wrote here the inverse quantum Fourier transform to the tensor power d.
But what this really does is just prepare the uniform superposition over everything, so you could in principle use Hadamards or anything else that prepares a uniform superposition. Then you apply your phase oracle, which applies this phase, and apply the Fourier transform afterwards. And now we need to recall what this quantum Fourier transform does: it maps the basis state j to this superposition, and this is the phase that it picks up. These phases are then multiplied together for the different coordinates, so the state you get after applying the phase oracle is a product state, a product of such single-coordinate states. And when you apply the Fourier transform on every coordinate, you learn all the coordinates individually. So this is a generalization of the Bernstein-Vazirani algorithm from Z_2 to Z_n, and the same thing works because the Fourier transform works that way. And now, as usual, we want to change our viewpoint from something discrete to something continuous, so replace this Z_n by the real numbers. We now want to approximate the coordinates of some real vector to some precision. For this, we again define a grid discretizing the interval [0, 1], and 1/n should be thought of as the precision epsilon that we want to achieve. If I work with this grid, with this scaling, then my numbers are no longer integers between 0 and n minus 1 but grid points between 0 and 1, both x and k, and the Fourier transform works as before, except that the phases should not be divided by n but multiplied by n. That is just how you translate between the two representations of the numbers. And so here comes Jordan's algorithm, which assumes that you have a phase oracle, now for some real function, which maps a d-dimensional vector to a single number.
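Before going on, the Z_n generalization just described can be checked with a direct simulation (a sketch; n, d, and the hidden vector s are made-up demo values):

```python
import numpy as np

n, d = 5, 2
s = np.array([3, 1])  # hidden vector in Z_n^d (made up for the demo)

idx = np.arange(n)
# QFT over Z_n: F[j, k] = exp(2*pi*i*j*k/n) / sqrt(n); its inverse is F.conj().T.
F = np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)
Finv = F.conj().T

# All n^d grid points x in Z_n^d, one row per point.
xs = np.stack(np.meshgrid(*([idx] * d), indexing="ij"), axis=-1).reshape(-1, d)

# Uniform superposition with the phase oracle exp(2*pi*i*(s.x)/n) applied.
state = np.exp(2j * np.pi * (xs @ s) / n) / np.sqrt(n**d)

# Inverse QFT on every coordinate, i.e. Finv tensored d times.
U = Finv
for _ in range(d - 1):
    U = np.kron(U, Finv)
state = U @ state

# The result is exactly |s>, so a measurement returns s with certainty.
outcome = xs[np.argmax(np.abs(state))]
print(outcome)  # -> [3 1]
```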
So you assume that you have this phase oracle, which for a given input x puts the function value into the phase, for all of these mesh points, that is, for all points of this d-dimensional grid in the hypercube [0, 1]^d. And you hope to recover an epsilon-precise estimate of the gradient of your function, coordinate by coordinate. For this to work, you need to assume that your function is nicely differentiable, meaning that it is roughly linear at this scale. What does the gradient of a function give? A good approximation of the function by a linear function, shifted by the value at the origin: if you look at your function closely and it's nice and smooth, then in a small neighborhood around 0 it should look like the function value at 0 plus the gradient times x. This is what the gradient does: it approximates your function by a linear function. And so here is Jordan's algorithm. Once again, you prepare a uniform superposition over all the grid points in this discretized hypercube. Then you apply your phase oracle, but you need to apply it n times to put in the appropriate phase, because we rescaled things. And here I need to make an approximation: I replace my function by the linear approximation that I wrote before. That gives a global phase depending on the function value at 0, and a phase which depends on x linearly. The global phase doesn't matter, but this linear phase is exactly what we have seen before, and the state is a product state over the individual coordinates. Therefore, if you apply the quantum Fourier transform over Z_n in parallel on all the coordinates, so the QFT d times, once for each coordinate, then you will approximately get back the gradient of the function. So this is Jordan's algorithm. And once again, it seems to give an exponential speedup, just like the Bernstein-Vazirani algorithm did. In the Bernstein-Vazirani algorithm, you wanted to learn d bits with a single query.
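The whole procedure can be simulated classically on a small example (a sketch; the grid size N and the almost-linear test function f are made up, and by construction the gradient of f at 0 is (0.3, 0.6)):

```python
import numpy as np

# Jordan's gradient algorithm, simulated for d = 2 on an N x N grid over [0,1)^2.
N = 32

def f(x0, x1):  # made-up, almost-linear test function; gradient at 0 is (0.3, 0.6)
    return 0.3 * x0 + 0.6 * x1 + 0.001 * x0 * x1

j0, j1 = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")

# Uniform superposition with the phase oracle applied N times:
# amplitude exp(2*pi*i*N*f(x)) at each grid point x = (j0/N, j1/N).
state = np.exp(2j * np.pi * N * f(j0 / N, j1 / N)) / N

# QFT on each coordinate (an FFT up to normalization); because the phase is
# nearly linear in x, the amplitude peaks near (N*g0, N*g1) for gradient g.
spectrum = np.fft.fft2(state)
k0, k1 = np.unravel_index(np.argmax(np.abs(spectrum)), spectrum.shape)
grad_est = np.array([k0, k1]) / N

print(grad_est)  # close to (0.3, 0.6), up to the 1/N grid precision
```

The recovered values land on the nearest grid points to the true gradient, so the error per coordinate is at most about 1/N, matching the precision interpretation of the grid.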
Here you want to learn the gradient, which is a d-dimensional vector, with only a single evaluation of your function. So it seems like a wonderful idea and a huge quantum speedup: you only need to compute your function once to get the gradient. And this is how it was originally described and motivated by Stephen Jordan. But then it was not used that much by other quantum algorithms, and the problem is once again that classical algorithms are just too good. If you noticed, here you had to apply your phase oracle n times, which, if you can compute your function, is fine: just compute your function to a few more bits of precision and apply the phase accordingly. That's not a difficult task. However, this assumes that you can evaluate your function to high precision. When can you evaluate a function to high precision? When you have a good recipe for computing its values. And that's where classical algorithms kick in. If you have a good recipe for computing your function, then there is something called the cheap gradient principle: under fairly generic assumptions, computing the function together with its gradient is at most something like four times as expensive as computing the function itself. This is related to backpropagation in machine learning; it's basically a clever application of the chain rule. Your function is built from elementary steps, and you apply the chain rule to every step, back-propagating from the function evaluation to the gradient computation. And this is a huge deal, because people want to train neural networks, so there are now even large software packages that do this automatic differentiation for you. So it's really a practical thing.
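As an illustration of the cheap gradient principle, here is a minimal reverse-mode automatic differentiation sketch (the Var class and the test function are made up for the demo): each elementary step is recorded once on the way forward, and a single backward sweep applies the chain rule to each step, so the gradient costs only a small constant multiple of the forward evaluation.

```python
import math

class Var:
    """One node of the computation: a value plus the recorded elementary step."""
    def __init__(self, value, parents=()):
        self.value = value      # forward value
        self.parents = parents  # (parent node, local derivative) pairs
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def sin(v):
    return Var(math.sin(v.value), [(v, math.cos(v.value))])

def backward(out):
    # Order the recorded steps topologically, then apply the chain rule once
    # per step in reverse: cost is a constant times the forward pass.
    order, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)
    visit(out)
    out.grad = 1.0
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += v.grad * local

# f(x, y) = x*y + sin(x): the gradient is (y + cos(x), x).
x, y = Var(0.5), Var(2.0)
out = x * y + sin(x)
backward(out)
print(x.grad, y.grad)  # x.grad == 2.0 + cos(0.5), y.grad == 0.5
```

Note that one forward pass plus one backward pass produced both partial derivatives, regardless of the input dimension; this is the four-fold-overhead phenomenon the lecture refers to.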
And so we started out hoping for an exponential speedup, and we ended up with a speedup which is no more than four-fold; and a four-fold speedup, considering the difficulty of building a quantum computer, is not something that you will see in practice. Okay, so this was maybe not the best motivation for Jordan's algorithm, but we shouldn't dismiss it, because it has large potential for other purposes. It's a bit similar to Grover's algorithm. Grover originally described his algorithm as an algorithm for database search. Now, what is the one application where Grover search doesn't help you? It's basically database search. It doesn't help because if you imagine that you have a large database and you want to query it, we know that, because quantum hardware is noisy, you probably want to do this in an error-corrected fashion. But if you are touching all elements of your database, that means you have some quantum gate sitting at every element of the database. So a single query to your database in this model will actually cost as much as the number of elements in the database: if you have n elements, it will cost n. And on top of that, your Grover algorithm will run square root of n times, so it takes something like n to the 1.5 steps to find a single element, whereas classically you could just go through all the elements one by one, and that's complexity n. So if you describe your problem as a database search problem, you would have a very hard time selling it today, when people care about fault tolerance and error-corrected costs. But fortunately, this framing is not the only one: Grover's algorithm can be applied in situations where you have a more implicit definition of what you are looking for, and then you can evaluate it on the fly without touching a physical database. And the situation is similar for Jordan's algorithm.
If you have a function which is perfectly well defined, and you can compute it with some arithmetic steps, then the cheap gradient principle tells you that you already have a very good quantum or classical algorithm for computing its gradient. However, if your function comes from something which is quantum mechanically defined, maybe some property of a quantum state or something else, then you have no chance of accessing that thing classically, and there you have a chance to get some quantum speedup. I don't think it makes much sense to go into those details now; maybe I will talk about applications tomorrow, and I finish with one more explanation of the issue with this linear approximation. Remember, we wanted a linear phase for all the mesh points in our grid, but your function is somewhat smooth, not extremely smooth. One way to make your function look more linear is to zoom into it: by Taylor's theorem, we know that if you zoom into your function, it will be closer and closer to being linear. So let's try and see how that works. It would mean that I no longer look at my function on this [0, 1] interval, but rescale the argument by a large number r, which in practice just zooms into a smaller region of the function. But then my linear phase factors also get reduced by r, so I need to apply the phase r times more to get back the same gradient. This is just a function transformation: the gradient of f at 0 is the same as the gradient of r times f(x/r) at 0; that is just the plain chain rule for derivatives. And it means that if you want to make your function closer to linear, as it should be for Jordan's algorithm, then you can zoom in, but zooming in r-fold means that you need to apply your phase oracle r times more. So this is not very efficient, and you would want to do something different. And that is a nice trick that I don't have time to talk about now, but we can start with it tomorrow.
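To spell out the zooming argument in formulas (a sketch via Taylor's theorem, in the notation of the lecture):

```latex
% Rescaling: the gradient at 0 is unchanged, the nonlinear error shrinks.
\nabla\bigl[\, r\, f(x/r) \,\bigr]\Big|_{x=0} = \nabla f(0),
\qquad
r\, f(x/r) = r\, f(0) + \nabla f(0)\cdot x + O\!\left(\tfrac{\|x\|^{2}}{r}\right).
```

So zooming in r-fold suppresses the quadratic error by a factor of r, making the function effectively more linear, but implementing the phase for r times f(x/r) requires r applications of the phase oracle, which is the inefficiency just described.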
And until then, enjoy your free afternoon.