I think I'm going to start. In this lecture, and probably the next one too, I'll be talking about quantum-inspired algorithms, or de-quantizing algorithms. The setup here is that we've established all of these quantum linear algebra algorithms, and what they are capable of doing is taking, for example, a block-encoding of a matrix A and a state preparation unitary for a vector b, and transforming them into a block-encoding of, say, p(A) applied to b. The thing people noticed is that the overhead here is essentially the degree of the polynomial. So you can imagine that A and b have size 2^n but are implicitly represented on n qubits, and if I can do this operation while only incurring some overhead d, then maybe I can use this tool to get faster algorithms for linear algebraic problems more generally.

For example, as I discussed earlier, if A is a sparse matrix, say with efficiently computable entries, and similarly for b, then we can actually construct these block-encodings, and they are efficient in the sense that their cost scales only with the sparsity. We can then use this to extract something from the output, say an estimate of some inner product. If you combine all the pieces, you get something that runs in time polylogarithmic in the input dimension, the dimension of A and b. By contrast, if you gave me A and b on a classical computer and told me to estimate, say, the first entry of p(A)b, I would just carry out the matrix-vector multiplications for the polynomial, and that would take time linear in the dimension of A and b. So somehow, using this, we're doing significantly better, and there has been a lot of work trying to see whether you can instantiate this to give speedups for real-world tasks.

One of the notable examples is the algorithm of Harrow, Hassidim, and Lloyd for sparse linear systems. It is basically what I just described, except the polynomial is chosen to approximate the inverse, or I suppose the pseudo-inverse. If you work it out, you get a gate complexity that scales with the sparsity, the condition number of the matrix, and the log of the dimensions — I'm going to call the matrix m-by-n now. And this is much smaller than the best classical runtimes we know for this problem, which are, I think, at least linear time. That's right, yes. So I'm being ahistorical in my retelling of this. I actually don't remember — does anybody know what the polynomial is in the original HHL paper? I don't. I mean, they definitely wanted A⁻¹b. They might have used something like a Fourier series, but I'm not sure, and I don't want to say something wrong. Yeah, it's a good point: I'm talking about this as if block-encodings already existed at that time, which is not the case.
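To summarize the scalings just described in symbols — this is my own shorthand for the lecture's claims, not the exact statements from the original papers:

```latex
\[
  U_A \ (\text{block-encoding of } A),\ U_b
  \;\xrightarrow{\ \text{QSVT}\ }\;
  \text{block-encoding of } p(A)\,,
  \qquad \text{overhead} \approx \deg(p),
\]
\[
  \text{sparse linear systems (HHL-style):}\quad
  \operatorname{poly}\bigl(s,\ \kappa,\ 1/\varepsilon\bigr)\cdot\operatorname{polylog}(mn)
  \qquad\text{vs.\ classical solvers that scale at least linearly in the dimension.}
\]
```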
But over the course of the quantum machine learning literature that developed after this work, there was one other strategy. People were wondering: I need some way to get these block-encodings, so how can I get them? They didn't say block-encodings, but that was the question they were trying to answer without knowing it. So people started using this other notion of having your input A and b stored in some sort of data structure. The idea is that if you have A and b in a state preparation data structure, then — in the same way that the sparse setting gives you a block-encoding that looks like A/s, together with b over the norm of b — here you can get a block-encoding of A over the Frobenius norm of A (so if you imagine A flattened into a vector, that vector has unit norm), along with b over the norm of b. From there you can get block-encodings of the singular value transformation p(A) and do whatever you want with them at the end.

The nice thing about this setting is that A and b can now be arbitrary. They don't have to be sparse, which is convenient if you want this to be applicable — you want A and b to be data that comes from anywhere, say some machine learning data set. The trade-off is in the overhead. I was being a little sloppy before: in the sparse setting, beyond the overhead that is the degree d, once I rescale I pay an additional factor like the sparsity s. Here I instead pay something like the degree of p times the Frobenius norm of A. If my matrix A has low stable rank — so it is essentially close to low rank in some sense — then this Frobenius norm is close to the spectral norm, which we're taking to be one in this discussion, and the algorithm is efficient provided the stable rank is low. And you might expect these data sets to be low rank, because they come from the real world and you're hoping that if your data is explainable, it can be explained by a small number of linear features. (To the questions: s is the sparsity — A and b are s-sparse; p(x) is the polynomial; κ is the condition number of A.)

So essentially there are these two regimes for how you try to apply your QSVT knowledge to machine learning tasks, and there are a bunch of different proposals that use one or the other to try to claim an exponential speedup. What these quantum-inspired algorithms show is that in this second, data-structure regime, these algorithms do not give an exponential speedup.
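In symbols, the two regimes look roughly like this (my paraphrase, taking the spectral norm ‖A‖ = 1 as in the lecture, and suppressing the ε-dependence):

```latex
\[
  \text{sparse access:}\quad \text{overhead} \sim \deg(p)\cdot s,
  \qquad
  \text{data-structure access:}\quad \text{overhead} \sim \deg(p)\cdot \|A\|_F,
\]
\[
  \text{where } \|A\|_F^2 / \|A\|^2 = \operatorname{srank}(A)
  \text{ is the stable rank, so } \|A\|_F \text{ is small exactly when } A
  \text{ is close to low rank.}
\]
```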
When I say it doesn't give an exponential speedup, or when I talk about de-quantizing, this is a pretty subtle notion, because at any point in your quantum algorithm, if you wanted to mess with me and give yourself an exponential speedup, you could. For example, say my algorithm has an output state; then you could take the Fourier transform of it and measure some inner product with the Fourier-transformed state. Suddenly your quantum algorithm is still paying polylogarithmic time, but my classical algorithm is going to have to pay linear time. So this notion of de-quantizing is somewhat delicate — feel free to ask questions — but basically, for all of the actual applications, this doesn't happen, and so what I'm saying here is indeed true. To be clear, I am not claiming to de-quantize the sparse HHL version; but the state preparation setting is very close to being fully de-quantized. Yes, that's right.

And the way we do this is that we have this state preparation data structure, and we construct an analogous notion to a block-encoding, which I'm going to call sampling and query access, and these state preparation data structures support it. What's happening when you use these data structures is that you imagine you are receiving data, and as you receive it, you store it so that your quantum computer can use it later. You might be storing it in quantum random access memory, which is a somewhat speculative type of hardware that you can query in superposition — different from classical RAM, which I don't know how to query in superposition. So the quantum algorithm does its particular thing with this data structure; what I'm going to do instead is access the data structure only classically, and not use any of the quantum features of this QRAM. And I'm going to develop a classical algorithm that gets a similar runtime using the same input data structure.

So I'm going to define this notion called sampling and query access, and then I'm going to show how you can get sampling and query access to something that is approximately p applied to A, where the approximation only holds with error measured against the Frobenius norm — you can think of the error as something like ε times the Frobenius norm, or the Frobenius norm squared. Then I can use this to, for example, estimate inner products as desired, or do more linear algebra. This classical algorithm runs only polynomially slower, for some possibly large polynomial. (For what task? Yes — I believe that if this error were in spectral norm instead, then you would de-quantize HHL. That's the important point.) Right. So I'm going to talk a little more about what de-quantizing means, and then I'm going to get into the sampling and query access material.

Okay, I'll give an example that hopefully motivates sampling and query access, and that example is the swap test. The swap test is an algorithm that estimates the overlap of two quantum states. Suppose we have two vectors, both of unit norm, and we want to compute their overlap — their inner product, magnitude squared. There is an algorithm that can do this given copies of the corresponding states, where for a vector I define the corresponding state to be the state whose amplitudes are the entries of the vector. The algorithm starts with states corresponding to each of the input vectors, applies a Hadamard to a control qubit, then a controlled swap, then another Hadamard on the control,
and then I measure the control qubit. If you do the math on what this circuit outputs, the measurement gives 1 with probability 1/2 − (1/2)|⟨φ|ψ⟩|², so it depends on the overlap. And if you run this circuit enough times — about 1/ε² runs — you get an ε-good estimate of the overlap. To implement the controlled swap you need about log d two-qubit gates, so the total gate complexity will be about (log d)/ε², if you're not trying to improve the ε² dependence.

So you could imagine someone saying: I have my φ and my ψ, and I'm able to estimate their overlap in time that is only log d, whereas if you gave them to me classically, I would compute the inner product and it would take time d. So you can ask: does this give me an exponential advantage for my quantum linear algebra algorithm, my quantum machine learning algorithm? It sounds far-fetched that you would expect a speedup from this simple overlap estimation task, but there are minor variants of it that genuinely are hard. If I described φ and ψ in terms of quantum circuits and gave those circuits to you, then you wouldn't be able to answer this overlap question efficiently classically, so there would be some genuine speedup. And similarly, there have been proposals in quantum machine learning that essentially boil down to: construct some fancy states based on my input data, then compute their overlap, and that is where the advantage comes from.

So here I'm just noting that my quantum algorithm costs about (log d)/ε², while my classical algorithm, in order to estimate the overlap, needs to read all of the entries, so it costs Ω(d). But this is not a fair comparison, because I gave my quantum algorithm the ability to prepare the states φ and ψ, and that is a very powerful assumption. If I had the vector and wanted to prepare the state, it would cost me time d, so the fact that I can get these states in O(1) or O(log d) time is very significant — it's a pretty powerful statement.

Okay, so how would you instantiate this in the data structure setting? With the following data structure; I'll give an example for a size-four vector. The point is that if I actually wanted to run this on classical data, then I would need this sort of data structure, and it is basically just a binary tree. What I'm doing is computing all of these values in some pre-processing, and storing every value written down here in some piece of my memory. If I can do this, then there is an efficient way to prepare the state corresponding to φ, where efficient means a small number of queries to my quantum RAM. And you can imagine doing this in the same way that we imagine our classical random access memory being fast, even though technically it's polynomial time to query entries.
Imagine that our quantum RAM is really fast and that we can do these queries in O(1) time. So if we have our entries in this data structure, then we can prepare our quantum states and actually perform this swap test to estimate these overlaps. And what I'm going to claim is that if you give me the same data structure — given φ and ψ in my data structure — I can estimate the overlap classically, also in roughly 1/ε² time. Throughout the talk I'm going to ignore log d factors. There's a minor annoyance because the convention in classical algorithms is that log d-sized operations cost one — this is the word-RAM model — whereas in the quantum setting you really do care about these bits. So if I'm ever off by log d factors, that's why, but it's not important.

The way we can do this is by noticing that, given this data structure, we can classically "measure" φ in the computational basis. In other words, there is a classical algorithm, running in log d time, that outputs an index i with probability φ_i². The way I do this is: I look at my data structure, start at the root node, and decide — since I'm sampling with probability proportional to the squared magnitudes — whether to go into the left subtree or the right subtree. I look at the two child values, and with probability proportional to each value I go in that direction. So say I move left; that happens with probability φ₁² + φ₂². Then I look at the next two values, flip a coin with probability proportional to each value, and recurse. Say I go right; that occurs with probability φ₂²/(φ₁² + φ₂²). When I hit a leaf node, I output the corresponding index — here, 2. The thing to notice is that if I multiply these probabilities together, I output 2 with probability φ₂², and this holds in general. Another way to think about it is that I'm determining the index one bit at a time, sampling each bit from the correct posterior distribution. So I haven't touched the quantum part of my data structure at all; just with this classical procedure I can get these measurement outcomes, and this will be useful for us.

Any questions so far? It is based on this one data structure; however, I do not know of any instance where this fails for the generic problem of "you give me some data structure that can prepare the state corresponding to v". Right. Yes. So what I'm saying is that I'm fuzzier on these details, but if you gave me QRAM and asked me to devise a data structure that allows me to prepare a state efficiently, then for all of the data structures that people have come up with, you can actually do this thing of measuring classically. In fact, a lot of the data structures people devise for the state preparation task come directly from data structures for sampling from the corresponding probability distribution. Yeah — this is also true: if you're allowing me to prepare states, it seems eminently reasonable to allow me to get classical measurements of the state as well.
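Here is a minimal sketch of the classical tree-walk "measurement" just described. The binary-tree layout and the helper names are my own toy implementation, not taken from any particular paper, and it assumes the vector length is a power of two.

```python
import numpy as np

def build_tree(v):
    """Binary tree over the squared entries of v: the leaf level holds v_i^2,
    each internal node holds the sum of its two children, and the root is ||v||^2.
    Assumes len(v) is a power of two, for simplicity."""
    levels = [np.asarray(v, dtype=float) ** 2]        # leaf level
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append(prev[0::2] + prev[1::2])        # parent = sum of children
    return levels[::-1]                               # levels[0] is the root

def sample_index(tree, rng=np.random.default_rng()):
    """Walk root-to-leaf, branching with probability proportional to subtree
    weight; returns index i with probability v_i^2 / ||v||^2."""
    idx = 0
    for level in tree[1:]:                            # skip the root level
        left, right = level[2 * idx], level[2 * idx + 1]
        go_right = rng.random() * (left + right) >= left
        idx = 2 * idx + int(go_right)
    return idx

# Tiny check: empirical frequencies should approach phi_i^2.
phi = np.array([0.5, -0.5, 0.5, 0.5])                 # unit-norm vector
tree = build_tree(phi)
counts = np.bincount([sample_index(tree) for _ in range(10000)], minlength=len(phi))
print(counts / 10000)                                 # roughly [0.25, 0.25, 0.25, 0.25]
```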
There are slight subtleties, because you could devise other versions of this that sit in between my sparse model and this model, and because the sparse model is hard to simulate, such an intermediate model also inherits that hardness. But all of these data structures can be pre-processed in linear time, so it's not really clear that if you hand me one of them, that is the one I have to use — I could just use my own if I'm competing against you. Okay, any other questions? Yeah — there are other ways to satisfy these state preparation assumptions. One thing you can imagine is that I do indeed have some magic oracle that gives me the ability to query entries of v, and v satisfies some property, for example that all of its entries have about the same magnitude. Then I can prepare the corresponding state, and if you gave me the same oracle, I could also perform the sampling using rejection sampling. So I'm being concrete here for the purposes of exposition, but you can think about this more generally.

Okay, so what will my classical algorithm be? It will just be a sampling algorithm. First, I sample i with probability φ_i². Second, I consider the estimator Z = ψ_i/φ_i. The thing to notice is that the expectation of Z is the sum over i from 1 to d of the probability of sampling i times the value, which comes out to ⟨φ, ψ⟩. Similarly, if I look at the variance — or the second moment, I guess — this probability times the value squared sums to Σ_i ψ_i², which equals 1. So I have a random variable whose expectation is the right thing and whose variance is at most 1. I can then average 1/ε² copies of Z to get an estimate, and because I'm estimating the value itself, I can also estimate its magnitude squared, which gives me the same quantity the swap test gives.

So what we've just shown is this argument: yes, you could perform a swap test to estimate these overlaps in time log d, but if you gave me the same input, I could also get the estimate classically in about log d time. This principle extends more generally, and the idea is that we are going to be able to de-quantize algorithms whenever we essentially get the set of assumptions that we need to run this kind of procedure.

So I'm going to define sampling and query access now. For a vector v, we have sampling and query access to v if we can answer the following kinds of queries. First, we can query for entries: you give me an index i and I give you the corresponding entry v_i. Second, I can sample from the state corresponding to v, meaning I can produce samples — measurements, in effect — where the probability of sampling i is v_i² over the norm of v squared. And finally, I can query for the norm of v. Something to notice is that this is just an abstract access model, and it can be satisfied by various means.
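And here is a toy version of the classical overlap estimator just described, phrased in terms of this access: sample i with probability φ_i² (for instance via the tree above), query entries of both vectors, and average. The function and test vectors are illustrative, not from the lecture.

```python
import numpy as np

def estimate_inner_product(phi, psi, eps, rng=np.random.default_rng()):
    """Classical overlap estimator for unit vectors phi, psi, using only
    (1) samples i ~ phi_i^2 (e.g. from the tree data structure) and
    (2) entry queries to phi and psi.
    Averages about 1/eps^2 copies of Z = psi_i / phi_i, where
    E[Z] = <phi, psi> and E[Z^2] = ||psi||^2 = 1."""
    p = np.abs(phi) ** 2                              # sampling distribution
    n_samples = int(np.ceil(1.0 / eps ** 2))
    idx = rng.choice(len(phi), size=n_samples, p=p)
    z = psi[idx] / phi[idx]
    return z.mean()

phi = np.ones(1024) / 32.0                            # unit vector in dimension 1024
psi = np.zeros(1024); psi[0] = 1.0                    # another unit vector
print(estimate_inner_product(phi, psi, eps=0.01))     # close to <phi, psi> = 1/32
```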
As I mentioned, the data structure from before satisfies all of these assumptions: I can query for the entries, I can query for the norm, and using the procedure I discussed, you can sample. So whenever it gets confusing, you should just think of sampling and query access as "I can simulate this data structure". Now — this might get a little confusing — it turns out that sampling and query access satisfies extensibility properties, in the same way that block-encodings satisfy extensibility properties. For this I'm going to define sampling and query access to a matrix. For a matrix A, say m-by-n, we have sampling and query access to A if, first, we have sampling and query access to all of the rows of A, and second, we have sampling and query access to the vector a of row norms.

The way you should think about it is that this corresponds to — say I have a two-by-two matrix — having A stored in my data structure flattened as one vector, with the Frobenius norm of A squared at the top and then row one and row two below. It's the same sort of data structure; I'm basically just combining data structures. I have one data structure for the vector of row norms — its root is ‖A‖_F² = ‖A₁‖² + ‖A₂‖² — and I combine it with the corresponding data structures for the individual rows. So that's it.

Then what I can show is that there are extensibility properties. For example, if I have sampling and query access to A₁ through A_k, then I have sampling and query access to a linear combination Σ_{i=1}^{k} λ_i A_i. And if I have sampling and query access to A and B, then I can get sampling and query access to their product A†B, but approximately, where the approximation incurs a factor of the Frobenius norm of A. So the same extensibility properties hold. You can imagine that for any of my block-encoding algorithms computing a polynomial, I could look at what polynomial it computes, look at my classical toolkit, and just piece things together to match the quantum algorithm. And the point is that these steps are all efficient: they are polynomial time in parameters that do not depend on the dimension — polynomial in 1/ε and in my rank notion. So, morally, you can de-quantize all of quantum singular value transformation, if you assume that your input is given in this data structure.
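In symbols, roughly (my shorthand; the error scaling here is stated loosely, following the lecture):

```latex
\[
  \operatorname{SQ}(A):\quad
  \operatorname{SQ}(A_{1,*}),\ \dots,\ \operatorname{SQ}(A_{m,*})
  \quad\text{and}\quad
  \operatorname{SQ}(a),\ \ a_i := \|A_{i,*}\|,
\]
\[
  \operatorname{SQ}(A_1),\dots,\operatorname{SQ}(A_k)
  \;\Longrightarrow\;
  \operatorname{SQ}_{\varphi}\Bigl(\textstyle\sum_{i=1}^{k}\lambda_i A_i\Bigr),
  \qquad
  \operatorname{SQ}(A),\ \operatorname{SQ}(B)
  \;\Longrightarrow\;
  \text{approximate access to } A^{\dagger}B,\ \ \text{error} \sim \varepsilon\,\|A\|_F.
\]
```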
Okay, so now I'm going to explain how you can do these things — but any questions? I don't know whether I can get to all of these properties now; if not, I'll get to them next lecture. There is one minor issue I didn't discuss, which is that there is some overhead cost, which I'm going to call φ, and I'm going to define a corresponding notion called oversampling and query access. This basically just says that sometimes I might not have access to the exact distribution that I want, but I do have access to some distribution that is close enough to it that I can still use it. So, for a vector v, we have SQ_φ(v), which I'm calling oversampling and query access, if, first of all, we can query for entries of v, and secondly, we have sampling and query access to some ṽ, which we think of as an entry-wise upper bound on v. The two properties that need to be satisfied are that ṽ_i² is always at least v_i², and that the norm of ṽ squared equals φ times the norm of v squared. So if φ equals one, then ṽ is essentially v, and this is just the normal notion of sampling and query access. If φ is large, this creates overhead, in the same way that the scaling parameter α of a block-encoding shows up in the runtime — that's the parallel between the two.

Something to note is that if we have sampling and query access to ṽ, then we can get samples from v, using φ queries to ṽ in expectation. The way we can do this is rejection sampling. Basically, I'm just trying to justify why it's okay to consider sampling from a different distribution — morally, you should view these as similar. The approach is to modify the distribution of ṽ to get the distribution of v via rejection sampling. My protocol is: first, I sample i from ṽ, which happens with probability ṽ_i² over the norm of ṽ squared. Then I look at ṽ_i and v_i, and I output i with probability v_i²/ṽ_i²; otherwise I retry. You can see that if I do output something in a given round, it is with probability proportional to v_i², because the probability that a round outputs i is the product of these two factors, which is v_i² over the norm of ṽ squared. So, conditioned on success, this does output a sample from the distribution corresponding to v. And correspondingly — using the norm property — this is v_i² over φ times the norm of v squared, so summing over all possible i, each round succeeds with probability 1/φ. I'm being a bit hand-wavy here. Finally, note that the acceptance probability, this fraction, is at most one by the entry-wise upper bound assumption, so it makes sense to output with this probability. Altogether, this gives an algorithm that, with probability 1/φ, outputs a sample from v given one sample from ṽ, and so with φ queries in expectation I am able to get these samples from v.
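A minimal sketch of this rejection-sampling step, with callables standing in for the SQ(ṽ) sampler and the entry queries (all names and the toy vectors here are mine):

```python
import numpy as np

def sample_from_v(sample_vtilde, query_vtilde, query_v, rng=np.random.default_rng()):
    """Rejection sampling: given SQ access to an entry-wise upper bound vtilde
    (with ||vtilde||^2 = phi * ||v||^2), produce samples distributed as v_i^2 / ||v||^2.
    Each round succeeds with probability 1/phi, so phi rounds are used in expectation."""
    while True:
        i = sample_vtilde()                                  # i ~ vtilde_i^2 / ||vtilde||^2
        accept = query_v(i) ** 2 / query_vtilde(i) ** 2      # <= 1 by the upper-bound assumption
        if rng.random() < accept:
            return i

# Toy check with explicit vectors (here phi = ||vtilde||^2 / ||v||^2 = 1.8).
v = np.array([3.0, 0.0, 1.0, 0.0])
vtilde = np.array([3.0, 2.0, 2.0, 1.0])                      # entry-wise upper bound on |v|
p_tilde = vtilde ** 2 / np.sum(vtilde ** 2)
rng = np.random.default_rng()
draws = [sample_from_v(lambda: rng.choice(len(vtilde), p=p_tilde),
                       lambda i: vtilde[i],
                       lambda i: v[i],
                       rng)
         for _ in range(5000)]
print(np.bincount(draws, minlength=4) / 5000)                # roughly [0.9, 0, 0.1, 0]
```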
Now, finally, I'm going to discuss how to get linear combinations. By the way, are there questions about this? What I'm going to prove is that given sampling and query access to a bunch of vectors v_t, for t from 1 to τ, we can get sampling and query access to their linear combination u = Σ_t λ_t v_t, but with some oversampling constant, and this oversampling constant is going to be τ times a ratio of norms that I'll come back to; the runtime overhead is going to scale with τ.

What I mean by this is: you give me sampling and query access to my input vectors, and I can make those queries in, say, O(1) time each. Then, if you ask me a query — say, "sample from this vector u" — it will take me about τ queries and τ additional time to compute whatever I need and output the sample to you. So these accesses might come from a data structure, but if you want to sample from the combination, I'm acting as the data structure; I'm the intermediary here, simulating these queries for you. And the way you should think about it is that if I wanted to produce a sample from u itself, it would cost me about φ times τ, so on the order of τ² in total. There is also this factor φ, and it will be small whenever the linear combination is not too much smaller than you would expect. If you give me a bunch of vectors and ask me to take a linear combination, and the linear combination is zero, then obviously I'm not going to be able to sample from the zero vector. And note that if you gave me a bunch of block-encodings and asked me to take a block-encoding of their linear combination, I would also incur this sort of scaling factor: if a linear combination of block-encodings is zero, I'm not going to be able to output a sample of that block-encoding applied to a vector — it comes out in the runtime.

Okay, so how am I going to do this? Well, I need to answer a few types of queries. I want sampling and query access to this u, so first I need to answer entry queries to u: compute u_i = Σ_t λ_t v_{t,i}. I just query each of the vectors at that entry and take the linear combination, and this takes me τ time. The second thing I need is sampling and query access to some upper bound on u. What am I going to use as my upper bound? I know that my entries take this form, and I want to compute some entry-wise upper bound that I can easily sample from. Think about it for a second — but I'll just tell you the answer, which is that you can take ũ_i² = τ Σ_t λ_t² v_{t,i}². This is the linear combination of the squared entries, scaled by τ, and it upper bounds u_i² by Cauchy–Schwarz. So what were the two properties we needed? First, that this magnitude is at least the magnitude of u_i; this holds by Cauchy–Schwarz, as I said. Second, that we can sample from it. The distribution we need samples i with probability proportional to this sum, and — writing it in a slightly different way — it is a mixture of the distributions for each of the v_t's. So what I can do is pick t from a suitable distribution (with probability proportional to λ_t²‖v_t‖²), then sample i from the corresponding vector, and this gives exactly the distribution I want. Okay, that might have been a bit quick, but it's in the lecture notes if you want to look at it.
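Here is a small sketch of these two queries for u = Σ_t λ_t v_t, written with explicit numpy vectors standing in for the abstract SQ oracles (the names and structure are mine):

```python
import numpy as np

rng = np.random.default_rng()

def lincomb_query(lams, vs, i):
    """Entry query for u = sum_t lam_t * v_t: costs tau entry queries to the v_t's."""
    return sum(lam * v[i] for lam, v in zip(lams, vs))

def lincomb_sample(lams, vs):
    """Sample an index from the upper bound u~, where
        u~_i^2 = tau * sum_t lam_t^2 * v_t[i]^2  >=  u_i^2   (Cauchy-Schwarz).
    Procedure: pick t with probability proportional to lam_t^2 * ||v_t||^2,
    then draw i ~ v_t[i]^2 / ||v_t||^2 using that vector's own SQ access."""
    weights = np.array([lam ** 2 * np.sum(v ** 2) for lam, v in zip(lams, vs)])
    t = rng.choice(len(vs), p=weights / weights.sum())
    v = vs[t]
    return rng.choice(len(v), p=v ** 2 / np.sum(v ** 2))

# Feeding lincomb_sample, the entry bound sqrt(tau * sum_t lam_t^2 * v_t[i]^2), and
# lincomb_query into the rejection-sampling routine above yields samples from u itself,
# with oversampling factor phi = ||u~||^2 / ||u||^2.
```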
And if you compute the norm of ũ, the value of the oversampling constant comes out to φ = τ Σ_t λ_t²‖v_t‖² / ‖u‖². Okay, I think I'm done for today.