talk some more about quantum-inspired algorithms, and I'm going to show how you can apply polynomials of matrices in time independent of the input dimension, if you are given your input in the quantum-inspired access model that I covered last time. Just to recap what this model was: we had this quantum data structure that I defined, and we said that if this was stored in some quantum RAM, then you could get efficient state preparation of whatever entries you're storing in your data structure. And then we noticed that if you could access it classically, in order to devise a classical algorithm that tries to match the runtime, then you could get what's called sampling and query access. Based on this, we defined the notion of oversampling and query access. We have oversampling and query access to a vector v if we can answer queries for entries v_i (you give me an index, I give you an entry), and we also have sampling and query access to some ṽ, which means I can query for entries ṽ_i, I can sample an index i with probability |ṽ_i|²/‖ṽ‖² (as if measuring ṽ as a state), and I can learn the norm of ṽ. This ṽ is going to have the property that it's a reasonably tight upper bound on my entries of v: every entry of ṽ upper bounds the corresponding entry of v in magnitude, and the whole mass ‖ṽ‖² is going to be just, say, five times the mass of the original vector. With this, we saw that sampling and query access has extensibility properties similar to block encodings. With sampling and query access to u and v, we can get oversampling and query access to a linear combination αu + βv, and the amount of overhead is essentially the number of terms in your linear combination times the ratio of the combined ℓ2 masses, |α|²‖u‖² + |β|²‖v‖², to the mass of the output. You might compare this to the block-encoding setting, where from block encodings of U and V you get a block encoding of (αU + βV)/(|α| + |β|); implicitly, the denominator there is playing the role of α times the norm of U plus β times the norm of V. So you can see these are similar, maybe off by some factor in terms of the number of elements in your linear combination, but there's some resemblance. What I'm going to show today is that sampling and query access is also (approximately) closed under taking products: if you have sampling and query access to A and sampling and query access to B, then you can get sampling and query access to something that's approximately A†B. Okay, and to do this it'll be useful to define the corresponding notions for matrices. Recall that sampling and query access to a matrix A was sampling and query access to the rows of A, along with sampling and query access to the vector of row norms. Yes, yes, there is a reason why you need the dagger.
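To make the access model concrete, here is a minimal Python sketch of sampling and query access to a single vector. The class name and interface are hypothetical illustrations, not from the lecture; a real dynamic implementation would use a binary tree of partial sums so that updates and samples cost O(log n), whereas for a static vector the precomputed prefix sums below suffice.

```python
import numpy as np

class SQVector:
    """Hypothetical sketch of sampling-and-query access to a vector v."""

    def __init__(self, v):
        self.v = np.asarray(v, dtype=float)
        self.prefix = np.cumsum(self.v ** 2)   # prefix sums of squared entries

    def query(self, i):
        """Query access: return the entry v_i."""
        return self.v[i]

    def norm(self):
        """Return the l2 norm of v."""
        return float(np.sqrt(self.prefix[-1]))

    def sample(self, rng=None):
        """Sample index i with probability v_i^2 / ||v||^2."""
        rng = rng or np.random.default_rng()
        u = rng.uniform(0.0, self.prefix[-1])
        return int(np.searchsorted(self.prefix, u))
```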
There are some nuances in the comparison here. For example, if I could give you sampling and query access to both A and A†, and to both B and B†, and be done with it, there would be some subtleties, because typically, if you want to actually compare this to a quantum algorithm, you're only going to store one of these in a data structure. But in some sense you're going to get a block encoding of A, and then the conjugate transpose is immediately a block encoding of A†, so something like that does work. What you said just now would incur some error scaling with the Frobenius norm of B, so that would be an issue, but there is a way to make it work; it's just more complicated. Basically, you can get an approximate singular value decomposition and then invert it that way. But this is very slow; I think the best way we know how to do it is very slow. Okay, good question.

So, right, I was defining sampling and query access to A: it's just sampling and query access to the rows of A and to the vector of row norms. Similarly, you can define oversampling and query access: you have query access to A, and you have sampling and query access to some Ã satisfying the same kinds of properties. Here you just replace the vector conditions with matrix ones: |Ã_ij| is an upper bound on |A_ij|, and the Frobenius norm is your mass. Okay. And with this you can say things like: given sampling and query access to u and sampling and query access to v, you can get sampling and query access to their outer product uv†. The way you do this is just to say: I need to compute entries, but I can do that from entries of u and entries of v; I can compute norms from the norms of u and v, or from entries of the two; and I can sample, because a row of uv† is a rescaled version of v†, so I can use samples from v there, and the vector of row norms is a rescaled version of u, so I can sample from u to get the row-norm samples. This was on the problem set yesterday. Similarly, we can extend our linear combination arguments here as well; the value of φ is what you would expect it to be, and I'm just writing it for completeness, but it follows from the same protocol. Okay, so these will be all of the statements I'll need later.

Now we're going to talk about how you actually get this product, and you'll see it's an approximate statement, so we're going to need some tools other than the ones I've just been describing. That tool is sketching. If you haven't seen it before, the idea of sketching matrices is: suppose I have some matrix A, which is m × n, and m and n are too big, so I can't compute anything on A directly.
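As a toy illustration of the outer-product lemma just stated, here is how all three operations reduce to operations on u and v; it assumes the hypothetical SQVector interface sketched above.

```python
import numpy as np

def outer_product_sq(sq_u, sq_v):
    """Sampling-and-query access to u v^dagger from access to u and v
    (a sketch; sq_u and sq_v are SQVector-style objects)."""

    def entry(i, j):
        # (u v^dagger)_{ij} = u_i * conj(v_j): one query to each input.
        return sq_u.query(i) * np.conj(sq_v.query(j))

    def norm():
        # Frobenius norm of u v^dagger is ||u|| * ||v||.
        return sq_u.norm() * sq_v.norm()

    def sample(rng=None):
        # Row norms are |u_i| * ||v||, so the row index follows u's
        # distribution; within a row, entries are proportional to v's.
        return sq_u.sample(rng), sq_v.sample(rng)

    return entry, norm, sample
```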
Then imagine I want to compute the spectrum or something, but A is too large. So you can imagine multiplying it on the left and right by some sketch matrices, bringing it down to something much smaller, say s × t, and my hope is that the properties of A will be inherited by this sketch, which I'm calling SAT, for a particular nice choice of S and T. One instance of this is Johnson-Lindenstrauss, where S and T can be Gaussian; for us, they will be sampling matrices. Given a distribution p, we define a corresponding sampling matrix by taking each row to be some row selected independently at random. Concretely, we say S is sampled from p if the rows of S are independent, and each row equals a computational basis vector e_k†, rescaled by 1/√(s·p_k), with probability p_k. So S is going to look something like this: in one row there's a 1/√(s·p_1) in position 1, in another row there might be a 1/√(s·p_5) in position 5, and all other entries are zero. Furthermore, if we apply S to a matrix A, what's happening is that we're selecting rows: one row of SA will be A_1,·/√(s·p_1), and the other rows will likewise be correspondingly rescaled rows of A. So this is just what's happening when we choose our S.

The thing to note is that if we have sampling and query access to A, then we can form such a sketch S efficiently: we sample from a, our vector of row norms, so in particular our probabilities are the squared row norms rescaled by the squared Frobenius norm, p_i = ‖A_i,·‖²/‖A‖_F². You can sort of see how to do this, and it takes O(s) queries: we just need to pull s samples from a, which we have because of our access model; we need to compute the probabilities themselves, which we can do with queries to the row norms and the norm of A; and in total we can describe S as the list of its nonzero entries, and that's our description of S. Secondly, the thing I want to observe is that with A and S sampled as above, we can sample again to get SAT. The way to show this is to show that we have sampling and query access to both SA and its conjugate transpose, with overheads O(1) and O(s) respectively; if we have that, then starting from sampling and query access to A we can construct S and T, sampling S from A and then T from (SA)†, and we get the sketch SAT in O(st) queries, I guess. If you wanted to prove this (I won't go through the whole thing), the main thing that would give you trouble is the prospect of sampling from the column norms of SA. Everything else can be done in a sort of obvious way.
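Here is a minimal sketch of forming S, following the description above. The function takes the vector of row norms explicitly purely for illustration (in the actual access model you would only spend O(s) samples and queries on it), and the representation of S as a list of (index, scale) pairs is the "list of nonzero entries" description from the lecture.

```python
import numpy as np

def sample_sketch(row_norms, s, rng=None):
    """Form a sampling matrix S from l2 sampling access to the row norms.

    Row j of S is the basis vector e_{k_j}^T rescaled by 1/sqrt(s * p_{k_j}),
    where p_k = ||A_{k,.}||^2 / ||A||_F^2; we return the pairs (k_j, scale)."""
    rng = rng or np.random.default_rng()
    sq = np.asarray(row_norms, dtype=float) ** 2
    p = sq / sq.sum()                        # p_k = ||A_{k,.}||^2 / ||A||_F^2
    ks = rng.choice(len(p), size=s, p=p)     # s independent row samples
    return [(int(k), 1.0 / np.sqrt(s * p[k])) for k in ks]
```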
For example, all of the rows of SA are (rescaled) rows of A, so you can sample from them as normal. And if you look at the row norms of SA, they are all equal: every one equals the Frobenius norm of A rescaled by √s, because the 1/√(s·p_i) exactly cancels the row norm. So you can sample from the row norms of SA by sampling a row uniformly at random. How do you sample from the column norms? Here what I want is to sample j with probability equal to the squared column norm over the squared Frobenius norm, ‖(SA)_·,j‖²/‖SA‖_F², and there's a trick to doing this, using our access to SA: we sample a row index i with probability proportional to the squared row norms, and then we sample a column index j from that row we landed on, with probability proportional to the squared entries. If we do this procedure, sample a row and then sample from the corresponding row, it gives us a pair (i, j), and the probability of this (i, j) is proportional to the entry squared; it's like we're sampling from SA written out as one long vector. If we then discard the i, we're sampling j with probability proportional to Σ_i |(SA)_ij|², and this is precisely the thing we wanted: we sample an entry, ignore the row index, and this gives us the right distribution.

Right, so I've now argued that you can efficiently create these sketches, but I haven't yet told you what approximation properties they have, that is, what the point of all this is. I'll demonstrate by considering some S with s = 1, a single 1 × m row.
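The two-stage trick, as a hypothetical Python sketch. SA is passed as an explicit array just for illustration; the point is that the procedure uses only the operations the access model provides, namely row-norm sampling and l2 sampling within a row.

```python
import numpy as np

def sample_column_norm(SA, rng=None):
    """Sample j with probability ||SA_{:,j}||^2 / ||SA||_F^2."""
    rng = rng or np.random.default_rng()
    row_sq = np.sum(SA ** 2, axis=1)
    i = rng.choice(SA.shape[0], p=row_sq / row_sq.sum())  # row-norm sample
    ent_sq = SA[i] ** 2
    j = rng.choice(SA.shape[1], p=ent_sq / ent_sq.sum())  # within-row sample
    return int(j)                                         # discard i
```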
So here we're thinking of S as a single row, selecting a single row of my matrix: SA equals A_i,·/√(s·p_i) with probability p_i. Now if I have some matrix product A†B and I look at the sketched version, where I put the sketching matrices in between A and B, I get (SA)†(SB), which is the outer product of a row of A with a row of B (there's a dagger on one of them), divided by s·p_i, with probability p_i; here s is one, so we can ignore that factor. The thing to observe is that in expectation this is the sum over i of these outer products, which is exactly equal to A†B. So when we sketch things down, we can create an unbiased estimator of the product of two matrices. And so far I haven't used any properties of my probabilities. But suppose we sample p_i 50-50 from the row norms of A and the row norms of B; what I mean is, take p_i = (‖A_i,·‖²/‖A‖_F² + ‖B_i,·‖²/‖B‖_F²)/2, so we flip a coin and, based on that coin, sample either from the row norms of A or from the row norms of B. By AM-GM this p_i is at least the geometric mean of the two terms, ‖A_i,·‖‖B_i,·‖/(‖A‖_F‖B‖_F). Then you can observe that these two facts imply that the norm of (SA)†(SB) is always upper bounded by the product ‖A‖_F‖B‖_F, because the row norms in the outer product cancel against the row norms in p_i. So in some sense all of my estimators are always bounded, and this allows me to use (matrix) concentration inequalities to show that if you take a bunch of these samples and average them, it converges quickly to A†B.

Let me state a version of this for concreteness. If we have A and B, and S is a size-s sketch sampled half from A and half from B as above, and we take s to be, up to a log factor, something like the product of the stable-rank quantities of A and B over ε² (I think that's the value), then the sketch approximates the product up to ε in operator norm; you can think of the Frobenius-to-operator-norm ratios here as rank quantities of A and B. With this I can show you how we get our extensibility property; I'll just sketch out how you get it.
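Here is a minimal numerical sketch of this unbiased estimator with the 50-50 importance distribution; averaging the rescaled outer products converges to A†B, and the mixed distribution keeps every summand bounded by ‖A‖_F‖B‖_F as argued above.

```python
import numpy as np

def approx_matmul(A, B, s, rng=None):
    """Importance-sampled estimator of A^dagger B (illustrative sketch).

    Samples s row indices from p_i = (||A_i||^2/||A||_F^2
    + ||B_i||^2/||B||_F^2)/2 and averages A_i^dagger B_i / p_i, which is
    unbiased: E[estimate] = sum_i A_i^dagger B_i = A^dagger B."""
    rng = rng or np.random.default_rng()
    pA = np.sum(np.abs(A) ** 2, axis=1); pA /= pA.sum()
    pB = np.sum(np.abs(B) ** 2, axis=1); pB /= pB.sum()
    p = 0.5 * (pA + pB)
    idx = rng.choice(A.shape[0], size=s, p=p)
    return sum(np.outer(A[i].conj(), B[i]) / p[i] for i in idx) / s
```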
So we have sampling and query access to X and Y, and this allows us to produce the sketch S; with this sketch we know that (SX)†(SY) is approximately our desired product. Now, if you think about what this product is, as I mentioned before, it's an average (or I guess a sum) of outer products of the rows, and each of these is a rescaled row of X times a rescaled row of Y. I have sampling and query access to those, so I have sampling and query access to SX and SY, which means I have sampling and query access to each of these outer products; and finally this means I have sampling and query access to the whole expression, up to some oversampling constant, because it's a linear combination of these outer products. And the whole thing is approximately X†Y.

Now, why do we need to sketch things down? It's because when we take these linear combinations we incur some overhead, which I didn't write down above, equal to the number of summands in the linear combination. Naively, without the S, we'd be taking a linear combination over some large dimension; we need it to have s terms, where s is independent of my input dimension, in order for the linear combination protocol to be efficient. It's in the lecture notes, and eventually you get a value of φ like ‖X‖_F²‖Y‖_F² (over the appropriate output mass), which is, I guess, what one might expect from just combining upper bounds. Right, so, are there any questions about this?

Right. In a similar way (I have this in the lecture notes also), we can use this to de-quantize the simple block-encoding protocol I had before, which is: given a block encoding of A and the state ψ, I can get a copy of the state Aψ/‖Aψ‖ with success probability proportional to ‖Aψ‖². When the input is given in a data structure, this A is going to be rescaled by the Frobenius norm, so you're going to see some Frobenius norm scaling in the probability, and you can think of this as having a runtime of, I guess, ‖A‖_F²‖ψ‖²/‖Aψ‖².
I give a more thorough argument, with the parameters and so on, in the lecture notes. But in the same setting, if I have sampling and query access to A and sampling and query access to ψ, then I can approximate this with a sketching matrix as I mentioned before, and using the same ideas as above, this gives me sampling and query access to a sketched product which is approximately Aψ. So you can imagine: if I want a sample from the output, I can use this to get a sample from my output as well, and I'm sort of matching what the quantum algorithm is doing. If you look at the runtime of this algorithm, the cost to get a sample is basically proportional to the fourth powers, something like ‖A‖_F⁴‖ψ‖⁴/‖Aψ‖⁴. So you can see this is the same kind of expression, but there's this issue of an additional factor appearing; it's a polynomial factor, the quantum factor is the square root of the classical one, so you're only losing quadratically. And you might be wondering: there's an ε in this runtime and no ε in the quantum one, and is that an issue? The resolution is that for these sampling questions you see these gaps of log(1/ε) versus 1/ε. I don't know of a formal argument that this is inherent (it might not be), but if you wanted to use these samples to actually construct some estimator, for example to learn some property (is this true or false?), then this 1/ε would appear anyway, because you're trying to distinguish two states that are maybe ε far apart. Now I'll move on to general polynomials, but any questions about this?

All right. So now I've shown how to get these extensibility properties for both linear combinations of my input and also products, and using this you can imagine me chaining all of these together to get sampling and query access to more complicated objects. But in the same way I mentioned before, this doesn't necessarily give you everything: we also have to do some more trickery to get the ability to apply an arbitrary polynomial to a matrix, and to do this efficiently.
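A minimal sketch of this de-quantized matrix-vector protocol, phrased as approximating X†y so that it matches the product lemma above (whether your data structure holds A or A† is the bookkeeping issue from the earlier question). The arrays are explicit purely for illustration; the output is represented as a small linear combination of rows of X, which is exactly what gives sampling and query access to it.

```python
import numpy as np

def sketched_matvec(X, y, s, rng=None):
    """Approximate X^dagger y by (SX)^dagger (Sy), with S sampled half from
    the row norms of X and half from |y|^2 (an illustrative sketch).

    Returns coefficients c and row indices idx such that
    X^dagger y ~= sum_j c[j] * conj(X[idx[j]])."""
    rng = rng or np.random.default_rng()
    pX = np.sum(np.abs(X) ** 2, axis=1); pX /= pX.sum()
    py = np.abs(y) ** 2; py /= py.sum()
    p = 0.5 * (pX + py)
    idx = rng.choice(X.shape[0], size=s, p=p)
    c = np.array([y[i] / (s * p[i]) for i in idx])
    return c, idx        # dense output, if wanted: c @ X[idx].conj()
```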
So in the rest of the time I'll give a sketchy description of how you get a classical algorithm for the singular value transformation task. The task is: given sampling and query access to A and sampling and query access to b, you want sampling and query access to some y, where y is approximately p(A)·b; here p is bounded between minus one and one, and the degree of p, which I'm calling d, is small. And what we want is a runtime that's polynomial in 1/ε, in d, and in the Frobenius norm over the spectral norm, which is our rank-type notion. Also, we should normalize so the spectral norm of A is at most one; let's say it's equal to one, so we can drop that term and just say ‖A‖_F. For comparison, quantumly, if you give me a block encoding of A and copies of the state b, then I can do this in time, I think, d times ‖A‖_F; and again the ε doesn't appear there, but if you wanted to estimate something from the output, a 1/ε would appear. Okay. I'll tell you first the idea for monomials, and then I'll extend it to general polynomials.

Consider some monomial, say (A†A)³b. If we wanted to sketch this down, we could construct sketching matrices S and T, which we can sample efficiently, with the approximation properties from before: S is sampled so that (SA)†(SA) approximates A†A well, and T is sampled from the column norms of SA, so that (SAT)(SAT)† approximates (SA)(SA)† well. With these approximation properties in hand, we can approximate the monomial step by step. First, we replace each A†A factor by (SA)†(SA); the matrix-vector piece that shows up is fine too (I didn't state this property, but you can show it holds: if you have a matrix-vector product, you can sketch it down as well). Then we sketch again on the interior, replacing each (SA)(SA)† by (SAT)(SAT)†. So I can approximate all the way down to a fully sketched monomial, something like (SA)†(SAT)(SAT)†(SAT)(SAT)† applied to a small vector, and the thing to note is that these dimensions are now very small: SAT is s × t, (SAT)† is t × s, and the vector is s × 1, so I can compute all of this in O(st) time. What I'm left with is something of the form (SA)†·v. The columns of (SA)† are rows of A (SA is rows of A), and this v is some small vector, so what this is is a linear combination of a small number of rows of A, and this is my solution. This means I have sampling and query access to the output, because it's a small linear combination of rows of A, and I have sampling and query access to my input. Okay. So in this setting it
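Here is an illustrative numerical sketch for the monomial (A†A)³b, with some simplifications I should flag: the matrices are explicit and real, a single S is reused for every A†A factor (a careful analysis would take fresh, independent sketches), and the inner products SA·b are computed directly rather than estimated by sampling as they would be in the dimension-independent algorithm.

```python
import numpy as np

def monomial_svt(A, b, s, t, rng=None):
    """Approximate (A^dagger A)^3 b via the chain
    (SA)^dagger [(SAT)(SAT)^dagger]^2 (SA b)   (illustrative sketch).

    Returns coefficients v and row indices: the output vector is
    (SA)^dagger v, a linear combination of s rescaled rows of A."""
    rng = rng or np.random.default_rng()
    # S: sample s rows of A with probability prop. to squared row norms.
    pr = np.sum(A ** 2, axis=1); pr /= pr.sum()
    rows = rng.choice(A.shape[0], size=s, p=pr)
    SA = A[rows] / np.sqrt(s * pr[rows])[:, None]        # s x n
    # T: sample t columns of SA with probability prop. to squared col norms.
    pc = np.sum(SA ** 2, axis=0); pc /= pc.sum()
    cols = rng.choice(A.shape[1], size=t, p=pc)
    SAT = SA[:, cols] / np.sqrt(t * pc[cols])[None, :]   # s x t
    w = SA @ b            # s-dim (estimated by sampling in the real thing)
    M = SAT @ SAT.T       # s x s, approximates (SA)(SA)^dagger
    v = M @ (M @ w)       # small coefficient vector
    return v, rows
```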
basically becomes a game of: I take my input and I just hit it with as many sketches as I can until things are independent of the dimension. The sizes s and t here need to be something like ‖A‖_F² over ε² (I won't be precise), and for this choice everything is small enough that I can just multiply everything through. Then there will be one final matrix on the outside that I can't sketch away, and this lifts my small vector back into the space of my output. Okay, so that's the approach for a monomial.

For general polynomials I use Chebyshev polynomials. Specifically, if I have some polynomial that is, again, bounded and of small degree, then as I discussed before we can write p(x) as a linear combination of Chebyshev polynomials, p(x) = Σ_k a_k T_k(x), and you can think of my a_k as all being relatively small; these a_k are bounded by 2, from something in a previous lecture. The question is how to evaluate this p. You might think you can evaluate p in the monomial basis, writing p(x) = c_d x^d + c_{d-1} x^{d-1} + ⋯, because I already showed how to do monomials, so you could presumably just chain them. But the problem is that these c_k can be exponentially large, so if we have some algorithm that incurs ε error in approximating the monomial, then multiplying by c_d can blow it up by something like 2^d, and this will break our dreams. So the alternative is to use some recurrence, some description of my polynomial, that is stable the entire time and never blows up, and this is the Clenshaw recurrence. I'm going to write it one way and then a different way. The version you'll most commonly see is: you start with zeros, q_{d+1} = q_{d+2} = 0, and you run the recurrence

q_k = 2x·q_{k+1} - q_{k+2} + a_k,

which is similar to the recurrence relation defining the Chebyshev polynomials themselves, plus the a_k. So the idea is that I start from q_{d+1}, construct q_d, then q_{d-1}, all the way down to q_0, and then what you can show is that p(x) = ½(q_0 - q_2) (with the convention that the k = 0 coefficient is halved). So we can evaluate p by just running this recurrence, and what you can show is that all of these q_k are bounded, so whenever you take your approximations, you're never scaling the error by too much. So you should think about this as being better in a way that will, I hope, become clearer in a second.
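Here is the scalar Clenshaw recurrence exactly as stated (with the halved k = 0 coefficient convention), plus a quick sanity check against T_3(x) = 4x³ - 3x.

```python
import numpy as np

def clenshaw(a, x):
    """Evaluate P(x) = a[0]/2 + sum_{k>=1} a[k] T_k(x) by the Clenshaw
    recurrence q_k = 2x q_{k+1} - q_{k+2} + a_k, starting from zeros;
    the iterates stay bounded when P is bounded on [-1, 1]."""
    d = len(a) - 1
    q = [0.0] * (d + 3)                        # q[d+1] = q[d+2] = 0
    for k in range(d, -1, -1):
        q[k] = 2 * x * q[k + 1] - q[k + 2] + a[k]
    return 0.5 * (q[0] - q[2])                 # P(x) = (q_0 - q_2)/2

# Sanity check: coefficients of T_3 give T_3(0.3) = 4*0.3**3 - 3*0.3.
assert np.isclose(clenshaw([0.0, 0.0, 0.0, 1.0], 0.3), 4 * 0.3**3 - 3 * 0.3)
```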
Now, formally, we're going to do a slightly different thing for some technical reasons: we're going to evaluate a variant of this recurrence, the version that works when you maintain that all of your iterates are odd. So now I'm going to restrict to the case that p is odd, which means all my even coefficients are zero, and my iteration is the corresponding odd version of the recurrence; if you run it all the way down to zero, then your polynomial comes out as q_0 - q_1. And okay, you're asking: what does this have to do with matrices? Well, all these scalar recurrences lift to the matrix setting naturally. If I define the analogous recurrence where I replace all of the x's with A's, and I attach a b to the term being added so that the iterates are vectors, then what I get is that my final outcome will be equal to p(A)·b. Right, and if I wanted to compute this and I didn't care about the runtime, I would just run this recurrence through all my iterations and read off the output; if we imagine A is m × n, this will take O(d·mn) time, where mn can be replaced by the number of nonzero entries of A if you want.

Okay, so this is going to be our recurrence, and the idea will be the same as before, which is that we have the approximation properties of our sketch and we use them; maybe in slightly bad form, I'll edit the equation itself. What we can do is, instead of maintaining Q_k, which is something large, we maintain it as a linear combination of a small number of rows of A; by this I mean that instead of maintaining these large iterates, we maintain some v_k, where v_k is small. Then the recurrence involves something like A†A·v_k, and we can start hitting things with sketches: this is approximately (SA)†(SA)·v_k, and specifically what I'm going to maintain is the iterate with the (SA)† pulled out front. This enforces that my linear combination is small, because it's a linear combination of only the s rows in my sketch. We do the same with the final step, approximating it and pulling out the (SA)†. I think I'm almost out of time, so I'll just say: what you get inside is some matrix-vector computation whose dimension is s × 1, so we've already shrunk down our dimension, and by doing one more sketch in there we get something we can compute efficiently. When we compute it this way, what we sacrifice is that, although we're able to compute in time independent of the dimension, we incur some ε error; and then you can use an argument about how the scalar errors propagate to show that the matrix errors also do not blow up too badly. So this will give you an algorithm with the desired runtime. Okay, that's it. Thanks.
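For reference, here is a minimal sketch of the naive matrix lift before any sketching, written for a symmetric A so that p(A) is an ordinary matrix polynomial (the lecture's odd variant for general A has the same structure). It replaces x by A and attaches b to the a_k term so every iterate is a vector, at cost O(d·nnz(A)) per the runtime claim above, and reuses the scalar clenshaw from the previous block as a check.

```python
import numpy as np

def matrix_clenshaw(A, b, a):
    """Compute P(A) b for symmetric A, with P = a[0]/2*T_0 + sum a[k] T_k,
    by the lifted recurrence q_k = 2 A q_{k+1} - q_{k+2} + a_k b."""
    d, n = len(a) - 1, len(b)
    q = [np.zeros(n) for _ in range(d + 3)]    # q[d+1] = q[d+2] = 0
    for k in range(d, -1, -1):
        q[k] = 2 * (A @ q[k + 1]) - q[k + 2] + a[k] * b
    return 0.5 * (q[0] - q[2])

# Check against the scalar recurrence via an eigendecomposition.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)); M = (M + M.T) / 8     # symmetric test matrix
coeffs, vec = [0.5, 0.2, -0.1, 0.3], rng.standard_normal(4)
w, V = np.linalg.eigh(M)
direct = V @ (np.array([clenshaw(coeffs, wi) for wi in w]) * (V.T @ vec))
assert np.allclose(matrix_clenshaw(M, vec, coeffs), direct)
```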