So, as I discussed yesterday, the topic for today is state discrimination. The question is: I have two or more states and I would like to distinguish between them. What is the optimal way of doing this, and what are the different regimes of interest for this question? This is what we'll look at today. We'll start with something I'm guessing most of you are familiar with, maybe the most famous distance measure between states: the trace distance. We'll see how this relates to state discrimination.

So the trace distance between two quantum states rho and sigma is defined as follows. I'll denote it delta(rho, sigma), and it is one half of the one-norm of the difference: delta(rho, sigma) = (1/2) ||rho - sigma||_1. What I mean by this is that I take the sum of the absolute values of the eigenvalues of rho - sigma.

Let's see a few properties of this trace distance. It lies between zero and one, and it is zero if and only if rho equals sigma, so it's a valid distance. It's invariant under unitary transformations: if I apply the same unitary to the two states, the distance between them, their distinguishability, doesn't change. If I restrict myself to the classical case, in other words if rho and sigma commute and have a common eigenbasis, which I index by a, with eigenvalues given by probability vectors p(a) for rho and q(a) for sigma, then the trace distance becomes (1/2) sum_a |p(a) - q(a)|. This quantity is used a lot in probability theory, where it's sometimes called the total variation distance, or the statistical distance, between the distributions p and q.

Another important property satisfied by this distance measure is what I would call the data processing inequality. This inequality is really fundamental for any distance measure in general, maybe the most fundamental one, and we'll see it several times for the different measures we introduce during this lecture. So what do I mean by data processing? It just means that if I process my two states further with the same map, then the distance between them, their distinguishability, can only decrease. And what is a valid physical process here? As we saw yesterday, it's a quantum channel, which I'll denote E: applying the same channel E to the two states can only decrease the trace distance, delta(E(rho), E(sigma)) <= delta(rho, sigma).
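Since the definition is purely spectral, it takes only a few lines to compute numerically. A minimal sketch in Python/numpy (not from the lecture; the helper name is mine):

```python
import numpy as np

def trace_distance(rho: np.ndarray, sigma: np.ndarray) -> float:
    """delta(rho, sigma) = (1/2) * sum of |eigenvalues| of rho - sigma."""
    diff = rho - sigma
    # rho - sigma is Hermitian, so eigvalsh gives its (real) eigenvalues.
    eigs = np.linalg.eigvalsh(diff)
    return 0.5 * float(np.sum(np.abs(eigs)))

# Example: two single-qubit states.
rho = np.array([[1, 0], [0, 0]], dtype=complex)            # |0><0|
sigma = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)  # |+><+|
print(trace_distance(rho, sigma))  # ~0.7071, i.e. 1/sqrt(2)
```

On commuting (diagonal) states this reduces to the total variation distance between the diagonal probability vectors.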
Good. So why does this total variation distance, or the trace distance, play such an important role? Why do we use the one-norm and not, for example, the two-norm or some other notion of distance? The main reason is that it has a very natural operational interpretation in terms of distinguishing between the two states rho and sigma. So what is the setting for state discrimination? I have a quantum system A, and I know it's in one of two states; I have two hypotheses about the state of the system. The hypothesis I'll call H0 is that the system is in the state rho0, and the hypothesis H1 is that it's in the state rho1. I would like to know which hypothesis is true, and I have access to the system A. My task is to make this distinction with minimum error probability.

So what is a strategy here? Since I have access to my system A, I have to perform some valid quantum operation that lets me say whether I'm in hypothesis zero or hypothesis one. As we saw yesterday, the most general way of modeling this is a POVM with two outcomes, zero and one, because my objective is just to output either hypothesis zero or hypothesis one. So the strategy is modeled by a POVM {E0, E1}, where, as you remember, E0 and E1 should both be positive semidefinite operators that sum to the identity.

There are various error probabilities you can look at. As a start, I'll look at a simple measure you can call the average error probability: I put a prior on my hypotheses, assuming that with probability one half H0 is correct and with probability one half H1 is correct. As we'll see later, this is not necessarily the natural choice; it depends on the application. But let's consider this setting first. With this prior, the total error probability is p_err = (1/2) Tr[E1 rho0] + (1/2) Tr[E0 rho1]. The first term corresponds to hypothesis H0 being true, meaning the correct state is rho0; the error in this case is that I output one, which happens with probability Tr[E1 rho0]. This is the probability that the state is rho0 but my procedure wrongly says H1; it is sometimes called the type 1 error. The other type of error is that the true state is rho1 but I say zero; this is the type 2 error. I should say that in this expression H0 and H1 play a symmetric role, but often they do not, and this is actually what we'll see in Stein's lemma in a bit: there we will treat H0 and H1 differently.

So let's go back to the trace distance. What's the relation between the trace distance and this question of discriminating between the two states rho0 and rho1? It turns out that if I optimize over all possible POVMs {E0, E1}, the minimum error probability is given exactly by the trace distance: p_err* = 1/2 - (1/2) delta(rho0, rho1). I won't prove it; this is one of the problems in the problem session.
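Numerically, you can check this formula by building the measurement that achieves it; the standard optimal choice, whose optimality is the content of the exercise just mentioned, takes E0 to be the projector onto the positive eigenspace of rho0 - rho1. A sketch, with helper names of my own:

```python
import numpy as np

def min_error_probability(rho0: np.ndarray, rho1: np.ndarray) -> float:
    """Average error, uniform prior, of the POVM built from the positive
    eigenspace of rho0 - rho1 (the standard optimal choice)."""
    vals, vecs = np.linalg.eigh(rho0 - rho1)
    pos = vals > 0
    E0 = vecs[:, pos] @ vecs[:, pos].conj().T   # projector where rho0 - rho1 > 0
    E1 = np.eye(rho0.shape[0]) - E0
    # p_err = 1/2 Tr[E1 rho0] + 1/2 Tr[E0 rho1]
    return 0.5 * float(np.real(np.trace(E1 @ rho0) + np.trace(E0 @ rho1)))

rho0 = np.array([[1, 0], [0, 0]], dtype=complex)            # |0><0|
rho1 = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)    # |+><+|
td = 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho0 - rho1)))
print(min_error_probability(rho0, rho1))   # ~0.1464
print(0.5 - 0.5 * td)                      # same value: 1/2 - delta/2
```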
Good. That was the trace distance. Let's now move to the more general setting, where I want to understand the fundamental trade-off between the two types of errors, not just the minimum of their average. If I fix, say, the type 1 error to be at most epsilon, what can I say about the type 2 error?

To understand this, it will be useful for this lecture to introduce a quantity that quantifies exactly that question. It's parameterized so as to look like a divergence, but that trade-off is all it expresses. I will call it the hypothesis testing relative entropy, or also the hypothesis testing divergence; I'll use "divergence" and "relative entropy" interchangeably. It has a parameter epsilon, which you should think of as exactly the type 1 error probability I aim to achieve, and it's defined as follows: for fixed rho and sigma, D_H^eps(rho || sigma) is the maximum of -log Tr[E sigma] over all operators E with 0 <= E <= identity and Tr[E rho] >= 1 - epsilon. The constraint Tr[E rho] >= 1 - epsilon says exactly that the type 1 error probability is at most epsilon; my POVM, remember, is given by E0 = E and E1 = identity - E. So Tr[E sigma] is the type 2 error, and the optimization corresponds exactly to the minimum type 2 error I can achieve.

A few remarks on this. Notice that this quantity is between zero and plus infinity, because Tr[E sigma] is between zero and one; if Tr[E sigma] is zero, the value is counted as plus infinity. And as I've said multiple times now, 2^{-D_H^eps(rho || sigma)} is the minimum type 2 error if I fix the type 1 error to at most epsilon.

Let's look at some boundary cases. If I fix epsilon to one, the constraint is vacuous, and of course I can get plus infinity. If I pick epsilon equal to zero, then I must have Tr[E rho] = 1, so E has to contain at least the whole support of rho, and the quantity becomes -log Tr[Pi_rho sigma], where Pi_rho is the projector onto the support of rho. Maybe a word on the support, because this will come back: by the projector onto the support of rho I just mean the sum of the projectors |e_i><e_i| over eigenvectors e_i of rho with nonzero eigenvalue. Another setting of interest is rho equal to sigma. This is supposed to behave like a distance measure, so when rho is close to sigma we expect the quantity to be small. For rho equal to sigma it is not exactly zero, but it is close to zero for epsilon small: the optimal value is log(1/(1 - epsilon)).

One more remark: if rho and sigma have orthogonal supports, then it's very easy to distinguish them; just take E to be the projector onto the support of rho, and you get plus infinity. So this gives you an idea of this quantity.

A few further remarks. I defined this quantity via an optimization problem, and in general this optimization problem does not have a closed-form expression; you have to optimize over all these different E's. But you may notice that it's a nice problem in the sense that it's a convex optimization problem.
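As the next remark notes, this convex program is in fact a semidefinite program, so in small dimensions it can be handed to an off-the-shelf solver. A minimal sketch using cvxpy (the function name and the choice of base-2 logarithm are mine):

```python
import numpy as np
import cvxpy as cp

def dh_eps(rho: np.ndarray, sigma: np.ndarray, eps: float) -> float:
    """Hypothesis testing relative entropy via its SDP definition:
    minimize Tr[E sigma] over 0 <= E <= I with Tr[E rho] >= 1 - eps,
    then return -log2 of the optimal value."""
    d = rho.shape[0]
    E = cp.Variable((d, d), hermitian=True)
    constraints = [E >> 0, np.eye(d) - E >> 0,
                   cp.real(cp.trace(E @ rho)) >= 1 - eps]
    prob = cp.Problem(cp.Minimize(cp.real(cp.trace(E @ sigma))), constraints)
    prob.solve()
    return -np.log2(max(prob.value, 1e-300))

rho = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|
sigma = np.eye(2, dtype=complex) / 2              # maximally mixed state
# Optimal value is -log2((1 - eps)/2) ~ 1.152 here (up to solver tolerance).
print(dh_eps(rho, sigma, 0.1))
```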
And more precisely, it's even a semidefinite program: the constraints are certain operators being positive semidefinite. So it can be computed efficiently as a function of the dimension of the systems involved.

Good. To get a bit more intuition, it's useful to consider the classical case. So take rho and sigma to be diagonal in the same basis, which I index by x, with p and q the corresponding probability vectors. What would be a natural test here? I want to test whether I have p or q. We haven't talked about tests so far, so this is a good point to discuss it. What does a test do? It gets a sample x and should output: do I have p or q? A natural test, used a lot and actually what we'll use in a minute, is to compute the likelihood ratio p(x)/q(x). If this ratio is bigger than some threshold value, which we'll choose depending on the problem, I output p; if it's below that same threshold, I output q. I hope this gives a general idea of the quantity.

Okay, so let's prove something. Remember I told you that an important property of a distance measure is the data processing inequality, saying, for example, that if I forget some part of my system, then I shouldn't gain in distinguishability. This quantity does satisfy the data processing inequality, and it's relatively simple to show. The statement is that for any quantum channel E, applying the channel to rho and sigma can only decrease the measure: D_H^eps(E(rho) || E(sigma)) <= D_H^eps(rho || sigma). I've said several times what this means, so I won't repeat it.

How do we prove this? It's very simple. Our divergence is defined in terms of an optimization problem, so I take an optimal measurement operator E for the problem corresponding to E(rho) and E(sigma). By construction, the hypothesis testing entropy between E(rho) and E(sigma) equals -log Tr[E E(sigma)], and as a constraint we have Tr[E E(rho)] >= 1 - epsilon. I'm just using the definition. Now what I will do is construct a feasible operator, an E' if you want, for the optimization problem corresponding to rho and sigma. How? There is a very natural choice, using the idea of the adjoint of a quantum channel E, which you saw in the problem session yesterday. For those who didn't reach it: I can view the set of linear operators as a Hilbert space itself, with an inner product, and then I can use exactly the same definition of an adjoint that we have for a regular operator, but now for a superoperator: the adjoint E* of the map E is defined so that Tr[X† E(Y)] = Tr[E*(X)† Y] for all operators X and Y.
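Here is a small numerical check of that defining property, and of the two facts about adjoints used next (complete positivity of the adjoint, and unitality as the mirror of trace preservation). The choice of the amplitude damping channel is mine, just for illustration:

```python
import numpy as np

# A channel E(X) = sum_k K_k X K_k^dagger has adjoint E*(Y) = sum_k K_k^dagger Y K_k.
def channel(X, kraus):  return sum(K @ X @ K.conj().T for K in kraus)
def adjoint(Y, kraus):  return sum(K.conj().T @ Y @ K for K in kraus)

# Qubit amplitude damping channel with decay probability g.
g = 0.3
kraus = [np.array([[1, 0], [0, np.sqrt(1 - g)]]),
         np.array([[0, np.sqrt(g)], [0, 0]])]

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
sigma = A @ A.conj().T; sigma /= np.trace(sigma)   # a random state
E = np.array([[0.8, 0.1], [0.1, 0.3]])             # some operator with 0 <= E <= I

# Defining property: Tr[E channel(sigma)] == Tr[adjoint(E) sigma].
print(np.trace(E @ channel(sigma, kraus)), np.trace(adjoint(E, kraus) @ sigma))
# Adjoint of a trace-preserving map is unital: adjoint(I) = I.
print(np.round(adjoint(np.eye(2), kraus), 10))
```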
Using this property of the adjoint of the map E, and the fact that the operator E is Hermitian, I can move the channel onto the measurement operator: Tr[E E(sigma)] = Tr[E*(E) sigma], and the same with sigma replaced by rho. And if you remember from yesterday's problem session: if a map is completely positive then its adjoint is also completely positive, and this is if and only if; and the trace-preserving property of the map is equivalent to the unital property of its adjoint. Remember, unital just means it maps identity to identity.

So what do I have to show? I now have a natural candidate for a feasible choice of measurement operator for the problem between rho and sigma: it's just E*(E), the adjoint of the channel applied to the operator E. I should have chosen a different name for one of them, but okay. So I choose E*(E), and if I go back to the program, I should verify that it satisfies the constraints: that it's between zero and identity, and that the type 1 constraint holds. I just check these two conditions. First, E*(E) is positive, because my map E* is completely positive, and it's at most E*(identity), which is the identity itself because E* is unital. Second, the condition Tr[E*(E) rho] >= 1 - epsilon follows immediately from above, since Tr[E*(E) rho] = Tr[E E(rho)] >= 1 - epsilon. So E*(E) is a feasible solution for the program that defines D_H^eps(rho || sigma), and therefore this quantity is at least the value achieved by this particular feasible solution, which is exactly D_H^eps(E(rho) || E(sigma)). This proves data processing for the hypothesis testing relative entropy. I hope this was clear; I tried to go relatively slowly so that you could follow the calculation.

Good. Now I will look at a special case of interest. So far I gave you a quantity that captures the type 2 error for a given type 1 error; we saw a way of writing it and a few of its properties. Now we'll look at a specific setting of interest and ask whether we can say more about this trade-off between type 1 and type 2. The setting is the following: imagine I have multiple copies of rho or sigma. Imagine a box which, when I press the button, gives me either rho or sigma, and I can use this box several times, say n times, with n very large. I want to know whether I'm in the setting where the box outputs rho or the one where it outputs sigma.

[Question: is the prior half-half?] No, it's not half-half here. From the start, when I defined this hypothesis testing relative entropy, I looked at the two error quantities separately; I don't put any prior on the two hypotheses. I just say that I have rho or sigma, and for any strategy there is a type 1 error and a type 2 error. I have these two numbers, and of course there's a tension between them: I cannot make both of them zero in general.
The trace distance result was saying: if I want to minimize one half of the first number plus one half of the second, what is the minimum I can achieve? Now I ask a different question, and this is what we'll do in Stein's lemma: I assume the first number is at most epsilon; what can I say about the second number, the second error probability? As I told you, this setting is not symmetric between the two types of errors. And this does happen a lot, because the two types of errors, like false positives and false negatives, are not always symmetric; sometimes you need one of them to be much smaller. So here I will fix one of the two errors to be at most a constant epsilon, and I will look at how the other error behaves as a function of n. Remember, I look at rho^{⊗n} and sigma^{⊗n}, I fix one of the two errors to be at most epsilon, a fixed parameter I treat as a constant, and I look at the other one. Naturally it will go down with n: as I have more copies, as I can use my box more times, I can distinguish the two situations more easily. So the other error probability will go down as n goes to infinity, and the question we ask is exactly: how quickly?

This is exactly the statement of quantum Stein's lemma. Again, I fix epsilon, a constant; I take rho and sigma and look at rho^{⊗n} and sigma^{⊗n}. Remember that D_H^eps is minus the log of the minimum type 2 error probability. What we'll show is that the type 2 error probability goes to zero exponentially, and I want to know exactly the rate. So, to recall: the first type of error is at most epsilon; the type 2 error goes to zero exponentially in n; and I want to understand the exponent exactly. Stein's lemma characterizes exactly this exponent, and it's given exactly in terms of the quantum relative entropy, which I'll define in a minute. The 1/n in the statement is there because the error goes down as 2 to the minus some constant times n; I look at this constant, and it is given by the quantum relative entropy.

[Question about the rate of convergence.] Yes, of course, as epsilon goes to zero the convergence is slower, but we actually understand the convergence quite well, in the sense that we even understand the second-order term. The first-order term is completely independent of epsilon. If you want to be more precise and don't just take the limit: 1/n times the hypothesis testing divergence equals the relative entropy, plus 1 over square root of n times some quantity which depends on epsilon.
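Written out, the statement, together with the second-order refinement just mentioned; the refinement is the standard second-order expansion from the literature, quoted here for reference rather than proved (V is the relative-entropy variance, and Phi^{-1} the Gaussian quantile, which is negative for epsilon < 1/2, matching the slower convergence for small epsilon):

```latex
% Quantum Stein's lemma: for every fixed type 1 error bound eps in (0,1),
\lim_{n\to\infty} \frac{1}{n}\, D_H^{\varepsilon}\!\left(\rho^{\otimes n}\,\middle\|\,\sigma^{\otimes n}\right) \;=\; D(\rho\,\|\,\sigma).
% Second-order refinement (for reference):
\frac{1}{n}\, D_H^{\varepsilon}\!\left(\rho^{\otimes n}\,\middle\|\,\sigma^{\otimes n}\right)
  \;=\; D(\rho\,\|\,\sigma) \;+\; \sqrt{\frac{V(\rho\,\|\,\sigma)}{n}}\;\Phi^{-1}(\varepsilon) \;+\; O\!\left(\frac{\log n}{n}\right),
\qquad
V(\rho\,\|\,\sigma) \;=\; \operatorname{Tr}\!\left[\rho\,(\log\rho-\log\sigma)^{2}\right] - D(\rho\,\|\,\sigma)^{2}.
```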
So in some sense the first-order term does not depend on epsilon, which is why the statement works for any epsilon, but the second-order term does depend on epsilon, and it also depends on the states themselves; that quantity can even be infinite. [Question about the dimension.] In these expressions the dimension does not appear explicitly. We really look at fixed rho and sigma and we want a quantity that depends only on rho and sigma: the first-order term is the quantum relative entropy, and the second-order term has a factor depending on epsilon and a factor depending on a certain variance between rho and sigma, but the dimension does not explicitly appear. Any further questions about the statement? What we'll do in the remainder of the lecture is try to prove it; I'll probably go a little quickly over the proof.

[Question about the classical case.] Good question. The classical case is exactly the case where rho and sigma are probability distributions, or, if you want, where rho and sigma commute, so they can be diagonalized in the same basis. It's exactly the same question: you have the box, it outputs a sample from P or from Q, and you want to distinguish between them. Further questions? Good.

So now I have to tell you what the quantum relative entropy is; I haven't said that yet. I take two states rho and sigma; actually it's useful to allow sigma to not necessarily be a normalized state. For rho it's quite important to be normalized, otherwise there are many definitions you could come up with, but an unnormalized sigma is fine. So, on a finite-dimensional Hilbert space, the quantum relative entropy is defined as D(rho || sigma) = Tr[rho (log rho - log sigma)] if the support of rho is contained in the support of sigma, and plus infinity otherwise. If you look at this quantity a little, you'll see that when the support of rho is included in the support of sigma it is well defined and there's no problem; if that's not the case, there is a problem between this log sigma and the rho, and that's why we set it to plus infinity. Let me recall what I mean by the log of an operator; I've said it several times, but let me repeat it in case: I take the log of its eigenvalues, and I neglect the zero eigenvalues, I completely remove them.

Good. It's useful to look at the classical case here too. When rho and sigma have the form rho = sum_x p(x) |x><x| and sigma = sum_x q(x) |x><x|, the relative entropy is just sum_x p(x) log(p(x)/q(x)). As you might expect, this is very well studied in classical probability theory and statistics; it's called the relative entropy, or sometimes the Kullback-Leibler divergence, and you can see this quantum relative entropy as a quantum generalization of the Kullback-Leibler divergence. I should say that this is one non-commutative generalization of the relative entropy; there are actually others, but this one is the most famous, or at least the most used.
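A direct implementation of this definition, including the support condition and the convention for the operator log (a sketch; the tolerances are mine, and I use base-2 logs to match the 2^{-nD} statements elsewhere):

```python
import numpy as np

def op_log2(X: np.ndarray, tol: float = 1e-12) -> np.ndarray:
    """Operator log: take log2 of the nonzero eigenvalues, drop the zero ones."""
    vals, vecs = np.linalg.eigh(X)
    logs = np.where(vals > tol, np.log2(np.maximum(vals, tol)), 0.0)
    return (vecs * logs) @ vecs.conj().T   # sum_j log2(lambda_j) |v_j><v_j|

def relative_entropy(rho: np.ndarray, sigma: np.ndarray, tol: float = 1e-9) -> float:
    """D(rho||sigma) = Tr[rho (log rho - log sigma)] in bits;
    +inf when supp(rho) is not contained in supp(sigma)."""
    vals, vecs = np.linalg.eigh(sigma)
    kernel = vecs[:, vals <= tol]          # kernel of sigma
    if kernel.shape[1] and np.trace(kernel.conj().T @ rho @ kernel).real > tol:
        return np.inf                      # support condition violated
    return float(np.trace(rho @ (op_log2(rho) - op_log2(sigma))).real)

rho = np.array([[1, 0], [0, 0]], dtype=complex)     # |0><0|
print(relative_entropy(rho, np.eye(2) / 2))         # = 1 bit
print(relative_entropy(rho, np.diag([0, 1.0])))     # disjoint supports -> inf
```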
One reason I would say it's the most used is exactly this Stein's lemma that I will show: Stein's lemma does not feature a generic divergence, it features this specific one. That's one of the reasons, let's say. The paper that proves this is called "The proper formula for relative entropy" or something like this; it's a paper by Hiai and Petz, I believe.

Good. Before getting into the proof, let's see a few properties of this quantum relative entropy. First, if rho and sigma are both states, then the relative entropy is non-negative, so it's a valid measure, and we have equality if and only if rho equals sigma: in particular, if rho equals sigma it gives zero, and zero only in this case. And again, as I told you, the most important inequality is data processing, which also holds for this relative entropy: if I apply a channel to the two arguments, the quantity can only decrease. I won't prove this now, even though we'll actually use it in the proof of Stein's lemma; proving this inequality will be the objective of tomorrow's lecture.

This quantum divergence also plays an important role because from its properties you can derive properties of the von Neumann entropy, which you might be more familiar with. You can see the relative entropy as a sort of parent quantity from which you can get the von Neumann entropy. How do you do that? For a bipartite state rho_AB, I define the von Neumann entropy (this is the way I define it; you can check that it corresponds to the usual definition) as minus the relative entropy between rho_A and the identity: H(A) = -D(rho_A || id_A). You might be wondering what this minus sign is doing, since we just saw that the relative entropy is positive. But remember, it's positive only if the operators are normalized, and the identity is not, so in general the quantity D(rho_A || id_A) will be negative; taking the minus sign gives me the usual von Neumann entropy. I can also define the conditional entropy as H(A|B) = -D(rho_AB || id_A ⊗ rho_B), and the mutual information as I(A:B) = D(rho_AB || rho_A ⊗ rho_B). I won't discuss these more in this lecture, but in the tutorial you'll play a bit with these entropies; this will be useful for the following lectures.

[Question: do these quantities have operational interpretations?] There are some, but they will come later; we'll see one, maybe not a very direct one, in the fourth lecture on Thursday. The mutual information at least has a natural one in terms of Stein's lemma: if you want to distinguish the state rho_AB from the product of its marginals, and you ask about the type 2 error for a fixed type 1 error, then this gives an operational interpretation. These quantities also have other interpretations in terms of compression and so on, as in the classical setting, but we won't cover this in this course.
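To have the three definitions in one place before we move on (id_A denotes the unnormalized identity on A):

```latex
H(A)_\rho \;=\; -\,D(\rho_A \,\|\, \mathrm{id}_A), \qquad
H(A|B)_\rho \;=\; -\,D(\rho_{AB} \,\|\, \mathrm{id}_A \otimes \rho_B), \qquad
I(A\!:\!B)_\rho \;=\; D(\rho_{AB} \,\|\, \rho_A \otimes \rho_B).
```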
Good, I hope this is clear. So now I will try to give a proof of Stein's lemma. I will go a little more quickly; if you don't follow all the details of the calculations it doesn't matter so much. Try to get the overall idea of the structure of the proof, and then you can look back at the notes if you want to see the details.

So again, recall the statement: lim_{n -> infinity} (1/n) D_H^eps(rho^{⊗n} || sigma^{⊗n}) = D(rho || sigma). I have to show two inequalities. The first one I'll call the achievability result: it says that I can make the type 2 error of order 2^{-n D(rho || sigma)}. This is what I will start with, and achievability means I have to give you a strategy. We'll start with the classical case, where rho and sigma commute; this is then a statement about probabilities.

So let's take rho and sigma with common eigenbasis and eigenvalues p and q respectively, and take tensor copies rho^{⊗n} and sigma^{⊗n}. This naturally corresponds to product distributions: the eigenvalues are just products of the corresponding eigenvalues. Then I have to define a test for this problem. It's useful, and maybe simpler, to think of the test classically. I see n samples x1, ..., xn, and either they all come from p or they all come from q. As I discussed before, I compute the likelihood ratio R = p(x1) p(x2) ... p(xn) / (q(x1) ... q(xn)): the probability of seeing this particular sample under p, divided by what the probability would be if the distribution were q. Now I just have to set the parameters, meaning the threshold: remember, if R is bigger than something I say p, if it's smaller I say q. It turns out that the right threshold, so that we get the right errors, is this: if (1/n) log R >= D(p||q) - delta, I say the samples are from p; otherwise I say they are from q. Here delta is a small parameter that we'll let go to zero at the end of the proof.

If you prefer the quantum notation and the optimization problem I gave you for D_H^eps, the corresponding measurement operator E is the sum, over all strings (x1, ..., xn) whose likelihood ratio is bigger than the chosen threshold, of the projector onto |x1 ... xn>. Of course E depends on n; I didn't write that explicitly.

Now I have to analyze this test. What does that mean? A given test defines a type 1 error probability and a type 2 error probability, so I just have to compute these two numbers and check that the type 1 error probability is at most epsilon and the type 2 error probability is at most roughly 2^{-n D(p||q)}.
So let's compute them. It's simpler here to compute the probability of success rather than the probability of error. If the samples are from p, that is, under hypothesis 0, it's useful to use probability notation: the success probability is the probability that (1/n) log R is above the threshold. Now I use the fact that p and q are product distributions and write log R as a sum: log R = sum_i log(p(x_i)/q(x_i)). These random variables are independent and identically distributed, and, remember, the samples are from p here, so each has expectation exactly the relative entropy D(p||q). So by the law of large numbers, the probability that the average is bigger than the relative entropy minus a little bit goes to 1 as n goes to infinity. So the type 1 success probability goes to 1 as n goes to infinity; in particular, for large enough n it will be bigger than 1 - epsilon.

Good, that was the type 1 error probability. Now I have to analyze the type 2 error probability, and here I have to show that it goes down very quickly to 0 as a function of n. I write it as the sum, over all strings (x1, ..., xn) that satisfy the acceptance condition, of q(x1) ... q(xn); I've just put the condition under the sum. If you rewrite this condition, it tells you exactly that the product of the q's is at most 2^{-n(D(p||q) - delta)} times the product of the p's: a number directly related to the threshold I fixed. So I replace the product of q's by this bound; the factor 2^{-n(D(p||q) - delta)} comes out of the sum, and the remaining sum of products of p's is at most 1 (I can even forget the constraint). So the type 2 error probability is bounded by 2^{-n(D(p||q) - delta)}. Taking minus the log, I get at least n(D(p||q) - delta), and so I get basically the desired statement: for large enough n, (1/n) D_H^eps(p^{⊗n} || q^{⊗n}) >= D(p||q) - delta. Then I let delta go to 0. Good, so that was the classical achievability.
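Here is a small simulation of exactly this test; the distributions, n, delta, and trial count are arbitrary choices of mine for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = np.array([0.9, 0.1]), np.array([0.5, 0.5])
D = float(np.sum(p * np.log2(p / q)))       # D(p||q) ~ 0.531 bits
n, delta, trials = 200, 0.15, 2000
thr = D - delta

def says_p(xs: np.ndarray) -> bool:
    # accept "p" iff (1/n) log2 R = mean of log2(p(x_i)/q(x_i)) >= D(p||q) - delta
    return np.mean(np.log2(p[xs] / q[xs])) >= thr

type1 = np.mean([not says_p(rng.choice(2, size=n, p=p)) for _ in range(trials)])
type2 = np.mean([says_p(rng.choice(2, size=n, p=q)) for _ in range(trials)])
print(f"D(p||q) = {D:.3f} bits")
print(f"empirical type 1 error = {type1:.3f}   (goes to 0 as n grows)")
print(f"empirical type 2 error = {type2:.1e}  (bound: 2^(-n(D-delta)) = {2**(-n*thr):.1e})")
```

At these parameters the type 2 bound is around 10^{-23}, so the simulation essentially never commits a type 2 error.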
Okay, so now, to finish the achievability, I have to go from the classical case to the quantum case, because I only gave you the classical case. For that we'll use a technique which is quite useful in quantum information theory, called pinching. It's a general way of reducing the non-commuting case to the commuting case. As we'll see, we incur some losses, but at least for this particular problem we can handle them.

So what does pinching mean? A pinching map depends on some Hermitian, let's say positive, operator sigma. I take a spectral decomposition of sigma, with eigenvalues lambda, and let Pi_lambda be the projector onto the eigenspace corresponding to lambda. The pinching map, which by the way is a quantum channel, takes an operator S and maps it to P_sigma(S) = sum_lambda Pi_lambda S Pi_lambda, where lambda ranges over the spectrum of sigma. In some sense, with respect to these eigenspaces, I'm removing the off-diagonal terms. For example, if the Pi_lambda were all rank-one projectors, corresponding to one-dimensional eigenspaces, then P_sigma(S) would remove all the off-diagonal terms of S. But notice, and this is important, they are not necessarily rank one; they can have much bigger rank, and in that case I have diagonal blocks and I'm removing the off-diagonal blocks.

So why is this map useful? The useful thing is that P_sigma(S) commutes with sigma. This is easy to see, and you'll discuss it in more detail in the problem session. Of course, it's easy to construct some operator that commutes with sigma alone; we also want it to retain some properties of the operator we applied the map to. This is quantified by the pinching inequality: if I apply the pinching map relative to sigma to the state, or positive operator, rho, then I cannot lose too much, in the sense that P_sigma(rho) >= rho / |spec(sigma)|, where |spec(sigma)| is the number of distinct eigenvalues of sigma, the size of its spectrum. [Question: distinct eigenvalues?] Yes: if an eigenvalue appears several times, I don't count it multiple times; this is the key point. [Question: does sigma need to be Hermitian?] Yes, Hermitian, so that you can diagonalize it.
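A direct implementation of the pinching map, checking both properties just stated on a small example (a sketch; helper names are mine):

```python
import numpy as np

def pinch(S: np.ndarray, sigma: np.ndarray, tol: float = 1e-9) -> np.ndarray:
    """P_sigma(S) = sum over distinct eigenvalues lambda of sigma
    of Pi_lambda S Pi_lambda."""
    vals, vecs = np.linalg.eigh(sigma)
    out = np.zeros_like(S, dtype=complex)
    used = np.zeros(len(vals), dtype=bool)
    for i, lam in enumerate(vals):
        if used[i]:
            continue
        group = np.abs(vals - lam) < tol                 # eigenvectors sharing lambda
        used |= group
        Pi = vecs[:, group] @ vecs[:, group].conj().T    # eigenspace projector
        out += Pi @ S @ Pi
    return out

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = A @ A.conj().T; rho /= np.trace(rho)
sigma = np.diag([0.5, 0.25, 0.25]).astype(complex)       # degenerate eigenvalue

P = pinch(rho, sigma)
print(np.linalg.norm(P @ sigma - sigma @ P))             # ~0: commutes with sigma
# pinching inequality: P_sigma(rho) >= rho / |spec(sigma)|, here |spec(sigma)| = 2
print(np.linalg.eigvalsh(P - rho / 2).min())             # >= 0 up to numerics
```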
Good, so this is what I'll use. I see I'm a bit behind on time, so let me go quickly; the important thing is to see the overview of this part. I start with rho and sigma, which in general don't commute. I'll use the data processing inequality for D_H^eps and apply pinching to the two states: I pinch both relative to sigma^{⊗n}. Of course, if I pinch relative to a state and apply the map to that same state, it gives back the same state, so on sigma^{⊗n} it doesn't do anything; on rho^{⊗n} it does do something. But now the advantage is that the two resulting states commute, so I'm in the classical case: I can pick an eigenbasis where both are diagonal, and I can apply the classical result we saw before. Because the two commute, this corresponds to hypothesis testing between two probability vectors p and q.

Now I use the classical achievability result we have just seen. I apply the pinching to n copies of rho and then take that m times (m is a different parameter from n; I have two of them), replacing the pinched state by p and sigma^{⊗n} by q. Taking the limit as m goes to infinity, the classical result gives me a lower bound, which is the direction we're interested in: the 1/n stays, and the relative entropy D(P_{sigma^{⊗n}}(rho^{⊗n}) || sigma^{⊗n}) comes out. Then I can go back from the probability vectors to the corresponding commuting quantum states. So now I have the usual relative entropy, which is good, not a hypothesis testing relative entropy; but this is not exactly what I wanted. I don't care about the pinched rho, I want to get back rho itself, so I want to get rid of the pinching. Here is where I use the second pinching property: applying the pinching to rho^{⊗n} gives P_{sigma^{⊗n}}(rho^{⊗n}) >= rho^{⊗n} / |spec(sigma^{⊗n})|.

Now the important thing to observe is: what can I say about the size of the spectrum of sigma^{⊗n}? Notice that the eigenvalues of sigma^{⊗n} are products of eigenvalues of sigma. How many different values can such a product take? For each eigenvalue of sigma I can count how many times it appears in the product; this count is a number between 0 and n, so there are n + 1 possibilities, and this is for each of the at most d distinct eigenvalues of sigma, where d is the underlying dimension. That's why the number of distinct eigenvalues of sigma^{⊗n} is at most (n+1)^d, or (n+1)^{d-1} since the last count can be inferred from the others. That's good, because the bound says the pinched state is at least one over some polynomial in n times rho^{⊗n}, and the crucial point is that it's polynomial in n, not exponential.

So now I look back at my relative entropy between the pinched rho^{⊗n} and sigma^{⊗n}, and I do some basic manipulations using this pinching bound and the operator monotonicity of the log. In the term involving log of the pinched state, I want to replace the pinched state by rho^{⊗n} itself. Remember, yesterday in the problem session you saw that log is operator monotone, so if I apply the log to the operator inequality above, it still holds. This is where I can replace log of the pinched state by log of rho^{⊗n}, at the cost of some factor: the state gets divided by |spec(sigma^{⊗n})|, but log of a product is the sum of the logs, so this comes out as an additive term, minus log |spec(sigma^{⊗n})|.

Globally, what did we do? Putting this together: 1/n times the relative entropy after pinching is at least 1/n times the relative entropy before pinching, minus a loss term. And this loss term is very reasonable, because the spectrum size is polynomial in n, I take its log and divide by n, so it goes to 0 as n goes to infinity. One thing I used here that you might wonder about: I used the fact that the quantum relative entropy is additive on tensor powers, D(rho^{⊗n} || sigma^{⊗n}) = n D(rho || sigma). It's relatively easy to check, and I included the idea of how to show it in the notes.
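Putting the steps of this part together in one chain (my summary of the board computation):

```latex
\frac{1}{n}\, D_H^{\varepsilon}\!\left(\rho^{\otimes n}\,\middle\|\,\sigma^{\otimes n}\right)
\;\gtrsim\;
\frac{1}{n}\, D\!\left(\mathcal{P}_{\sigma^{\otimes n}}(\rho^{\otimes n})\,\middle\|\,\sigma^{\otimes n}\right)
\;\ge\;
\frac{1}{n}\, D\!\left(\rho^{\otimes n}\,\middle\|\,\sigma^{\otimes n}\right) - \frac{\log\bigl|\mathrm{spec}(\sigma^{\otimes n})\bigr|}{n}
\;\ge\;
D(\rho\,\|\,\sigma) - \frac{d\,\log(n+1)}{n}.
```

Here the first inequality holds asymptotically via the classical achievability (with the auxiliary limit m to infinity), the second combines the pinching inequality with operator monotonicity of the log, and the third uses additivity together with |spec(sigma^{⊗n})| <= (n+1)^d; the loss term vanishes as n goes to infinity.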
Okay, so I hope this was clear. I have two minutes remaining, so maybe it's not worth doing the converse in full; let me do it very quickly, in two minutes. For the converse we'll actually prove something slightly weaker, what is sometimes called a weak converse: I'll only prove that if I take the limit as epsilon goes to 0, then I get the relative entropy. It is true that the result holds for any fixed epsilon, which is what is called a strong converse, but that needs some new tools, so I thought it simpler to focus on the weak version.

Maybe I should say what I mean by a converse: I want to prove the opposite inequality. A converse is a sort of no-go result, saying that for any valid strategy, the type 2 error cannot be too small. The main idea is just to apply the data processing inequality to the right map; it's a very simple proof. So I take an arbitrary strategy E, which by definition of a strategy satisfies the type 1 constraint Tr[E rho^{⊗n}] >= 1 - epsilon, and I apply the data processing inequality for a well-chosen quantum channel, which of course depends on the strategy. What does this channel do? It maps an operator T to the classical outcome 0 with probability Tr[E T], and to the outcome 1 with probability 1 - Tr[E T]. You can see that if I apply this channel, I can read off all the error probabilities and success probabilities.

What we do now is evaluate the relative entropy between rho^{⊗n} and sigma^{⊗n} in two ways. On one hand, by additivity, it equals n times the relative entropy between rho and sigma. On the other hand, I can lower bound it using data processing: I apply the channel above, which gives a lower bound on this relative entropy as a function of the type 1 and type 2 error probabilities. Doing some simple computations then gives the desired result.
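Spelled out, the "simple computations" go as follows; this is my reconstruction of the standard steps, with alpha_n <= epsilon the type 1 error, beta_n the type 2 error of an optimal strategy, d the classical relative entropy between the binary outcome distributions, and h the binary entropy (so h <= 1):

```latex
n\,D(\rho\,\|\,\sigma) \;=\; D\!\left(\rho^{\otimes n}\,\middle\|\,\sigma^{\otimes n}\right)
\;\ge\; d\bigl((1-\alpha_n,\,\alpha_n)\,\big\|\,(\beta_n,\,1-\beta_n)\bigr)
\;\ge\; (1-\alpha_n)\log\frac{1}{\beta_n} - h(\alpha_n),
```

and rearranging,

```latex
\frac{1}{n}\, D_H^{\varepsilon}\!\left(\rho^{\otimes n}\,\middle\|\,\sigma^{\otimes n}\right)
\;=\; \frac{1}{n}\log\frac{1}{\beta_n}
\;\le\; \frac{D(\rho\,\|\,\sigma) + 1/n}{1-\varepsilon}
\;\xrightarrow{\;n\to\infty\;}\; \frac{D(\rho\,\|\,\sigma)}{1-\varepsilon},
```

which tends to D(rho || sigma) as epsilon goes to 0, as claimed for the weak converse.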
Let me just finish with a very quick remark on what this quantity D_H^eps is. The reason I wanted to define a specific quantity for this task and call it a divergence is that this is done a lot in information theory: such quantities are sometimes called one-shot entropies, or one-shot relative entropies. They are usually measures that have an operational interpretation for any choice of states; for example, this one has an operational interpretation directly in terms of the hypothesis testing problem. There are many others that are particularly relevant in settings where you want measures that work for any states, not only for specific classes of states. A particularly important one is the min-entropy, or rather the smooth min-entropy, which is especially relevant in cryptography. The entropy measures you might be more used to, like the von Neumann entropy and the corresponding relative entropy, usually have an operational interpretation when we're in an IID setting, exactly like what we saw in this lecture: we have rho^{⊗n} and sigma^{⊗n} and we want to characterize the type 1 and type 2 error probabilities in this asymptotic setting. Okay, I'll stop here for today; sorry for running a bit long.