...the proof of Shannon's theorem in the classical case, but also generalized to classical-quantum channels, and then something different in the second part of the lecture.

So let's recall what we did yesterday. We characterized the optimal number of messages one can send over a very general noisy channel in terms of hypothesis testing: namely, hypothesis testing between the joint state of the input and the output of the channel and the product of its marginals. We had a converse bound and an almost matching achievability bound. What I would like to do now is consider the interesting setting where I have n independent copies of a channel, for which we define the capacity, the quantity you worked with yesterday in the exercise session, and see how to apply the general theorem to this i.i.d. setting.

I went over this a little quickly yesterday, and the idea is very simple: I apply the theorem directly to W^⊗n, which gives an upper and a lower bound in terms of the hypothesis testing relative entropy, plus some small terms. Then I multiply by 1/n (remember, I look at the rate, the number of bits transmitted per channel use), take the limit as n goes to infinity, and then let ε go to zero. This gives an expression for the capacity.

The question now is that this expression is not what we want: we want an expression that is simple to compute, in terms of simple properties of the channel. So in this corollary I stated a lower bound and an upper bound on the capacity. The lower bound looks simple; it does not involve any limits. Let's see how it is proven. It can be proven directly from Stein's lemma. We characterized the capacity as a limit of D_H^ε as n goes to infinity; if you fix the distribution p_{X^n} to be a product, p_X × p_X × ⋯ × p_X, then the state becomes a product state and this is exactly the setting of Stein's lemma. So we can apply it directly, and we know the expression equals the quantum relative entropy. As you remember, the relative entropy between a state and the product of its marginals is nothing but the mutual information. I hope this is clear.

In general, however, we cannot use Stein's lemma to conclude that this quantity equals the relative entropy, because we are not guaranteed that the distribution p_{X^n} is a product. This is the difficulty: the state is not necessarily of product form, and remember that Stein's lemma is really about distinguishing two product states.
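For reference, here is my reconstruction of the expressions being discussed (the exact normalizations and placement of the suprema are my assumption, not a verbatim copy of the board):

$$
C(W) \;=\; \lim_{\epsilon \to 0}\,\lim_{n \to \infty}\, \frac{1}{n}\, \sup_{p_{X^n}} D_H^{\epsilon}\!\big(\rho_{X^n B^n} \,\big\|\, \rho_{X^n} \otimes \rho_{B^n}\big),
$$

and, restricting to product inputs p_{X^n} = p_X^{×n} and applying Stein's lemma,

$$
C(W) \;\ge\; \sup_{p_X} D\big(\rho_{XB} \,\big\|\, \rho_X \otimes \rho_B\big) \;=\; \sup_{p_X} I(X;B).
$$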
We can nonetheless replace this D_H^ε by a quantum relative entropy, using the inequality from the proof of the converse of Stein's lemma. If you remember, in that proof we used an inequality relating the hypothesis testing relative entropy to the quantum relative entropy, and it was proved using (I should say) the data processing inequality for the quantum relative entropy, not strong subadditivity. So in this expression I can replace the hypothesis testing relative entropy with the quantum relative entropy, at the price of dividing by 1 − ε. Then, as usual, the relative entropy between the state and the product of its marginals is the mutual information. The only other thing I did was to replace the limit by a supremum over n. This is justified by observing that the quantity f(n), the supremum over all joint probability distributions on n copies of the input of the mutual information between input and output, is superadditive; you did this in the exercise session, hopefully. It follows simply from the fact that when you pick p_{X^n} to be a product, the mutual information is additive. Why can f be strictly superadditive and not just additive? Because the optimal probability distribution may fail to be a product.

So now we have a general expression for the capacity. It is still not satisfactory: the lower bound is nice, but the upper bound involves a limit as n goes to infinity, so you might wonder what the use of this quantity really is. The argument I just gave always works: for any task with a nice one-shot characterization, you can take limits in the same way. What is really nice about this specific task, point-to-point channel communication over a classical-quantum channel, is that the lower bound will actually be equal to the upper bound; in other words, the function f(n) is exactly additive. This does not always happen, and it relies on very specific properties of classical-quantum channels.

This is exactly what I wrote here: we want to evaluate the expression, and we will see that for classical-quantum channels it reduces to a single copy, which is equivalent to showing that p_X can be chosen to be product. So let's show this. It is a simple lemma, but a crucial one; I think it is the crucial fact that makes Shannon's theorem work in the classical case. For any n, consider the mutual information between the inputs and the outputs of n copies of the channel, optimized over all possibly correlated distributions over the inputs. This turns out to equal n times the supremum, over distributions on a single copy, of the mutual information between input and output. One side, as usual, is very simple.
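For reference, the two facts just used, written out (my reconstruction): the converse step, with h the binary entropy function,

$$
D_H^{\epsilon}(\rho \,\|\, \sigma) \;\le\; \frac{D(\rho \,\|\, \sigma) + h(\epsilon)}{1 - \epsilon},
$$

and the additivity lemma about to be proved,

$$
f(n) \;:=\; \sup_{p_{X^n}} I(X^n ; B^n) \;=\; n\, \sup_{p_X} I(X ; B) \;=\; n\, f(1).
$$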
The ≥ direction is just the superadditivity of f(n), which you did in the exercises, so I won't redo it. The non-trivial part is that the left-hand side is not bigger than the right-hand side. So we start with an arbitrary, not necessarily product, distribution on the inputs, and construct ρ_{X^nB^n} as usual: the X system together with the corresponding output of the channel. (I am dropping the subscript ρ everywhere, since the state is always the same one.) The mutual information between X^n and B^n is, by definition, H(B^n) − H(B^n|X^n).

Remember, my objective is to relate this to the mutual informations of single uses of the channel, that is, to decompose it into single uses. For the first term, the natural thing is subadditivity: H(B_1 ⋯ B_n) is at most the sum of the entropies of the individual systems B_i. By the way, one piece of notation I am not sure I fully introduced: when I write X^n, I just mean X_1, …, X_n.

What about the conditional entropy H(B^n|X^n)? This is also simple to write, because the X^n system is classical; this is the important point. The conditional entropy can be seen as an average, over the values x_1, …, x_n of the system we condition on, of the entropy not of the marginal state but of the conditional state. Conditioned on x_1, …, x_n, the state on B^n is W_{x_1} ⊗ ⋯ ⊗ W_{x_n}, so we evaluate the entropy of B^n on that state. (I am using the property of the von Neumann entropy that the conditional entropy given a classical system is the average of the entropies of the conditional states; this is an easy exercise.)

Now we use the fact that the entropy is additive on product states, which I think you saw in one of the exercise sessions. So the entropy of B^n here is H(W_{x_1}) + H(W_{x_2}) + ⋯ + H(W_{x_n}). But the i-th term depends only on x_i, not on the other coordinates, so when I sum over the remaining coordinates, what survives for each i is Σ_{x_i} p_{X_i}(x_i) H(W_{x_i}), where p_{X_i} is, by definition, the marginal of the joint distribution p_{X^n}.
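Putting the steps for the conditional term into one display:

$$
H(B^n \mid X^n) \;=\; \sum_{x_1,\dots,x_n} p(x_1,\dots,x_n)\, H\!\big(W_{x_1} \otimes \cdots \otimes W_{x_n}\big) \;=\; \sum_{i=1}^{n} \sum_{x_i} p_{X_i}(x_i)\, H\!\big(W_{x_i}\big).
$$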
Each of these terms is nothing but the conditional entropy H(B_i|X_i), so we get Σ_{i=1}^n H(B_i|X_i), which is exactly what we wanted. Going back to the mutual information between X^n and B^n: we upper bounded it by Σ_{i=1}^n H(B_i), obtained by subadditivity of the entropy, minus Σ_{i=1}^n H(B_i|X_i), obtained (actually as an equality) from the fact that the channel is a product channel. This is Σ_{i=1}^n I(X_i;B_i), which is at most n times the supremum over all input distributions of the mutual information between X and B.

It is a simple proof, but I wanted to go through it because, when teaching this, people sometimes think it is trivial that this holds. It is not trivial, and oftentimes it does not hold; the crucial point is to show that the optimal p_X can be chosen to be product.

Putting all of this together, we can summarize it as Shannon's theorem for classical-quantum channels: for any channel W, the capacity is given by the supremum over all input probability distributions of the mutual information between input and output. I should also say that this is a convex optimization problem, so if you have a description of the channel, you can compute this quantity efficiently.
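As an aside on computing sup_p I(X;B) for a classical channel: since the problem is convex, a simple alternating scheme works. The following is my own minimal sketch of the standard Blahut-Arimoto iteration, not code from the lecture; the channel is given as a matrix with W[y, x] = W(y|x).

```python
import numpy as np

def blahut_arimoto(W, iters=500):
    """Minimal sketch: compute C = max_p I(X;Y) in bits for a classical
    channel given as W[y, x] = W(y|x) (each column a conditional dist)."""
    nY, nX = W.shape
    p = np.full(nX, 1.0 / nX)

    def kl_rows(qY):
        # d[x] = D( W(.|x) || qY ), with the convention 0 log 0 = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(W > 0, W / qY[:, None], 1.0)
        return (W * np.log(ratio)).sum(axis=0)

    for _ in range(iters):
        d = kl_rows(W @ p)      # W @ p is the induced output distribution
        p = p * np.exp(d)       # Blahut-Arimoto multiplicative update
        p /= p.sum()
    return float(p @ kl_rows(W @ p)) / np.log(2), p
```

For example, for the binary symmetric channel with flip probability 0.1, W = [[0.9, 0.1], [0.1, 0.9]], this converges to about 1 − h(0.1) ≈ 0.531 bits, matching the known capacity.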
[In response to a question:] Because of the way I defined the capacity: I take ε going to zero, so the capacity is the optimal rate when the error goes to zero. But that is a good point. It turns out you can show that this is the right capacity in a stronger sense: even if we had taken ε to be a constant, it would give exactly the same number, exactly as for Stein's lemma. Remember, there we only proved the case of ε going to zero, but for any fixed ε you still get the relative entropy. The same is true here: we have what is called a strong converse, in the sense that if I fix any ε (imagine I just want the error to be smaller than 0.5), I can define the capacity in the same way and it gives the same number. The proof I gave only works for ε going to zero, though; for fixed ε you would need a more complicated proof. The additivity part of the proof is exactly the same; the step where we lose is the passage from D_H^ε to D, which costs the factor 1/(1 − ε). This is also what made us lose in the Stein setting. One way to handle it more efficiently is to use another relative entropy in place of the quantum relative entropy, namely Rényi divergences of order α, which are more sensitive in some regimes; taking α to 1 then gives the desired result. But it is a bit more complicated.

One thing I wanted to point out, and I think you had a related question in the exercise session as well, is a very nice consequence of this formula. Of course, if you take a channel whose output is independent of its input, it is completely useless: not only is its capacity zero, you cannot transmit anything through it at all. The formula shows that these are the only channels with zero capacity. It is obvious that a trivial channel has zero capacity, but the converse is also true: zero capacity forces the channel to be trivial. So as soon as there is some correlation between input and output, the capacity is positive. I find this quite surprising, and it is a very nice insight that Shannon had. I hope this is clear.

What I would like to get to now, maybe a bit quickly because the theory is less clean, is general quantum channels, that is, channels with a quantum input. I still want to communicate classical information between the sender and the receiver, but the channel no longer has a classical input: its input is a Hilbert space, and I can choose arbitrary quantum states there. The definitions are very similar. You can define an M-code in the same way; the only difference is that the encoding function, instead of mapping a message s to a classical input x of the channel, now maps it to a valid quantum state, which is fed into the input of the channel. The output side is the same, still a POVM. The error probability is also defined in a very similar way; the only difference is that the channel is now applied to E(s), which is a quantum state.

Actually, if you look back at the proofs we did for CQ channels, you can basically reuse them: instead of optimizing only over the distribution on the classical inputs of the channel, you also optimize over a family of quantum states σ^A_x. Note that x is no longer part of the channel; it just labels a set of states that I choose, and the set X is an arbitrary set, even of arbitrary size, over which I will optimize. Doing that, and looking at the corresponding CQ channel that on input x outputs W_x := W(σ^A_x), the genuine quantum channel applied to σ^A_x, we get the following statement, which summarizes both the converse and the achievability in one.
Here, instead of just optimizing over p_X, we also optimize over the signal states σ^A_x, and over the set X as well, so it becomes a more complicated optimization; the converse takes the same form. I take the supremum over arbitrarily large sets X but, in the final expressions we will get later, it is actually possible to bound the size of X.

As usual, we then look at the important special case of n independent copies, and we define the capacity in the same way, as the limit with ε going to zero and n going to infinity. I can also run the same analysis as before using Stein's lemma, and it leads to the following quantity, called the Holevo information of the quantum channel W: the supremum, over all probability distributions p_X and all inputs σ^A_x to the quantum channel, of the relative entropy D(ρ_{XB} ‖ ρ_X ⊗ ρ_B). I won't discuss it much more, but if you want more properties I invite you to look at the textbooks. The Holevo information is also used quite a lot not just for channels but for a family of states, an ensemble: a probability distribution p(x) together with corresponding states. There it is simply the mutual information between X and A for the state ρ_{XA} = Σ_x p(x) |x⟩⟨x| ⊗ σ^A_x. With this notation, what we showed before is that the capacity of a CQ channel is the supremum, over the input distribution p_X alone, of the Holevo information of the ensemble given by the probabilities p(x) and the channel outputs W_x, which is a valid ensemble; you optimize the Holevo information over the distribution.

Good. So now let me state the main theorem for general quantum channels, as far as the classical capacity is concerned: the capacity is the limit, as n goes to infinity, of 1/n times the Holevo information of W^⊗n. This is the analogue of the limit expression we had in the CQ case before the simplifying lemma. In the fully quantum case we still have this result, and the proof is essentially the same as for CQ channels; you just have to carry the optimization over the σ^A_x around. But as I said before, this expression is not super useful: it is difficult to compute and it involves a limit, just like the definition of C(W) itself. So you might wonder why it is useful at all.
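For reference, written out in my notation, the Holevo information of a channel and the regularized capacity formula just stated are:

$$
\chi(W) \;=\; \sup_{p_X,\,\{\sigma^A_x\}} I(X;B)_\rho, \qquad \rho_{XB} \;=\; \sum_x p(x)\,|x\rangle\langle x|_X \otimes W(\sigma^A_x), \qquad C(W) \;=\; \lim_{n\to\infty} \frac{1}{n}\,\chi\!\left(W^{\otimes n}\right).
$$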
Indeed, it is not the most useful expression, and you might wonder whether it simplifies as in the CQ case: can I just take n = 1 and get the capacity? This was a conjecture for some time; it was thought that the expression is additive. But it was disproved: there are constructions of channels showing that, in general, this χ expression is not additive on tensor power channels. There exist channels W such that χ(W ⊗ W) is strictly bigger than 2χ(W). In other words, the optimal ensemble for two uses of the channel is not of product form across the uses, and you can do strictly better by choosing states that are not products. If you are interested, the first construction was given by Hastings in 2009. It is a quite complicated construction: it is not a natural channel for which this holds; you really have to build the channel in a very delicate way, using a random construction and a delicate argument. If you are interested in the use of geometric analysis in quantum information, I also recommend the book "Alice and Bob Meet Banach", which also contains this argument.

So that is the negative side: we cannot simplify the expression down to n = 1 in general. The reason the expression is still sometimes useful is that for some natural classes of channels you can actually prove additivity, and then you can just take n = 1. Though I should say that, unlike in the classical case, even for n = 1 this expression is hard to compute in general. It is then a finite problem, but in terms of efficiency in the description of the channel W it is not easy; it is actually NP-hard in general, unlike the classical case, where it was a convex problem.

Let me even give you an example of one frontier in this area: understanding the classical capacity of the amplitude damping channel. This is its definition, and its classical capacity is unknown: the Holevo information could be additive or non-additive for it; this is open, and an interesting question. From some numerics (I did some numerical investigation of it), it seems to be additive. So this is, let's say, a frontier in this area of classical capacities. [In response to questions:] I think it is a very interesting question; it seems very hard to tackle, and I would be interested. Yes, I agree; the problem is finite anyway. Beyond that, I really don't know anything in this direction.
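To give a flavor of what such a numerical investigation might look like, here is my own hypothetical mini-experiment, not the lecturer's code: the Holevo quantity of an ensemble sent through the amplitude damping channel, with the equal-weight ensemble {|0⟩, |+⟩} as an arbitrary illustrative choice.

```python
import numpy as np

def vn_entropy(rho):
    """Von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-(ev * np.log2(ev)).sum())

def holevo(probs, states):
    """Holevo quantity chi = S(sum_x p_x rho_x) - sum_x p_x S(rho_x)."""
    avg = sum(p * r for p, r in zip(probs, states))
    return vn_entropy(avg) - sum(p * vn_entropy(r) for p, r in zip(probs, states))

def amplitude_damping(rho, gamma):
    """Apply the amplitude damping channel with damping parameter gamma."""
    K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

gamma = 0.3
sigma0 = np.array([[1, 0], [0, 0]], dtype=complex)          # |0><0|
sigma1 = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)  # |+><+|
chi = holevo([0.5, 0.5],
             [amplitude_damping(sigma0, gamma), amplitude_damping(sigma1, gamma)])
print(chi)
```

A real investigation would of course optimize over the ensemble (and over several channel uses); this only evaluates χ for one fixed choice.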
Good, any other questions related to this? Okay, so in the remaining time: I know that what I advertised was quantum communication, but I decided to do something else at the end, because in half an hour it would be difficult. [In response to a question about the dimension in the non-additivity constructions:] I think the dimension was reduced after Hastings; if I remember correctly there was an example of the order of a hundred, but let me not cite a name, because I might make a mistake; I can tell you after.

So now I want to present another angle on optimal channel coding, which I have been interested in over the last few years. Instead of looking at things from a statistical point of view, where I take n independent copies of the channel and relate what can be done to entropic quantities, I look at the question from an algorithmic point of view: I assume the channel is given as input, I would like to find the optimal codes, and I want to analyze this and see whether it gives us further insight into the question. Here I will only look at the classical case, fully classical input and output, but we will see there are still quantum motivations for studying it.

So what is the computational problem? I am given a description of a channel W and an integer M, the number of messages I would like to transmit (it is more convenient to give it in this form), and I want to output an encoding and a decoding that maximize the success probability; or simply to compute the optimal success probability over all M-codes for the channel W. Here W is completely general, an arbitrary channel. Actually, the reason I came to this question, my motivation for studying it in this way, was that I wanted to understand how much entanglement between the sender and the receiver can make a difference for a classical channel: can it, for example, significantly improve the possibilities for sending information through a classical channel? We will see this in a bit; for now let's just define the problem, written as an optimization problem. The inputs are W and M, and I maximize over all encoders and decoders, with the same notation as yesterday: E is a function mapping messages to channel inputs, W(y|x) is the probability that input x goes to output y, and D(s|y) is the probability that I decode output y to message s.
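Written out, with the notation above:

$$
\mathrm{succ}(W, M) \;=\; \max_{E,\,D}\; \frac{1}{M} \sum_{s=1}^{M} \sum_{y} W\big(y \,\big|\, E(s)\big)\, D(s \mid y).
$$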
The constraint on the decoder is just that it sums to one: for every y, Σ_s D(s|y) = 1. It is very simple to rewrite this expression as a more combinatorial problem, where the objects we optimize over are sets. I can rewrite it as a maximum over subsets of the inputs of the channel of size at most M (this is the codebook C I defined in the last lecture) of a function f_W(C). The definition of f_W(C) is simple: it is the sum over all channel outputs y of the largest value of W(y|x) over inputs x in the code, that is, f_W(C) = Σ_y max_{x∈C} W(y|x). Is the definition clear?

Let's quickly see why this is the case. Look at the expression for the success probability (I removed the 1/M). Since D(·|y) is a probability distribution, each term is a convex combination, and a convex combination is always upper bounded by its maximum term; so I take the maximum over s. You then see that the bound depends only on the range of E: I can forget E itself and keep only the set C of those x for which there exists an s mapping to x. So the success probability is at most this quantity, and it is easy to observe that the bound is achieved by picking D(s|y) = 1 for the s that maximizes W(y | E(s)); this is what is called maximum likelihood decoding, if you like.

Good. So now we have this expression, and it looks like a very combinatorial problem: a function on subsets, to be optimized over all subsets of a given size. One observation is that this function has a nice property: it is submodular. A function on subsets is called submodular if it has a diminishing returns property. Take two sets, one included in the other, and start adding an element x; you can ask what you gain by including it. For C the gain is f(C ∪ {x}) − f(C), and similarly for C′. Submodularity says that for the smaller set you gain at least as much by adding x as you do once the set is already large. And, if you look at it, this function f_W is also monotone.
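In symbols, for all C ⊆ C′ and x ∉ C′:

$$
f_W(C \cup \{x\}) - f_W(C) \;\ge\; f_W(C' \cup \{x\}) - f_W(C')
\qquad\text{(submodularity)}, \qquad
C \subseteq C' \;\Rightarrow\; f_W(C) \le f_W(C')
\qquad\text{(monotonicity)}.
$$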
Adding more elements to C obviously only increases f_W. Now, there is a very famous and quite old result in combinatorial optimization which says that for a monotone, non-negative submodular function there is a very simple algorithm achieving not the optimum, but close to it: the greedy algorithm. The greedy algorithm starts with the empty set and keeps adding the element that increases the function the most, until it has M elements. It is the simplest algorithm you could come up with, and they showed that it achieves the approximation ratio 1 − 1/e. You will actually do this in the exercise session, because I find it a very nice argument.

What does this tell us about the channel coding problem? It tells us that with the greedy algorithm I can find a code, maybe not the optimal one, whose success probability is at least (1 − 1/e) times the optimum. So if there is a code achieving success probability 0.5, the greedy algorithm is guaranteed to find a code with success probability at least 0.5 times 1 − 1/e ≈ 0.63, which is roughly 0.32.

That is a very simple algorithm achieving this, and you might wonder whether you can do better: maybe the problem is easy and the optimal code can be found efficiently. It turns out this is not the case, and it is even impossible to improve on the factor 1 − 1/e: there is a simple reduction from the maximum coverage problem, which is hard to approximate to any factor better than 1 − 1/e (assuming P ≠ NP, of course). So in this sense the question is settled: we fully understand the complexity of approximating the success probability.

Let's get back to the motivation I mentioned at the beginning: does entanglement help? My objective now is upper bounds. Can I find efficient upper bounds on the success probability? Can I run an efficient program which at the end tells me the best success probability I can hope for?
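Before turning to upper bounds, here is a minimal sketch (my own code, not from the lecture) of the set function f_W and the greedy construction just described, with the channel again given as a matrix W[y, x] = W(y|x):

```python
import numpy as np

def f_W(W, C):
    """f_W(C) = sum_y max_{x in C} W(y|x); C is a collection of inputs."""
    C = list(C)
    return float(W[:, C].max(axis=1).sum()) if C else 0.0

def greedy_code(W, M):
    """Greedy codebook: repeatedly add the input with the largest marginal
    gain. Monotone submodularity gives f_W(C) >= (1 - 1/e) * optimum."""
    C = set()
    for _ in range(M):
        gains = {x: f_W(W, C | {x}) for x in range(W.shape[1]) if x not in C}
        C.add(max(gains, key=gains.get))
    return C

# The success probability of the resulting code is f_W(W, C) / M.
```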
So, again, let's get back to the question of how much entanglement helps. Consider exactly the same setting as before, with the noisy channel, encoder and decoder, but suppose that for some reason the sender and the receiver share some entangled state: a fixed pre-shared state, of arbitrary dimension, which of course does not depend on the message that will be sent. Both the sender and the receiver can try to use their part of this entangled state to improve the success probability. Let's write down the optimal success probability achievable with entanglement in mathematical terms; I denote it here with a superscript Q, for quantum. I put no restriction on the Hilbert spaces of the sender and the receiver, so H is an arbitrary Hilbert space, and the joint state, which we can assume to be pure, is a bipartite state |ψ⟩. There is a POVM performed on the sender's side and a POVM performed on the receiver's side. What is the procedure? The sender wants to transmit the message s, so he measures his part of the entanglement with a POVM that depends on s, and the outcome of this POVM is an x which he inputs into the channel. So for every s I choose a POVM, and similarly, for every channel output y, I choose a POVM on the receiver's side. The success probability is then given by: the probability that, for message s, the sender's POVM outputs x, times the probability that the channel maps x to y, times the probability of decoding the same s as the one sent.

Given this way of writing it, it is of course obvious that entanglement can only help: you can take H to be one-dimensional, so that there is no shared state at all. The question is whether the inequality can be strict: can there be a channel W and an M for which entanglement strictly helps? The answer is yes, there are such channels, and this is one of the exercises I put on the sheet. If you are familiar with two-player games and violations of Bell inequalities, this is exactly that, or let's say a very close setup. The only slight difference is that in a game you usually say you either win or lose; here there are no winning or losing configurations, just a coefficient depending on the inputs and the outputs, which gives a sort of utility for every combination of inputs and outputs of the two parties. But it is the same setup. So there are games for which the two values differ, and the question I was interested in is: how much can entanglement increase the success probability? Can we have arbitrary gaps?
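For reference, the entangled value just described, written out (up to my choice of notation):

$$
\mathrm{succ}^{Q}(W, M) \;=\; \sup_{|\psi\rangle,\,\{E^s_x\},\,\{D^y_s\}} \frac{1}{M} \sum_{s=1}^{M} \sum_{x,\,y} W(y \mid x)\, \langle \psi |\, E^s_x \otimes D^y_s \,| \psi \rangle,
$$

where, for each message s, {E^s_x}_x is a POVM on the sender's half of |ψ⟩, and, for each output y, {D^y_s}_s is a POVM on the receiver's half.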
Of course, for general games we know there can be arbitrary gaps between the classical and the quantum value, and in that setting quantum strategies can be very complicated in general. But this is a specific kind of setup, a channel coding problem, not an arbitrary game, so you can wonder what happens here. This motivates the question of upper bounds on the success probability, and more specifically upper bounds on the quantum success probability. We will introduce a natural upper bound and see that it also upper bounds the quantum success probability.

From a combinatorial optimization point of view, the natural thing to do is to look at convex relaxations, and in this case in particular at linear programming relaxations. If you think about it for a little bit, you come up with the following relaxation: the objective is still linear in W(y|x), but the maximum over x in the code is replaced by a variable r_{xy}, with some conditions on r_{xy}, and there is another variable p_x. If you think about it for a moment, this really is the natural linear programming relaxation.

It is an easy observation that this LP is also a relaxation of the quantum value: for every quantum strategy, I can construct a feasible solution of the linear program. Let's see why. Take a quantum strategy, with ψ, E(x|s), D(s|y) and so on. Whatever the strategy is, define r_{xy} as the sum over s of the corresponding quantity, and p_x in the same way except without the factor D(s|y). It is simple to verify, using the normalization conditions, that summing r_{xy} over x gives one; that r_{xy} is at most p_x, just using the fact that D is smaller than the identity; and that the sum of the p_x equals M, which follows easily from the POVM normalization conditions. So we had this quite complicated problem, optimizing over arbitrary Hilbert spaces, and now we have an upper bound which is tractable: a linear program, which you can in principle compute in polynomial time in the size of the channel.

One remark, for those of you who know about non-signaling correlations: these are a superset of quantum correlations where you only impose the non-signaling constraints between the sender and the receiver, and these are linear constraints. It happens that this LP corresponds exactly to the maximum success probability achievable with non-signaling correlations between the sender and the receiver. From this observation, and the fact that quantum theory with measurements is non-signaling, you also get the bound on the quantum value. The LP is small enough to write down concretely; see the sketch below.
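Here is a minimal sketch of this LP using scipy; the constraints are my transcription of the ones stated above (Σ_x r_{xy} = 1 for each y, r_{xy} ≤ p_x, Σ_x p_x = M, all variables non-negative):

```python
import numpy as np
from scipy.optimize import linprog

def lp_value(W, M):
    """LP upper bound on the success probability of an M-code for the
    channel W[y, x] = W(y|x). Variables: r[x, y] and p[x], stacked as
    [r (flattened x-major), p]."""
    nY, nX = W.shape
    nR = nX * nY
    # Objective: maximize (1/M) sum_{x,y} W(y|x) r_{xy}  (linprog minimizes)
    c = np.zeros(nR + nX)
    for x in range(nX):
        for y in range(nY):
            c[x * nY + y] = -W[y, x] / M
    # Equalities: sum_x r_{xy} = 1 for every y, and sum_x p_x = M
    A_eq = np.zeros((nY + 1, nR + nX))
    b_eq = np.ones(nY + 1)
    for y in range(nY):
        for x in range(nX):
            A_eq[y, x * nY + y] = 1.0
    A_eq[nY, nR:] = 1.0
    b_eq[nY] = M
    # Inequalities: r_{xy} - p_x <= 0
    A_ub = np.zeros((nR, nR + nX))
    for x in range(nX):
        for y in range(nY):
            A_ub[x * nY + y, x * nY + y] = 1.0
            A_ub[x * nY + y, nR + x] = -1.0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(nR), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None))
    assert res.success
    return -res.fun
```

For small channels one can sanity-check this against brute force over all codebooks of size M.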
Good. So let's recap what we know. The central question of interest was the optimal success probability for a channel W, and we were also interested in the version with entanglement. We have a lower bound on this quantity, given by a specific construction of a code, the greedy algorithm, and now we also have an upper bound, the LP. These two extremes are efficient; the quantities in the middle are hard to compute in general. And we have already seen, from the previous theorem, that the maximum ratio between the optimal success probability and the greedy value is 1 − 1/e. The theorem I want to present here says that, in fact, the ratio between the LP value and the greedy value is also 1 − 1/e, so all of these quantities are within a factor 1 − 1/e of each other.

You can even refine this a little: if you are not happy with the ratio 1 − 1/e and want a better one, you can get it at the price of decreasing the number of messages, so there is a trade-off between the two. It is best seen on an example: if I compare with the same number of messages, M here and M there, then I have the factor 1 − 1/e, and this actually cannot be improved; but if you are willing to lose a little in the number of messages, you can get a better factor.

An immediate consequence is that entanglement cannot help by more than this constant factor. Even though there are examples where it strictly helps, it cannot help by more than this factor, and even non-signaling correlations cannot. Again, this uses the specific properties of this sort of game: for this class of games, the gap between the classical and the quantum value is at most this much.

Another consequence concerns the i.i.d. setting, letting n go to infinity. You can also define the capacity of a classical channel when entanglement between the sender and the receiver is allowed; this is a perfectly valid definition, and you might wonder whether it gives a larger quantity. It turns out it does not, using this result. This is why I presented the version with L and M: here you need the refined version with the better ratio. But it follows easily from this result that the capacity is the same whether you allow entanglement or even non-signaling correlations between the sender and the receiver; still the same formula for the capacity.

The proof is actually relatively simple, so let me briefly say how it goes. I will assume L equals M for simplicity. Actually, I will not prove the relation between the LP and the greedy algorithm; I will prove a relation between the LP and the success probability directly. I will start from a solution of the linear program and construct from it a code C, a feasible solution of the original problem I started with. The natural thing to do is to pick this code using the values of the linear program.
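To keep the target in sight, the chain being proved (for L = M) is:

$$
\left(1 - \tfrac{1}{e}\right)\mathrm{LP}(W, M) \;\le\; \mathrm{succ}(W, M) \;\le\; \mathrm{succ}^{Q}(W, M) \;\le\; \mathrm{succ}^{\mathrm{NS}}(W, M) \;=\; \mathrm{LP}(W, M).
$$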
Recall that we had the variable p_x, which sums to M; so p_x/M defines a probability distribution, and what I will do is sample from this distribution M times independently. Then I have to compute the expectation, over this choice of code, of the quantity f_W(C). Let me not go over the calculation; it is relatively simple, using basic convexity, and computing this expectation you show that it is at least (1 − 1/e) times the value of the LP. You use the p_x because they are the probabilities of each x appearing in the code, then relate this to the r_{xy} using the various inequalities, and you recognize exactly the objective function we had. And this concludes the proof of the theorem.

[In response to a question:] No, actually no, because there are examples where the non-signaling value is one: you can construct examples with non-signaling value one and classical success probability 1 − 1/e.

So that finishes the proof. Let me mention one question I thought about but could not solve: can one generalize this to classical-quantum channels, where the outputs are quantum? One expects that something like this should hold there as well.

This finishes the lecture. I hope I gave you an overview of some aspects of quantum information theory and showed you some of the techniques that are used, and I hope this can be useful in your research. Thank you.