So first, a few short questions about the second day. The first one asked whether the data processing inequality for the trace distance can be seen as simply saying that the trace distance contracts under quantum channels, and yes, this is true; it's exactly that. The second question was about operational interpretations of the various representations of quantum channels, like the Choi operator or the Kraus operators. The Choi operator has a very clear operational interpretation: you prepare a maximally entangled state, send half of it through the channel, and the state you get is the Choi operator, properly normalized. For the Kraus operators it's a bit less clear, but in the case where the Kraus operators are scaled unitaries it's very clear: the channel is just a mixture of unitaries that you apply. In the more general case the operational interpretation is less direct. Now let me take today's question. Sorry, I can't hear it very well. In some sense, yes, you can extend your Kraus operators, as we saw with the Stinespring dilation: you can view the Kraus operators as the blocks of that isometry, acting on the part that you don't trace out. So yes, you can see it that way.

Okay, let me briefly recall what we did in the last few lectures. On the first day we reviewed the basics. On the second day we saw the very basic task of state discrimination: we have n independent copies of rho versus n independent copies of sigma, and we characterized the regime where you fix the type-1 error to a constant epsilon, let n go to infinity, and ask how quickly the type-2 error goes to zero. We saw it goes to zero at a rate given exactly by the quantum relative entropy, which gives an operational interpretation of that quantity. Yesterday we saw a proof of the fundamental property of the quantum relative entropy, data processing: the fact that it can only decrease, or contracts, under quantum channels. We saw a full proof of that and its various interpretations.

Today we'll try to apply these insights to a problem I discussed at the very beginning as one of the motivations: the problem of channel coding. Let me remind you what the problem is. We have a noisy quantum channel W; we'll see exactly how I model it. (I think there's a problem with the mic again. I believe I speak loudly enough for the room, though maybe it's not optimal for the recording; I don't see anyone in the room upstairs, so let me continue and we'll see.) I would like to construct an encoder and a decoder for this quantum channel, and in this lecture I'll focus on transmitting classical information. This classical information is modeled as a message that I'll call little s, a label between 1 and capital M, and the decoder is supposed to retrieve this message. Our objective is to understand the trade-off between the probability of error, the probability that the recovered message differs from the message I sent, and the number of messages M I can transmit.
Of course, if M is 1 then anything will do, so I would like to make M as large as possible while keeping the error probability small; that's the trade-off we need to understand.

I'll start with a special kind of channel, and this will actually be the main focus of today: the setting where the channel has a classical input but a quantum output. The input is classical, then some noise happens, and the output is genuinely quantum. So what is a classical-quantum channel? Everything here is finite, so I view my channel as just a list of states: the input set is a finite set denoted by script X, the output Hilbert space is called B, and the channel is given by a collection of density operators W_x, one for every input x.

The special case of a classical channel is sometimes written with conditional probability notation: given an input x, the probability of seeing y as output is W(y|x). A typical example is the binary symmetric channel, where I have some parameter f, the input is a bit, 0 or 1, the output is also a bit, and the bit gets flipped with probability f. Just to make this very clear: we can view such classical channels as classical-quantum channels by taking a Hilbert space of dimension equal to the size of the output alphabet and considering density operators which are diagonal in a fixed basis of this Hilbert space, with eigenvalues W(y|x). Of course you can go beyond this; the point of the example is that every classical channel is a classical-quantum channel, but not the other way around: if W_0 and W_1 don't commute, then we have a genuinely quantum classical-quantum channel.

(Is the mic working now? Okay.) A classical-quantum channel W can also be seen as a special case of a general quantum channel: one that starts by measuring in the fixed basis of X and then, conditioned on the outcome x, prepares the corresponding state W_x. Another thing we'll do in this lecture, and that we do a lot in information theory, is take n independent copies of a given channel, just as we did for states when we studied how well rho tensor n and sigma tensor n can be distinguished. What do I mean by n independent copies of the channel? The input set is the Cartesian product, script X to the power n, the output Hilbert space is the n-fold tensor product of B, and for a tuple of inputs x_1, ..., x_n the output density operator is the corresponding tensor product W_{x_1} tensor ... tensor W_{x_n}.

I hope the definition is clear; now we want to understand how well we can code. Let me introduce some basic notation: I'll write [M] for the set of messages, which are just M distinct labels, for example 1 to M.
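Before we get to codes, to make the classical-quantum picture concrete, here is a small sketch (assuming NumPy; the flip probability f = 0.1 is just an illustrative choice, not a value from the lecture):

import numpy as np

f = 0.1  # illustrative flip probability

# Binary symmetric channel as a classical-quantum channel:
# W_x is diagonal in a fixed basis, with eigenvalues W(y|x).
W_bsc = {
    0: np.diag([1 - f, f]),
    1: np.diag([f, 1 - f]),
}

# A genuinely quantum classical-quantum channel: the two outputs do not commute.
ket0 = np.array([1.0, 0.0])
ketplus = np.array([1.0, 1.0]) / np.sqrt(2)
W_cq = {0: np.outer(ket0, ket0), 1: np.outer(ketplus, ketplus)}

print(np.allclose(W_cq[0] @ W_cq[1], W_cq[1] @ W_cq[0]))  # False: not a classical channel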
So what is a code for this channel W? I have to give you an encoder and a decoder. What is an encoder? The input of the channel is classical, so the encoder is simple: it's just a classical function e that maps my set of messages [M] to the inputs of the channel, script X. And what is the decoder? The output of the channel is quantum, so the decoder should be a valid quantum operation that maps a quantum state on the Hilbert space B to a message in [M]. We saw what that is: it's a POVM, a collection of operators {D_s} indexed by the messages. That's the definition of a code; I haven't yet said how good it is.

It's useful to think of the classical setting. There the encoder is clear, but how do I interpret these D_s? In that case we may assume, since there's no need to go beyond the diagonal, that D_s is a diagonal matrix with entries D_s(y), and I interpret D_s(y) as the probability of decoding to s when I see y. So imagine that my decoder is a probabilistic operation: it takes as input y, it can toss some coins, and as a function of these coins and y it outputs some s. The POVM condition ensures that this is a valid probability distribution.

Now, what are the figures of merit of a code? The main one is the error probability: what is the probability that the decoder retrieves the intended message correctly? Let's write that down; the encoder and decoder fully determine it. I'll choose to look at the average error, so I take the average over all messages of the probability of success when I send s. When I send s, it gets encoded to e(s), the output of the channel is the quantum state W_{e(s)}, and the probability that my decoding operation outputs exactly s is the trace of the POVM element D_s times this operator. So the success probability is (1/M) times the sum over s of tr(D_s W_{e(s)}), and the error probability is one minus that. Also a piece of notation: instead of writing that the error probability of (e, D) is at most epsilon, I'll just say it's an (M, epsilon) code.

You might be wondering why I took the average over all messages; I might want a small error for every message, not just on average. Indeed, that is a very valid requirement: you can ask that the maximum error probability, over all messages the sender might try to send, is bounded by epsilon. This is what I call the maximum error probability, and it has the corresponding expression with a maximum instead of an average. The reason we look at the average is that it is much simpler to analyze, and also that it is very easy to go from the average error probability to the maximum error probability; this is a simple exercise in your problem session. So from now on, in this lecture, I'll focus on the average error probability. Here I put some basic examples so that you get used to the notation.
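Written as a small helper (a sketch, same NumPy conventions as above), the average error probability of a code is:

import numpy as np

def avg_error_probability(W, enc, dec):
    # W:   dict x -> density matrix W_x   (the channel)
    # enc: dict s -> x                    (the encoder e)
    # dec: dict s -> POVM element D_s     (the decoder)
    M = len(enc)
    p_succ = sum(np.trace(dec[s] @ W[enc[s]]).real for s in enc) / M
    return 1.0 - p_succ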
Let's take the most trivial thing: the channel that does nothing. Imagine the noisy channel takes as input x and outputs x. In the notation I introduced, the density operator W_x is just the projector onto the state |x>. In this case there is a very trivial code with zero error which can send a number of messages equal to the size of my input set: I encode e(s) = s (the input is classical, so the message itself can serve as the channel input), and the decoding POVM element D_s is just the projector onto |s>. It's very easy to check that the success probability here is equal to 1.

That was one extreme, a perfect channel. The other extreme is a completely trivial channel where I completely cut the link between input and output: imagine that for every x the output density operator is a fixed state rho_0, so the output is completely independent of the input. This is a completely useless channel, and we expect the error probability to be quite high; let's see that with the math. The output of the channel is always rho_0, and if I sum over all s in [M] the traces tr(D_s rho_0) I get 1, because the D_s sum to the identity. Plugging this back into the error probability expression, the error probability is exactly 1 - 1/M: since we are averaging, you can be correct on one of the messages but you'll be wrong on all the others.

Good. So now that the problem is hopefully clear, my objective is to understand the following. Remember, we want the trade-off between epsilon and M. I fix epsilon, my allowed error probability, and the question is: what is the largest M, or the largest number of bits, I can transmit such that there is an (M, epsilon) code for the channel W? If we want to give it a name, call it M_opt(W, epsilon): the maximum over all M for which there exists an (M, epsilon) code for this channel. My objective, and the objective of the channel coding theorem we'll see, is to characterize this quantity M_opt(W, epsilon) in terms of some relatively simple entropic properties of W.

The important special case in which Shannon formulated his famous theorem characterizing this trade-off is the following: I don't look at a completely general channel with arbitrary structure, I look at channels of a specific form, namely some basic channel, think of it as a small channel, for example from a bit to a bit, of which I take n tensor powers. For example, I consider the binary symmetric channel, I take n independent copies of it, I think of n as very large, we'll take the limit as n goes to infinity, and I fix epsilon to be a small constant.
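Running these two extreme channels through the little helper sketched above (same illustrative conventions, with two messages):

import numpy as np

d = 2  # two messages / two inputs, purely for illustration
basis = [np.outer(np.eye(d)[x], np.eye(d)[x]) for x in range(d)]
enc = {s: s for s in range(d)}
dec = {s: basis[s] for s in range(d)}

# Perfect channel W_x = |x><x|: the obvious code has zero error.
W_perfect = {x: basis[x] for x in range(d)}
print(avg_error_probability(W_perfect, enc, dec))   # 0.0

# Useless channel: every input produces the same fixed state rho_0.
rho0 = np.eye(d) / d
W_useless = {x: rho0 for x in range(d)}
print(avg_error_probability(W_useless, enc, dec))   # 1 - 1/M = 0.5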
This is like the Stein setting we looked at: a fixed, small type-1 error probability, and n going to infinity. So my objective is the following quantity: I look at the maximum number of messages, or rather in terms of bits, so I take log M_opt, and it makes sense to normalize by the number of channel uses, because the more channel uses I have the more bits I can send, so I look at this per channel use: I divide by n and take the limit as n goes to infinity. And, at least in this lecture, we'll also take epsilon going to zero, but after n; this order of limits is important. This is what we want to characterize: does this limit have a simple expression? As we'll see, quite amazingly, it does have a rather simple expression, at least in this classical-quantum case.

Before getting into the actual quantities, let's think about the intuition: what should characterize this M_opt? When is a channel good at transmitting information? It's when there is a good amount of correlation between the input and the output. It doesn't matter if the channel renames the symbols or something like that; the specific names of the outputs don't matter, what matters is the correlation between input and output. To quantify this correlation it will be useful to introduce a joint state on the input of the channel together with the output, and to look at correlation measures for that bipartite state. Remember, the input X is classical and the output B is quantum, so I'll define a classical-quantum state: when the input is x, it's natural that the output is W_x, the output of the channel on input x. But we have a degree of freedom: we can choose the probability distribution on the inputs of the channel. This is a parameter that will keep coming back and that we will optimize over in all the different expressions we consider. So for now, think of having a probability distribution p_X on the inputs, and depending on this distribution I define a joint state rho_XB on X and B. Here I just recall the definition of the partial traces; in particular rho_B, the average output state, will show up a lot.

Now I want to characterize this quantity M_opt, and for that, as usual, as we did for Stein's lemma, we'll have to show two parts: the converse part, the no-go part, and the achievability part. To give you the structure: what we'll do first is handle the one-shot setting, the completely general channel.
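Concretely, the joint state just described can be built like this (a sketch, continuing the NumPy conventions; it assumes the inputs are labelled 0, ..., |X|-1 and p is whatever input distribution you choose):

import numpy as np

def joint_state(W, p):
    # rho_XB = sum_x p(x) |x><x| (tensor) W_x
    dX = len(W)
    dB = next(iter(W.values())).shape[0]
    rho = np.zeros((dX * dB, dX * dB), dtype=complex)
    for x in range(dX):
        ketx = np.zeros(dX)
        ketx[x] = 1.0
        rho += p[x] * np.kron(np.outer(ketx, ketx), W[x])
    return rho

def output_marginal(W, p):
    # rho_B = sum_x p(x) W_x, the average output state
    return sum(p[x] * W[x] for x in range(len(W)))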
So I take a completely general channel W, which does not necessarily have an IID structure, and you'll see that this M_opt is closely related to a hypothesis testing question; this is why I introduced Stein's lemma. We'll give upper and lower bounds that relate M_opt to hypothesis testing. Maybe I can ask you a question here. Hypothesis testing, at least the version I introduced for Stein's lemma, is between two states. Do you have a guess for the two states that will be relevant to distinguish for this problem? I claim that channel coding will be related to hypothesis testing between two states that of course depend on the channel; what do you think they will be? Exactly, yes, thanks: the joint state and the product of the marginals. And what is the precise quantity? We'll see it's the hypothesis testing relative entropy between these two states.

So here's the theorem, and we start with the converse. Remember, the converse says that if there is an (M, epsilon) code for a channel W, then the number of messages, or the number of bits, that this code can encode is upper bounded by some quantity, and this quantity depends on how small I can make the type-2 error for distinguishing between the two states rho_XB and rho_X tensor rho_B. As I discussed, rho_XB depends not only on the channel but also on the probability distribution I choose on the inputs, and for this reason I take a supremum over all possible input distributions for the channel. Recall what D_H^epsilon is: I want to distinguish rho_XB from rho_X tensor rho_B, there are two types of error, I fix the type-1 error to be at most epsilon, and I look at the minimum type-2 error I can achieve; D_H^epsilon is minus the log of that minimal error probability, so it's a large number that you can interpret in bits.

One comment I wanted to make: notice that we are in what is sometimes called the one-shot setting, where the channel W is completely arbitrary, so it's natural to characterize how well we can do in terms of what I called one-shot entropies, and not the usual von Neumann entropies you might be more used to. And, as should now feel natural, when we take W of a specific kind, namely IID, this one-shot relative entropy will converge to the usual quantum relative entropy.

So let's prove this; it's just an application of the definitions. It's a no-go statement, so we start with an (M, epsilon) code (e, D), and from that I want to construct a distinguisher between rho_XB and rho_X tensor rho_B. I'll define the following set C, sometimes itself called the code: it's the set of inputs x such that some message gets encoded to x. And I'll define a probability distribution on script X which is just uniform on this set C.
Notice one thing: the size of C is at most M. Typically it will be exactly M, but it could be slightly smaller if I choose a strange kind of encoding that maps two messages to the same input of the channel. That is of course a very bad idea, because then you will not be able to distinguish those messages, but we are in the general setting, so it is allowed by the definition.

Good, so let me define the states. For this choice of p_X, the joint state is rho_XB = (1/|C|) sum over x in C of |x><x| tensor W_x, and what I'll do is use the decoder of the code to give you a distinguisher between rho_XB and rho_X tensor rho_B. I want to construct a POVM {F, I - F}, where F means "I guess rho_XB" and I - F means "I guess rho_X tensor rho_B". How do I construct F? It's quite natural: I take the sum over all x in the code of |x><x|, tensored on the B side with the sum of D_s over all messages s that get mapped to x, i.e. F = sum over x in C of |x><x| tensor (sum over s with e(s) = x of D_s). Because {D_s} is a POVM, F is a valid operator between 0 and the identity.

Now I have to compute the two types of error. First the type-1 success probability: if I compute tr(F rho_XB) and expand, then because I forced e(s) to equal x, the expression simplifies to the average over all s of tr(D_s W_{e(s)}), which, if you recall, is exactly the success probability of the code, and by definition of the code this is at least 1 - epsilon, because we started with an (M, epsilon) code. So the type-1 error is fine.

Now the type-2 error probability: it is tr(F (rho_X tensor rho_B)). I just write down the definition of F; remember rho_X is the uniform average over the elements of the code and rho_B is the marginal. The x in F must match the x in rho_X, so I can pull the sum over x outside, and what remains is the trace of the sum over x in C, and over all s with e(s) = x, of D_s times, now, not W_x but rho_B, the average output state, which is completely independent of x. Now I use the fact that {D_s} is a POVM, so the sum of all the D_s is the identity, and I'm left with (1/|C|) times the trace of rho_B, which is 1. So the crucial difference between the two computations is that in one case I have W_x and in the other I have the average state, and that's why this one is just 1/M (taking, as any sensible code does, distinct inputs for distinct messages, so that |C| = M). So the type-2 error probability is 1/M, and by definition the hypothesis testing relative entropy is at least minus the log of this error probability, which is log M. So what I did is construct a p_X such that log M is at most D_H^epsilon(rho_XB || rho_X tensor rho_B) for this particular p_X; taking the supremum over all p_X only makes the right-hand side larger, so the inequality in the theorem follows.
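Written out, with the test F = sum over x in C of |x><x| tensor (sum over s with e(s) = x of D_s), and assuming for simplicity that the encoder is injective so that |C| = M, the two computations are:

\[
\mathrm{tr}\big[F\,\rho_{XB}\big] = \frac{1}{M}\sum_{s=1}^{M}\mathrm{tr}\big[D_s\,W_{e(s)}\big] = 1-p_{\mathrm{err}}(e,D) \ge 1-\varepsilon ,
\qquad
\mathrm{tr}\big[F\,(\rho_X\otimes\rho_B)\big] = \frac{1}{M}\,\mathrm{tr}\Big[\Big(\sum_{s=1}^{M}D_s\Big)\rho_B\Big] = \frac{1}{M},
\]
\[
\text{so that}\quad
\log M \;\le\; D_H^{\varepsilon}\big(\rho_{XB}\,\big\|\,\rho_X\otimes\rho_B\big) \;\le\; \sup_{p_X} D_H^{\varepsilon}\big(\rho_{XB}\,\big\|\,\rho_X\otimes\rho_B\big).
\]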
Is this converse clear? Good. Now that we have shown the converse and how tightly it is related to hypothesis testing, you might ask: are we losing something by going to hypothesis testing, or does hypothesis testing between rho_XB and rho_X tensor rho_B characterize this M_opt in some sense? Here we see that it basically does, up to some small error parameters. So here's the achievability statement. I fix an epsilon; this will be the epsilon of the (M, epsilon) code I would like to construct. Ideally, if things were exactly tight, I would have that any M which is not too large, without this delta and without this extra term, admits a code, and that would show the previous result is exactly tight. It's not exactly tight: you lose a little bit in the sense that there is this tunable parameter delta. You could say, why not pick delta equal to zero? But if you pick delta equal to zero then the other term blows up, so it's a trade-off between the error probability and the number of messages. So it's not exactly tight, but approximately, and we'll see that, for example, in the IID setting these corrections won't matter, they will go to zero, and the characterization will be tight.

Good, so let me go over the proof, if the statement is clear. Now I should do the opposite: I should show that, given a good test between rho_XB and rho_X tensor rho_B, I can actually construct a code. I'm not sure how many of you have already seen the proof of Shannon's theorem, but as you'll see, the proof will not be explicit. This is an achievability result, so I'm supposed to construct an encoder and a decoder, and as you'll see in a minute, this is one of the very nice applications of the so-called probabilistic method: you choose the object you're interested in constructing at random, according to some well-chosen distribution, then you analyze, for example, the expectation of the quantity you would like to bound, and you show that this expectation is small, which means such an object must exist. So the construction will be randomized, and the conclusion is that a good construction exists.

What is this construction? As I told you, the encoder e will be chosen at random later, but I will start by constructing the decoder as a function of the encoder: suppose someone gives me an encoder, and I will give you a generic construction for a decoder. Here we'll use a well-known construction that was actually also used, I think on the first day, in the learning theory course: the pretty good measurement. Remember what the pretty good measurement is. Imagine I have a set of different states to distinguish; here the states I would like to distinguish are W_{e(s)}: I have an encoder which maps messages s to inputs e(s), and after applying the channel I get W_{e(s)}. So I want to distinguish between this set of M density operators.
One good choice of measurement is the pretty good measurement: I take the POVM element corresponding to s and I want it to put weight on W_{e(s)}, but then I have to renormalize so that it's a valid POVM, and I renormalize by the sum of these operators. So I write their sum, Lambda = sum over s' of W_{e(s')}, and I "divide" W_{e(s)} by this sum, where by divide I mean that I sandwich it with Lambda to the power minus one half: D_s = Lambda^{-1/2} W_{e(s)} Lambda^{-1/2}. These are clearly positive operators, and it's easy to see that they sum to the identity.

Perhaps it's useful to see the interpretation of this strategy in the classical setting, because it also makes sense there: imagine I have a bunch of probability distributions and I want to discriminate between them; what strategies can I use? The pretty good measurement has a very simple interpretation. I get as input y, a sample from one of these probability distributions, and I would like to know which one. What the pretty good measurement does is output s with a probability proportional to the likelihood of y under s; it's a randomized strategy, not a deterministic one: depending on y I toss some coins and output s with probability W_{e(s)}(y) divided by the sum over all s' of W_{e(s')}(y). The more standard way of discriminating, sometimes called maximum likelihood, would be, on sample y, instead of outputting s with probability proportional to the likelihood, to output the s that maximizes it. That would be a deterministic strategy, but it is usually harder to analyze, whereas the pretty good measurement is a bit simpler to analyze, so we'll focus on this.
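As a sketch in the same NumPy conventions (the pseudo-inverse square root is an assumption I make so the construction also works when Lambda is not full rank):

import numpy as np

def pretty_good_measurement(W, enc):
    # D_s = Lambda^{-1/2} W_{e(s)} Lambda^{-1/2},  Lambda = sum_{s'} W_{e(s')}.
    Lam = sum(W[enc[s]] for s in enc)
    vals, vecs = np.linalg.eigh(Lam)
    inv_sqrt = np.zeros_like(Lam, dtype=complex)
    for v, u in zip(vals, vecs.T):
        if v > 1e-12:                      # pseudo-inverse on the support of Lambda
            inv_sqrt += (v ** -0.5) * np.outer(u, u.conj())
    return {s: inv_sqrt @ W[enc[s]] @ inv_sqrt for s in enc}

The D_s built this way sum to the projector onto the support of Lambda; on the support of the W_{e(s)} this is a valid POVM, and one can always complete it by adding the leftover projector to any single element.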
So now, given a fixed encoder and this decoder, let's rewrite the error probability. The probability of error is again the average over s of the probability of outputting some s' different from s, and the probability of outputting s' is tr(D_{s'} W_{e(s)}). Now let's write out the definition of D_{s'}: it is W_{e(s')} "divided by" the sum of all the W_{e(s'')}, and I sum over all s' different from s. Notice that I write Lambda in a particular way here: as W_{e(s)} plus the sum over all s' different from s of W_{e(s')}. How do we analyze this expression? Let's imagine for a moment that these are scalars; how could we bound it? Note that for non-negative scalars a and b, if this part is a and that part is b, the quantity a·b/(a + b) is at most the minimum of a and b; that is simple to see.

But now these are operators, so I would like an analog of this minimum: a minimum of positive operators, a non-commutative minimum. One way to define it is as follows. For two positive operators A and B, define min(A, B) := (1/2)(A + B - |A - B|); this is a definition. It has a few nice properties you would expect of a minimum. In particular, the trace of min(A, B) has this interpretation: take any Hermitian operator M, not necessarily positive, that is smaller than A and smaller than B; there are many such operators, but take the best one, the one that maximizes the trace. It turns out that min(A, B) satisfies these constraints and achieves the maximum trace, so tr min(A, B) is the maximum of tr M over Hermitian M with M <= A and M <= B. This is a semidefinite program, and you can write its dual, which is useful and which we'll use: tr min(A, B) equals the minimum, over all operators Lambda between 0 and the identity, of tr[A(I - Lambda)] + tr[B Lambda]. This is very closely related, as you'll see, to the operational interpretation of the trace distance that you did in the exercise session, I think on the second day, because the trace distance is nothing but one half the trace of |A - B|, so it's natural that these things are related.

What other properties will we need? We'll need the property we wanted for scalars to hold for this operator minimum too, and it does: if I take the trace of A times B "divided by" A + B, where dividing again means sandwiching with the inverse square root, i.e. tr[(A + B)^{-1/2} A (A + B)^{-1/2} B], then this is at most tr min(A, B). Another thing we'll use is that this quantity is concave in its arguments, which, after what we discussed yesterday, you should see immediately from the dual expression, because it is a minimum of linear functions. The concavity is easy; the trace inequality is a little harder to prove; I refer you to this paper if you want the proofs, further properties, and how this is used in information theory.
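Going back to the operator minimum for a moment, it is easy to write down directly (a sketch; the absolute value is computed by an eigendecomposition of A - B):

import numpy as np

def op_min(A, B):
    # Non-commutative minimum: (A + B - |A - B|) / 2, with |X| obtained from the spectrum of X.
    vals, vecs = np.linalg.eigh(A - B)
    abs_diff = vecs @ np.diag(np.abs(vals)) @ vecs.conj().T
    return 0.5 * (A + B - abs_diff)

# For commuting arguments it reduces to the ordinary entrywise minimum:
A, B = np.diag([0.7, 0.2]), np.diag([0.3, 0.5])
print(np.allclose(op_min(A, B), np.diag([0.3, 0.2])))  # True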
Good, so let's continue: what have we done so far? If I bound each term a·b/(a + b) by the operator minimum, I get that the error probability, for a specific encoder e and the pretty good measurement corresponding to e, is at most the average over s of the trace of the operator minimum of W_{e(s)} and the sum over s' different from s of W_{e(s')}. This is now a relatively simple expression as a function of e, so let's plug in an e and analyze it. As I announced, we'll choose e at random: we choose e(s) for the different s independently at random, but according to which distribution? The natural thing is the distribution p_X that optimizes the expression in the theorem, so I take a p_X for which that hypothesis testing quantity is largest. Good, so now my encoder is defined.

As is usual in the probabilistic method, what we do is analyze the expectation of the error probability, where the expectation is over the randomness in the encoding. So what I need to do is the following. First notice that for different s this quantity is distributed in exactly the same way, so it suffices to focus on s = 1; and, just a comment, this is the reason we look at the average error: if we had a maximum here we would need the expectation of a maximum, which is more difficult to analyze, whereas with an average it's straightforward. Sure: think of e as a table. For every s from 1 to M I have to give you an element of script X, so I take M independent samples from the distribution p_X; the first sample is e(1), the second, independent, sample is e(2), then e(3) is a third independent choice, and so on. Is that clear? Good.

So I'm at this stage: e(1) is chosen at random according to p_X, and the remaining codewords are also chosen at random, independently; I wrote this out explicitly because we'll use it in the next step. Now what do I do? I use the concavity of the trace of the operator minimum in its second argument to bring the expectation over the other codewords inside the minimum. What is that inner quantity? It doesn't depend on s', it's the same for every s', and the expectation over e(s') of W_{e(s')} is nothing but rho_B. (I didn't redefine rho_XB here, but remember p_X is the distribution that optimizes the expression; once I have p_X and the channel, rho_XB is defined automatically, and when I talk about rho_XB and rho_B I mean that specific one.) Indeed, the expectation over e(s') chosen according to p_X of W_{e(s')} is just the average over x of p_X(x) W_x, which is nothing but rho_B, and this holds for every s'. There are M - 1 of them, because the sum is over s' different from 1, so I multiply rho_B by M - 1. For the other part I just rename e(1) to x, so that I have an expression which no longer involves e. And now we're essentially done: I just add the extra system X, putting rho_X on one side and rho_XB on the other, so that the whole expression can be written as the trace of the operator minimum of rho_XB and (M - 1) rho_X tensor rho_B; it's the same quantity.

The nice thing here is that the encoder and decoder don't appear anymore; it's an expression that only depends on my input distribution and the channel. This bounds the expected error probability, and now I would like to relate it, as was our objective, to hypothesis testing. How? I use one of the two SDP expressions I had, which says that this trace of the operator minimum is the infimum over Lambda, between 0 and the identity, of tr[(I - Lambda) rho_XB] plus tr[Lambda times the second operator], which is (M - 1) times rho_X tensor rho_B.
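To summarize the chain of estimates so far (the second inequality is the concavity step, the last equality is the dual SDP form of the operator minimum):

\[
\mathbb{E}_e\big[p_{\mathrm{err}}\big]
\;\le\; \frac{1}{M}\sum_{s}\mathbb{E}_e\,\mathrm{tr}\Big[\min\Big(W_{e(s)},\,\sum_{s'\neq s}W_{e(s')}\Big)\Big]
\;\le\; \sum_x p_X(x)\,\mathrm{tr}\big[\min\big(W_x,\,(M-1)\rho_B\big)\big]
\;=\; \mathrm{tr}\big[\min\big(\rho_{XB},\,(M-1)\,\rho_X\otimes\rho_B\big)\big]
\;=\; \inf_{0\le\Lambda\le\mathbb{1}}\Big\{\mathrm{tr}\big[(\mathbb{1}-\Lambda)\,\rho_{XB}\big]+(M-1)\,\mathrm{tr}\big[\Lambda\,(\rho_X\otimes\rho_B)\big]\Big\}.
\]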
But this last expression is linear in M - 1: I cannot pull the M - 1 out of the operator minimum, but out of this linear term I can. And now you see this is exactly the hypothesis testing setup: this Lambda is the test distinguishing between my two states, this term is the type-1 error probability, and this term is the type-2 error probability. So, by the definition of the hypothesis testing relative entropy, there is a distinguisher F, by definition between 0 and the identity, such that the type-1 success probability is at least 1 - (epsilon - delta) and the type-2 error probability is 2 to the minus the hypothesis testing relative entropy at level epsilon - delta. As a result, by plugging this F into the infimum, the whole expression is at most the same expression with F in place of Lambda: the first term is at most epsilon - delta, because it is tr[(I - F) rho_XB], and the second term is M - 1 times the type-2 error probability, which is exactly this.

And that's it, we're done, because I chose M precisely so that M - 1 is at most roughly delta times this quantity, so that globally the second term is at most delta, and the expectation of the error probability is at most epsilon. To recap: I chose my encoder e at random, I computed, in expectation over this choice of randomness, the error probability of that code when the decoder is the pretty good measurement, and in expectation it is at most epsilon; therefore there must exist a code (e, D) whose error probability is no more than the expectation, and I'm done. Again, the statement has the form: if M is not too large, say I take M to be roughly this quantity, maybe minus one and rounded down so that it is an integer, then there is an (M, epsilon) code.

Okay, I'm going a bit slower than I expected, but for the exercise session it's actually crucial that I do this next part. We now have a characterization of the log of the optimal number of messages in general; now we want to ask whether, for specific cases, we can characterize it in a nicer way, in particular for very large channels, and the important example is the product channels W tensor n. For that, as I said before, we define the classical capacity: the capacity of a channel W is the limit as n goes to infinity of (1/n) log of the optimal number of messages for W tensor n, followed by the limit as epsilon goes to zero. You should interpret it as the optimal rate for transmitting classical information over the channel W.

One thing that is easy to do, given the one-shot expressions, is in principle to take the previous bounds and plug in a channel of the form W tensor n; this is what I do here. If you do this, what you get is hypothesis testing relative entropies, with the relevant IID states, as upper and lower bounds on the optimal number of messages for W tensor n.
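In symbols, the capacity is

\[
C(W) \;:=\; \lim_{\varepsilon\to 0}\;\lim_{n\to\infty}\;\frac{1}{n}\,\log M_{\mathrm{opt}}\big(W^{\otimes n},\varepsilon\big),
\]

and, schematically (hiding the delta bookkeeping from the achievability bound), plugging W tensor n into the two one-shot bounds gives

\[
\sup_{p_{X^n}} D_H^{\varepsilon-\delta}\big(\rho_{X^nB^n}\,\big\|\,\rho_{X^n}\otimes\rho_{B^n}\big)+\log\delta
\;\lesssim\; \log M_{\mathrm{opt}}\big(W^{\otimes n},\varepsilon\big)
\;\le\; \sup_{p_{X^n}} D_H^{\varepsilon}\big(\rho_{X^nB^n}\,\big\|\,\rho_{X^n}\otimes\rho_{B^n}\big),
\]

where the states are the joint and marginal states for W tensor n and an input distribution on script X to the n.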
Then, if you take n going to infinity and epsilon going to zero, you can write an expression for the capacity. But this expression is a little bit ugly: it has all these limits in it, so how do we compute it? Here I claim that we have a lower bound and an upper bound: the lower bound is relatively nice, but the upper bound still has a limit in it for now. Given that we're out of time I will skip this part, but, as you can expect, at least for the lower bound we'll use Stein's lemma to relate this hypothesis testing entropy to the usual quantum relative entropy, and remember that the quantum relative entropy between a state and the product of its marginals is nothing but the mutual information between the two parts of the state. So let me skip that for now, and that as well, and let me just state the theorem; maybe tomorrow I will actually go over the proof, because I think for those of you who haven't seen Shannon's theorem before it's useful to see it at least once.

So the statement whose proof we will finish tomorrow is the following: if you take a classical-quantum channel, then this capacity, which I defined as the limiting rate of the optimal number of messages, is given by a simple expression: it is just the supremum over all input distributions of the mutual information between the input and the output, evaluated on the joint state rho_XB we defined. I wanted to cover this because in the problem session I ask you to compute this capacity for several channels. I'll stop here for today, thank you.
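For the problem session, here is a small numerical sketch of this formula. It uses the standard fact that for a classical-quantum state the mutual information I(X:B) equals H(rho_B) minus the average of H(W_x), and the binary symmetric channel with an illustrative flip probability as a sanity check; the grid search over binary input distributions is just a convenience, not part of the theorem:

import numpy as np

def vn_entropy(rho):
    # Von Neumann entropy in bits.
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]
    return float(-np.sum(vals * np.log2(vals)))

def mutual_information(W, p):
    # I(X:B) = H(rho_B) - sum_x p(x) H(W_x) for the classical-quantum state rho_XB.
    rho_B = sum(p[x] * W[x] for x in W)
    return vn_entropy(rho_B) - sum(p[x] * vn_entropy(W[x]) for x in W)

# Binary symmetric channel with flip probability f: the supremum should be 1 - h(f).
f = 0.1
W_bsc = {0: np.diag([1 - f, f]), 1: np.diag([f, 1 - f])}
cap = max(mutual_information(W_bsc, {0: q, 1: 1 - q}) for q in np.linspace(0, 1, 201))
h = lambda t: -t * np.log2(t) - (1 - t) * np.log2(1 - t)
print(round(cap, 3), round(1 - h(f), 3))  # both about 0.531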