So what's the plan for today? As I announced on the first day, my plan is to prove the data processing inequality for the quantum relative entropy. Remember, this was the fundamental property of distance measures in general. It's relatively simple to prove for the trace distance, for example; for the quantum relative entropy it's a little more complicated, and this is what we'll see today. This is really a central result in quantum information theory — it's used everywhere. I think almost every proof of a converse theorem, of a no-go kind of result, uses some form of data processing. Remember yesterday, when we proved Stein's lemma, we had two parts. One part showed achievability: it gave a strategy achieving a given type 2 error. The other part showed that you cannot do better than this. And if you remember, the place where we used the data processing inequality was precisely in proving that you cannot do better than the quantum relative entropy. This is the pattern in almost all information processing tasks: when you analyze them and try to compute the optimal rates, you use the data processing inequality or some form of it.

So let's get to it. The main objective today is to show this inequality: I take two states rho and sigma — or even two positive operators in general — and I apply the same quantum channel to both sides. The inequality says that the quantum relative entropy can only decrease.

The plan is as follows. First, we'll show a convexity result. The relative entropy is a function that takes as input two positive operators and outputs a number, and it has the nice property of being convex. We often say jointly convex, because it has two inputs: it is convex jointly in the two inputs. I'll say explicitly in a minute what this means, but it's the natural thing you would expect. Then we'll see that convexity is closely related to the data processing inequality: we'll see how to get from convexity to data processing, to the fact that applying a quantum channel can only decrease the relative entropy. I should say that this is a very general statement: for a general distance measure, convexity gives you data processing, and you can also go the other way. I wouldn't say it's unconditional — it depends on particular properties of the divergence you consider — but it is almost always the case that the two statements are equivalent. I'll present a proof that is relatively generic and takes you from convexity to data processing; the other direction can also be done, again under some conditions. Once we've done that — that's the second point — the third point will be to explore implications, namely strong subadditivity. This is maybe one of the main implications: a fundamental inequality for the von Neumann entropy.

Good. So let's start with the first point, and let me state it as a theorem. I consider the function that maps rho, sigma to the quantum relative entropy — recall the expression: the trace of rho log rho minus rho log sigma. My claim, what the theorem says, is that this is a jointly convex function.
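Since the board formulas don't appear in the transcript, here are the statements in symbols — standard notation matching the description above, with E the quantum channel and D the quantum relative entropy:

```latex
\[
  D(\rho\|\sigma) \;=\; \operatorname{Tr}[\rho\log\rho] - \operatorname{Tr}[\rho\log\sigma],
  \qquad
  D\big(\mathcal{E}(\rho)\,\big\|\,\mathcal{E}(\sigma)\big) \;\le\; D(\rho\|\sigma).
\]
\[
  \text{Joint convexity:}\quad
  D\big((1-p)\rho_0 + p\rho_1 \,\big\|\, (1-p)\sigma_0 + p\sigma_1\big)
  \;\le\; (1-p)\,D(\rho_0\|\sigma_0) + p\,D(\rho_1\|\sigma_1).
\]
```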
So notice that it takes two inputs. What do I mean by convex here? If I take a convex combination of rho 0 and rho 1 in the first input of the relative entropy and the same convex combination of sigma 0 and sigma 1 in the second input, then the result is upper bounded by the convex combination of the corresponding quantum relative entropies.

How will I show this? I will show it in a way that makes the convexity very explicit. What do I mean by that? I will write an equality: I will express the quantum relative entropy differently from its definition, in a way that makes it clear that it is convex — namely, as a supremum of functions that are linear in rho and sigma. Let's go over this expression. The exact details don't matter for now, but the important points are these: it is a supremum over an infinite family of operators, one operator Z_t for each t from 0 to infinity, of an integral from 0 to infinity of an expression which — and this is the crucial part I want to emphasize at this stage — is linear in rho and sigma. Of course, a linear function is in particular convex, a very simple kind of convex function, and the supremum of linear functions — indeed the supremum of convex functions — is convex as well. So if you accept this equality, convexity becomes obvious. The expression is even a certificate, if you want, of the convexity of this function. Let me reiterate the point: the expression I'm giving in (star) makes the relative entropy manifestly jointly convex, as a supremum of convex functions.

I even included a brief proof of this. If you have a family, script F, of linear functions f, and you take the supremum over all f in this family of f applied to the convex combination, then just by linearity you get (1 minus p) times f of (rho 0, sigma 0) plus p times f of (rho 1, sigma 1). Then, of course, you can separate the two suprema: the supremum of a sum is at most the sum of the suprema. So this is immediate. You can also see it in a picture, which you're probably used to: in 1D, if you take a convex function, you can write it as a supremum of linear functions. Consider, for example, the tangents at every point; these are linear functions, and the function of interest is the supremum of all these tangents. In some sense, the expression I'm giving is an analog of this.

The other remark I wanted to make: I never said what the base of the log is. Usually it doesn't matter, because we were only comparing relative entropies, but for this expression it does matter: I take the log base e. If you want the log base 2, you just divide by log 2.
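The expression (star) itself is on the slides rather than in the transcript. Reconstructing it from the derivation that follows — the terms computed below are exactly the trace of rho over (1 plus t), the trace of rho Z_t Z_t star over t, and the four sigma terms — it should read as follows, in natural log (a reconstruction, not a verbatim copy of the slide):

```latex
\[
  D(\rho\|\sigma) \;=\; \sup_{(Z_t)_{t>0}} \int_0^\infty
  \Big(
    \tfrac{1}{1+t}\operatorname{Tr}[\rho]
    \;-\; \tfrac{1}{t}\operatorname{Tr}\!\big[\rho\, Z_t Z_t^{*}\big]
    \;+\; \operatorname{Tr}\!\big[\sigma\,\big(Z_t + Z_t^{*} - Z_t^{*} Z_t - I\big)\big]
  \Big)\, dt
  \tag{$\star$}
\]
```

Each term inside the integral is linear in rho and linear in sigma, which is exactly the point emphasized above.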
Another nice thing about this expression is that it also allows you to prove data processing directly, without going through the standard route from convexity to data processing. Both proofs are relatively easy, but I'll present both of them, just so that you see the potential uses of this way of writing the quantum relative entropy. This way of writing it is not very well known, but I think it's useful — I have been using it recently for computational purposes — so I wanted to present it to you.

Good. So given all this, the only thing I have to prove now is the expression (star). [Question: I thought we were going to prove that expression?] Yes, I will actually prove it — what I was saying is that the whole proof of joint convexity, really, consists of proving this expression. [Another question.] Ah yes, there's a t missing there — you're right, thanks. Let me fix that. Other questions?

OK, so again, this is my objective: I want to prove this expression. The first thing to note — I guess you all know this, but just to see the difficulty — is that we have logs of operators that do not commute. Rho and sigma, in general, don't commute, so I don't have that log rho minus log sigma equals the log of rho sigma inverse, or anything of that form; that doesn't even make sense, since rho sigma inverse is not a positive operator. I just wanted to make sure you're all aware that this is the main difficulty.

So what are we going to do to get this expression? We would like to have just a single log — a difference of logs is difficult to handle. One way to achieve this is a simple trick where you write the trace of a product of two operators, which is bilinear in A and B, in terms of the tensor product of A and B. I'm sure you've seen variants of it; it's very simple, and you can just check that it works. You write A tensor B transpose, and I chose the version where you sandwich it with a fixed maximally entangled state — or rather its unnormalized version, phi equal to the sum over i of i tensor i, where the i's form a fixed basis of my Hilbert space; the transpose is taken in this same basis.

Now, using this, I rewrite the quantum relative entropy — recall it is a trace of products. For the first term I take A to be rho log rho and B to be the identity; for the second term I take A to be rho and B to be log sigma, whose transpose is the log of sigma transpose. Let me do it slowly: rewriting, I get rho tensor identity times, in brackets, log rho tensor identity minus identity tensor log sigma transpose, sandwiched with phi. And notice that because rho tensor identity and identity tensor sigma transpose commute, I can combine the two logs into a single log. What I'm using here is the fact that the log of A tensor B equals log A tensor identity plus identity tensor log B. So now I have written the quantum relative entropy as phi sandwiching rho tensor identity times the log of rho tensor the inverse of sigma transpose.
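As a quick sanity check of this trick — an illustrative numerical sketch, not part of the lecture — here are the two identities just used, with the unnormalized phi:

```python
import numpy as np
from scipy.linalg import logm

d = 3
rng = np.random.default_rng(0)
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
B = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

phi = np.eye(d).reshape(d * d)  # unnormalized maximally entangled vector, sum_i |i>|i>

# Tr[AB] = <phi| A (x) B^T |phi>, with the transpose in the same fixed basis
assert np.allclose(np.trace(A @ B), phi @ np.kron(A, B.T) @ phi)

# log(P (x) Q) = log(P) (x) I + I (x) log(Q) for positive definite P, Q
P = A @ A.conj().T + np.eye(d)
Q = B @ B.conj().T + np.eye(d)
lhs = logm(np.kron(P, Q))
rhs = np.kron(logm(P), np.eye(d)) + np.kron(np.eye(d), logm(Q))
assert np.allclose(lhs, rhs)
```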
So now, starting from this, we have a single log, and what we'll do is use the integral representation of the log. This is often useful in this area, and you've seen it in the first problem session, where it was quite useful to prove the operator convexity and operator monotonicity of the log: there you prove the operator monotonicity of the inverse function, and then use the integral representation to directly get the operator convexity and monotonicity of the log, or minus log, function. Here we'll do the same — we'll use exactly the same expression, the one from the exercise session of the first day.

So let's use it. The representation is valid for scalars, but by the functional calculus that we've seen, it holds for matrices as well: you apply the function to the eigenvalues. The A tensor identity I just leave on the side, and I apply the integral representation to the log. So I get A tensor identity times the integral from 0 to infinity of 1 over (1 plus t) times the identity, minus the inverse of (A tensor B inverse plus t times the identity). You may ask why I put identities here: the first term is just a constant — it doesn't depend on the operator — so I can put the identity there if I want; and I say identity because, remember, we're working on the tensor product Hilbert space, H tensor H. For the second term, I simply replaced the scalar x by A tensor B inverse.

Now I rewrite this expression: I move the A tensor identity inside the integral, and there it cancels against the A appearing inside the inverse, so I pick up an A inverse in the first term of that inverse.

So now we're starting to be in good shape. Why? Notice that A will be rho and B will be sigma transpose — I wrote it in general only because I didn't want to carry the transposes around. The first part of the expression is linear in A, so that's good. The second part is not linear in A and B, but it is a very well-studied quantity in matrix analysis, sometimes called the parallel sum, and sometimes written A colon B: the parallel sum of A and B is the inverse of the sum of the inverses of A and B. It is directly related, up to a factor of 2, to the harmonic mean — if A and B are scalars, you might recognize the harmonic mean of two scalars, and this is, up to a factor of 2, an operator version of it. This object is very well studied; its properties are well understood. There is a big theory of operator means in general — you know the geometric mean, for example; it too has an operator generalization with very nice properties, and there is a whole theory around this.

Now, since I'm taking inverses, you might be worried about what happens when some operators are not invertible. I don't want to discuss this during the lecture; it can be handled, these are technicalities, and basically all the properties are satisfied in the general, non-invertible case. You should just read the inverse as the inverse on the support.
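Two quick checks of the ingredients just mentioned — an illustrative numerical sketch, not part of the lecture: the integral representation of the log, and a standard rewriting of the parallel sum that is used in the proof below.

```python
import numpy as np
from scipy.integrate import quad

# Integral representation: log x = integral_0^inf ( 1/(1+t) - 1/(x+t) ) dt
x = 2.7
val, _ = quad(lambda t: 1.0 / (1 + t) - 1.0 / (x + t), 0, np.inf)
assert np.isclose(val, np.log(x))

# Parallel sum A : B = (A^{-1} + B^{-1})^{-1}, and the identity B - B (A+B)^{-1} B
rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3)); A = M @ M.T + np.eye(3)  # positive definite
N = rng.standard_normal((3, 3)); B = N @ N.T + np.eye(3)
par = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))
assert np.allclose(par, B - B @ np.linalg.inv(A + B) @ B)

# For scalars, a : b = ab/(a+b), i.e. half the harmonic mean
assert np.isclose(1 / (1 / 2.0 + 1 / 3.0), 2.0 * 3.0 / (2.0 + 3.0))
```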
One of the many nice properties known for this parallel sum, or operator harmonic mean, is a nice variational expression, and we will use it to get our formula. So let's go over it. It says the following: I take two positive operators A and B, I take the parallel sum A colon B, and I sandwich it with some vector x in my Hilbert space. Then it turns out that this has a very simple variational expression which is linear in A and B. What is it? I get something linear in A and B, sandwiched with vectors y and z, where the only constraint is that y and z should sum to x: the infimum, over y plus z equal to x, of y A y plus z B z.

I included a proof for completeness. The hard part is coming up with the expression; once you have it, it's relatively simple to check — just a simple manipulation. Let's go over it quickly. The parallel sum, the inverse of the sum of the inverses, can be rewritten by a standard manipulation as B minus B times (A plus B) inverse times B. What I then do is write the candidate expression minus the parallel sum, for an arbitrary y and z that sum to x, and observe that it's always non-negative. So I write the expression with y, take z to be x minus y, and subtract the parallel sum of A and B sandwiched with x. Writing it out: the sandwiching with x appears only with B; with y, I get one term from A and one from B; and then there are the cross terms. For the parallel sum, I substitute the expression B minus B (A plus B) inverse B. One term cancels, and what remains I can recognize as the square of a 2-norm. Everything here is Hermitian — positive, even: A and B are positive operators, so I can take square roots here, here, and here. I'm just writing it in a way that makes it obvious that it's a square, by multiplying and dividing by (A plus B) to the power one half. Combining, you see that it is the squared 2-norm of a difference of two vectors: x with some operator applied, minus y with some other operator applied. This is obviously non-negative. And with this expression it's also obvious how to achieve equality: I just have to pick y so that this difference vector is 0, namely y equal to (A plus B) inverse times B times x.

[Question about intuition for the formula.] No, unfortunately, I don't have much insight to offer here. Oh, you mean how to prove it? You could try to prove it by first taking B to be the identity and then doing the general case, perhaps. But sorry, I don't have much intuition for it.
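A quick numerical sanity check of this proposition — an illustrative sketch, not from the lecture: the stated y attains the infimum, and other splittings only do worse.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
M = rng.standard_normal((d, d)); A = M @ M.T + np.eye(d)  # positive definite
N = rng.standard_normal((d, d)); B = N @ N.T + np.eye(d)
x = rng.standard_normal(d)

par = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))  # parallel sum A : B

# Claimed optimizer: y = (A + B)^{-1} B x, with z = x - y
y = np.linalg.solve(A + B, B @ x)
z = x - y
assert np.isclose(y @ A @ y + z @ B @ z, x @ par @ x)

# Any other splitting y' + z' = x can only give a larger value
yp = y + 0.1 * rng.standard_normal(d)
zp = x - yp
assert yp @ A @ yp + zp @ B @ zp >= x @ par @ x - 1e-12
```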
OK, so now let's use this expression back in our derivation. Remember where we were: the relative entropy of rho and sigma. I just rewrote the expression from the previous slide and moved the phi inside the integral. So let's look at each of the terms: the first is rho tensor identity sandwiched with phi, and the second is the parallel sum of an operator built from rho and an operator built from sigma, sandwiched with phi — and remember, phi is the maximally entangled state.

Let's compute each of these. The first one is easy: rho tensor identity sandwiched with phi is just the trace of rho, up to the scalar factor 1 over (t plus 1). That's nice. Now what about the other one? A parallel sum sandwiched by a vector is exactly the setting of the proposition I just showed, so I apply it with x being the vector phi, and I optimize over the splittings. The parallel sum of rho tensor identity over t and identity tensor sigma transpose, sandwiched with phi, is the infimum over all vectors z — now in the tensor product Hilbert space — of rho tensor identity over t sandwiched with z, plus identity tensor sigma transpose sandwiched with phi minus z.

What I do now, to get to the expression we're after, is write z in a fixed basis — the same fixed basis of H that I took before: z equals the sum over i, j of z_ij times i tensor j. Then I express each term. For rho tensor identity sandwiched with z: the identity gives me a delta on j, j prime, and by reshuffling the vector z into a matrix I get the trace of rho times Z Z star, where Z is now an operator, obtained by flipping each ket j into a bra j; Z star is of course its adjoint. Hopefully you start seeing where this is going: we have already seen the trace of rho — that was the first term — and this Z Z star term is exactly the second one; the remaining computations will give exactly the other terms.

Let's do them quickly. If I take phi with identity tensor sigma transpose, I get the trace of sigma. If I sandwich identity tensor sigma transpose with z, it's exactly analogous to the rho case: I get the trace of sigma times Z star Z. For the cross terms, with one phi and one z, the result is linear in Z rather than quadratic, because phi is fixed: one cross term gives the trace of sigma times Z, and the other the trace of sigma times Z star.

Going back to the whole expression and putting everything together, I get exactly what I want. One note: I didn't originally write the t-dependence of Z, but the operators entering the parallel sum depend on t, so the optimal Z can depend on t — I have to pick a different operator for each of the parallel sums; let me put the subscript t everywhere. And I conclude by taking the integral, which gives back exactly the expression (star). Good — I hope this was clear. I agree that the formula for the parallel sum is a little bit magical, but the main idea is simply to reduce the relative entropy, via an integral representation, to an expression involving the parallel sum, and then use the known results about the parallel sum.
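To make the bookkeeping concrete, here is a numerical sketch — not from the lecture; numpy/scipy, natural log — that evaluates the integral form derived above, with the parallel sum computed exactly for each t, and compares it to the definition of the relative entropy:

```python
import numpy as np
from scipy.integrate import quad
from scipy.linalg import logm

rng = np.random.default_rng(3)
d = 2

def rand_state(dim):
    M = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    R = M @ M.conj().T
    return R / np.trace(R).real

rho, sigma = rand_state(d), rand_state(d)
phi = np.eye(d).reshape(d * d)  # unnormalized maximally entangled vector
Id = np.eye(d)

def integrand(t):
    # Tr[rho]/(1+t) - <phi| [ (rho (x) I)/t ] : [ I (x) sigma^T ] |phi>
    A = np.kron(rho, Id) / t
    B = np.kron(Id, sigma.T)
    par = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))
    return np.trace(rho).real / (1 + t) - (phi @ par @ phi).real

D_integral, _ = quad(integrand, 0, np.inf, limit=200)
D_direct = np.trace(rho @ (logm(rho) - logm(sigma))).real
assert np.isclose(D_integral, D_direct, atol=1e-6)
```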
In particular, the parallel sum is known to be operator concave, and an explicit way of seeing this is precisely via the variational expression. OK, good. So we have now proved the joint convexity of the relative entropy, and you see, the proof was not very long — I did all the calculations, I didn't skip any details, and it's relatively short. And again, it is a proof that makes the joint convexity explicit.

Now remember, our objective was not only joint convexity; it was the data processing inequality, sometimes called monotonicity under quantum channels: if I apply a quantum channel, the relative entropy can only decrease. As I said before, the argument here is rather generic — if you look at it, it doesn't depend in any way on the specifics of the quantum relative entropy. And I should say that there are many other useful divergences, as I mentioned; one of the best-known families is the Rényi divergences, where you have a parameter alpha. For all of these, this kind of relation between joint convexity and data processing holds as well.

Good. So let's go over the standard argument. For this argument it's useful to use the Stinespring dilation of the quantum channel E. Remember, I can always write a quantum channel E as first applying an isometry, which I call V here, and then taking a partial trace — I wrote it here in compact form. The isometry part is easy to handle: it keeps the quantum relative entropy exactly the same; it doesn't change anything. I put the calculation here; it just uses the definition. So the only thing I need to analyze is the partial trace: I need to show that when I apply the partial trace to a joint state, the relative entropy can only decrease.

So here is the setup. I take two states rho and sigma on systems B and E, and what I would like is to transform rho BE into rho B, and sigma BE into sigma B, in such a way that I can use the convexity result proved before. The question is: is there a simple way of mapping rho BE to rho B — more specifically, to rho B tensor the maximally mixed state on the E system? Because if I had that, I'd be done with this step: by the additivity of the relative entropy under tensor products, the relative entropy between rho B tensor the maximally mixed state and sigma B tensor the maximally mixed state is just equal to the relative entropy between rho B and sigma B. So what kind of map can do this transformation?

You might be familiar with this; it's sometimes called the quantum one-time pad, because it erases the E system — it makes it uniform. There's a simple way of doing it using Pauli operators or, if you're not dealing with qubits, generalized Pauli operators: X shifts my fixed computational basis by 1, so k maps to k plus 1, and Z adds a phase. And I define the quantum channel that applies Z to some power l followed by X to some power m, with l and m chosen uniformly at random.
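In symbols, the ingredients just described — standard definitions, matching the description above, with omega a primitive d-th root of unity:

```latex
% Stinespring dilation and invariance of D under the isometry step:
\[
  \mathcal{E}(\rho) = \operatorname{Tr}_E\!\big[V \rho V^{\dagger}\big],
  \qquad
  D\big(V\rho V^{\dagger}\,\big\|\,V\sigma V^{\dagger}\big) = D(\rho\|\sigma).
\]
% Generalized Paulis in dimension d and the one-time-pad channel:
\[
  X|k\rangle = |k{+}1 \bmod d\rangle,\qquad
  Z|k\rangle = \omega^{k}|k\rangle,\qquad
  \mathcal{D}(\rho) = \frac{1}{d^{2}}\sum_{l,m=0}^{d-1} Z^{l}X^{m}\,\rho\,\big(Z^{l}X^{m}\big)^{\dagger}.
\]
```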
Many of you will have done this calculation at some point, and it's easy to see what this quantum channel does. On an off-diagonal term, ket k bra k prime with k different from k prime, you get 0: the sum over l of the phases coming from Z vanishes whenever k is different from k prime. And on the diagonal, where k equals k prime, you let the Pauli X act: applying a uniformly random shift to a fixed diagonal operator gives you the maximally mixed state, identity over d. Good.

So what have we seen? If I take my operator rho BE and apply this depolarizing map D on E, doing nothing on B, what I get is rho B tensor the maximally mixed state on E — and the same for sigma, of course. Now remember, I want to use the joint convexity, and here is how: this channel has a very simple form; it's a mixture of unitary channels. I pick l and m at random and apply the unitary Z to the power l followed by X to the power m. So I can use convexity.

Let's see it explicitly. I use convexity to establish the following inequality: on the left I have the relative entropy of some mixture against the same mixture, and I bound it by the mixture of the relative entropies. The mixture is exactly what I introduced before: to the state rho BE I apply the randomly chosen Pauli on the E system, and, by the previous calculation, once I take the average this is equal to nothing but rho B tensor the maximally mixed state — and similarly sigma B tensor the maximally mixed state on the other side. Now look at the right-hand side of the inequality: the average is outside the relative entropy, and inside I have the relative entropy between rho BE and sigma BE, to both of which I apply the same fixed Pauli with the given l and m. This is not literally what I was interested in — I wanted the relative entropy between rho BE and sigma BE themselves — but the two are equal, because X and Z are unitary, and we saw before that applying the same unitary to rho and sigma keeps the relative entropy unchanged. And that concludes the proof: the left-hand side is the relative entropy between rho B tensor the maximally mixed state and sigma B tensor the maximally mixed state, which is nothing but the relative entropy between rho B and sigma B, while the right-hand side is the relative entropy between rho BE and sigma BE. So we've shown that under the partial trace, the relative entropy can only decrease. I hope this was clear.
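A quick check of the one-time-pad calculation above — an illustrative sketch, not from the lecture: averaging over all generalized-Pauli conjugations fully depolarizes any state.

```python
import numpy as np

d = 3
omega = np.exp(2j * np.pi / d)
X = np.roll(np.eye(d), 1, axis=0)   # X|k> = |k+1 mod d>
Z = np.diag(omega ** np.arange(d))  # Z|k> = omega^k |k>

rng = np.random.default_rng(4)
M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
rho = M @ M.conj().T
rho /= np.trace(rho).real           # random density matrix

out = np.zeros((d, d), dtype=complex)
for l in range(d):
    for m in range(d):
        U = np.linalg.matrix_power(Z, l) @ np.linalg.matrix_power(X, m)
        out += U @ rho @ U.conj().T
out /= d ** 2

assert np.allclose(out, np.eye(d) / d)  # output is the maximally mixed state
```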
Good. Now notice how generic this argument was: there is nothing I used about the divergence D except two properties, invariance under isometries and joint convexity. If you replace D here by any divergence satisfying these two things, the proof goes through: convexity gives you data processing.

I also told you we'd see a more direct proof — one specific to the expression (star). I want to show you that just by looking at this expression and doing some very minor manipulations, you get data processing directly. Remember the expression: the relative entropy is the supremum of an integral of terms which are linear in rho and sigma. So now I apply the channel E to rho and sigma, and the natural thing to do is to move E to the other side of each trace, as its adjoint: the adjoint of E applied to the other operator in the term. So I apply the adjoint to the identity, the adjoint to Z_t Z_t star, et cetera (here I again forgot a subscript t). Now we need two small claims. First, E star, the adjoint of E, applied to the identity is the identity: E is trace-preserving, so E star is unital. The only thing that is a bit tricky to handle is the quadratic term: the adjoint of E applied to a product of two operators, Z_t Z_t star. And what do I want to do? I want to relate this to the expression for rho and sigma themselves, not for E of rho and E of sigma — so I would like to relate it to a product of two operators. These quadratic terms are the difficult parts.

But this is a well-known, simple property of quantum channels; maps satisfying it are sometimes called Schwarz maps. It says the following: if I apply a unital completely positive map f to Z star Z, the result is lower bounded, in the positive semidefinite ordering, by f of Z star times f of Z. Seeing this is relatively simple: you use complete positivity on a positive block operator constructed from Z, Z star, Z Z star, and Z star Z — if you're familiar with Schur complements, this is exactly what is going on.

So what do I do? I use this inequality for E star, and I write Y_t for E star applied to Z_t. You see that E star of Z_t Z_t star is lower bounded by Y_t Y_t star, and because this term comes with a minus sign, the bound becomes an upper bound on the whole expression. Plugging it in, I get the same integrand, now with Y_t Y_t star. I do exactly the same on the other side: the identity term is handled by unitality; the terms linear in Z_t stay exact, since E star is linear and no Schwarz inequality is needed there; and for the quadratic sigma term I use the Schwarz inequality again, with Z and Z star interchanged. But what I end up with is nothing but an admissible candidate in the original expression for the relative entropy of rho and sigma: each Y_t is just some operator, so taking the supremum over all families of operators can only increase the value. Renaming Y_t back to Z_t, I conclude that the relative entropy between E of rho and E of sigma is at most the relative entropy between rho and sigma. Again, I wanted to show you this to illustrate how useful it is to have such expressions which are linear in rho and sigma.
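A numerical sanity check of the Schwarz inequality used above — an illustrative sketch, not from the lecture. The channel here is a random Kraus channel, so its adjoint is unital and completely positive:

```python
import numpy as np

rng = np.random.default_rng(5)
d, k = 3, 4

# Random channel E(rho) = sum_i K_i rho K_i^dagger with sum_i K_i^dagger K_i = I
G = [rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)) for _ in range(k)]
S = sum(Gi.conj().T @ Gi for Gi in G)
W = np.linalg.inv(np.linalg.cholesky(S).conj().T)  # whitening: W^* S W = I
K = [Gi @ W for Gi in G]
assert np.allclose(sum(Ki.conj().T @ Ki for Ki in K), np.eye(d))

def E_adj(Y):
    # Adjoint map E^*(Y) = sum_i K_i^dagger Y K_i: unital and completely positive
    return sum(Ki.conj().T @ Y @ Ki for Ki in K)

Zop = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
gap = E_adj(Zop.conj().T @ Zop) - E_adj(Zop).conj().T @ E_adj(Zop)
assert np.linalg.eigvalsh(gap).min() > -1e-10  # E*(Z*Z) >= E*(Z)* E*(Z)
```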
OK, so the last thing I wanted to discuss today is an implication of this data processing inequality for plain entropies. Forget relative entropies: many of you might be dealing with just von Neumann entropies. So let's see the consequences. This is the strong subadditivity of the von Neumann entropy. What is the way of writing it that justifies this name? Subadditivity would be the version without the conditioning on C: it says that H of AB is at most H of A plus H of B. The strong version is the one where you condition on C. You can also write it in a way that makes the name obvious: just by basic rewriting, it is equivalent to saying that if I condition on an additional system — so imagine I look at the entropy of A conditioned on C, and now I condition on an additional system, which I call B here — then the von Neumann entropy can only decrease, which is natural. And this is again equivalent to yet another quantity, the conditional mutual information, being non-negative. These are all rewritings of the same thing; I'm not trying to define the conditional mutual information in a general setting — here it is just equal, by definition, to the difference of these two conditional entropies.

So, briefly, why does the data processing inequality imply these three inequalities? It's immediate, but let's go through it. If I compute the von Neumann entropy of A conditioned on BC, this is — remember the definition — minus the relative entropy between the joint state on ABC and identity on A tensor rho BC. [A question about the board.] Ah — it's symmetric, but it doesn't correspond to what I wrote here: A should play exactly the same role, consistent with the line above. Let me just correct this... I think all is good now, thanks. OK. Written in this way, it's obvious how to go from the first form to the second: I just need to take the partial trace over B. If I trace out B, data processing says the relative entropy can only decrease, and since there is a minus sign in front, the inequality flips: conditioning on the additional system B can only decrease the entropy.

And how to see that the second form implies the first? This is again a simple von Neumann entropy manipulation, and I think you saw a bit of this in yesterday's exercise session. By definition — or by the simple properties of the log — H of A conditioned on BC equals the joint entropy of ABC minus the entropy of BC. I can add and subtract H of C, which gives me H of AB conditioned on C minus H of B conditioned on C. Combining, the second form is exactly equivalent to the first inequality. I should say that this identity is often quite useful; it's called the chain rule for the von Neumann entropy. Why is it useful, even though it's sort of an obvious identity? Because it allows you to decompose the entropy of a joint system AB, conditioned on C, as a sum of entropies of the parts: the entropy of B conditioned on C, plus the entropy of A provided you condition on both B and C. This is often a useful tool. As for the third form, the conditional mutual information is defined as the difference of these conditional entropies anyway, so it's obvious that all three are equivalent. (A small numerical illustration of strong subadditivity follows after the recap below.)

Good, so that ends what I wanted to say for today. Let me just briefly recap: I hope you have seen that the data processing inequality is not too hard to prove, that it is closely tied to joint convexity, and that it is the sort of general inequality from which the useful inequalities we know for the von Neumann entropy follow. I'll stop here for today. Thank you.
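As promised above, a minimal numerical illustration of strong subadditivity — not part of the lecture — in its conditional-mutual-information form, for a random three-qubit state. The helper ptrace is a small utility written just for this check:

```python
import numpy as np

def von_neumann(rho):
    """Von Neumann entropy in nats (log base e, as in the lecture)."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def ptrace(rho, dims, keep):
    """Partial trace of rho over the subsystems not listed in `keep`."""
    n = len(dims)
    rho = rho.reshape(dims + dims)
    cur = n
    for i in sorted(set(range(n)) - set(keep), reverse=True):
        rho = np.trace(rho, axis1=i, axis2=i + cur)
        cur -= 1
    d = int(np.prod([dims[i] for i in keep]))
    return rho.reshape(d, d)

rng = np.random.default_rng(6)
dims = [2, 2, 2]                   # systems A, B, C
D = int(np.prod(dims))
M = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
rho_abc = M @ M.conj().T
rho_abc /= np.trace(rho_abc).real  # random full-rank density matrix

H = lambda keep: von_neumann(ptrace(rho_abc, dims, keep))
# I(A:B|C) = H(A|C) - H(A|BC) = H(AC) + H(BC) - H(ABC) - H(C) >= 0
cmi = (H([0, 2]) - H([2])) - (H([0, 1, 2]) - H([1, 2]))
assert cmi >= -1e-10
```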