If you take anything from it, it's the Hilbert transform. So it plays a special role, and so does this kind of derivation that goes with the Hilbert transform, which we ran into several times before. This is our guy. So remember we had this difference quotient derivation, ∂_j, which, if you like, is defined by saying that when you differentiate the variable x_i, you get δ_ij times 1 ⊗ 1, for i, j from 1 to d, if you have d variables. But another way of saying it, in the case that d is 1, is that you can think of it as a map from polynomials in one variable to ordinary commutative polynomials in two variables. So this ∂_j in general is a map from polynomials in d variables to the tensor square of that algebra, and in the one-variable case it's just a difference quotient: it takes a polynomial p to the function of two variables (p(s) - p(t))/(s - t), which of course is also a polynomial. Now, what we've frequently encountered are equations of this kind: when you evaluate τ on ξ_j multiplied by q, you get τ ⊗ τ of ∂_j q. Another way of saying it is that this ξ_j is simply the adjoint of the derivation ∂_j applied to 1 ⊗ 1. Because of course our inner product of x and y is just τ(xy); that's in the case that x and y are self-adjoint. So if you look at τ(ξ_j q), this is just the inner product of ξ_j and q, and to say that this equals τ ⊗ τ of ∂_j q is to say that it equals the inner product of 1 ⊗ 1 with ∂_j q. And if you equate the two like this, you see why I'm saying that this is applying the adjoint: ξ_j = ∂_j*(1 ⊗ 1). Is that a question, or somebody simply dying a slow death? That's Ian. All right. So we had several examples of this, as I mentioned. One example was the semicircular law; that was the case where ξ_j was just equal to x_j, and in fact, in that case, this equation was simply the characterization of the semicircular law. Another example that we briefly touched upon is limits of random matrix models, where you choose matrices at random with respect to a measure of that Gibbs form, e^(-N Tr V(A)) dA. In that case we also had an equation like this, with ξ_j being what was called the cyclic derivative of V, under some assumptions on V. Now, this object in free probability theory is the analog of a classical object, and this classical object is called the score function. Namely, if you have a classical probability measure with density ρ, it's just the derivative of log ρ, in other words the logarithmic derivative ρ'/ρ. It should remind you of this: if ρ is e^(-V), then log ρ is more or less -V, so when you take the gradient of it, you get this derivative of V back. And it plays an important role in classical information theory. The L² norm of this guy, which of course is the integral of |ρ'(x)/ρ(x)|² against ρ(x) dx, although usually when you look in the books they cancel one of the ρ(x)'s, so it reads ∫ |ρ'(x)|²/ρ(x) dx, is called the Fisher information. And up to a constant, it's the derivative of entropy under a Gaussian perturbation. So if you take your ρ and convolve it with a Gaussian of variance t, and then compute the entropy, H(ρ_t) = -∫ ρ_t log ρ_t, then the derivative of entropy under this perturbation, and maybe I messed up the factor of 2, ends up being this Fisher information. All right.
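As a quick numerical sanity check of that last claim (this is my addition, not from the lecture; the bimodal test density and the grid sizes are arbitrary choices), here is a small Python sketch of de Bruijn's identity, d/dt H(ρ_t) = (1/2) I(ρ_t) with ρ_t = ρ * N(0, t):

```python
import numpy as np

# Sketch of de Bruijn's identity: d/dt H(rho_t) = (1/2) I(rho_t),
# where rho_t = rho * N(0, t), H is differential entropy and I is the
# Fisher information, the squared L2 norm of the score rho'/rho.

x = np.linspace(-12, 12, 4001)
dx = x[1] - x[0]

def gaussian(u, var):
    return np.exp(-u**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# An arbitrary bimodal test density.
rho = 0.5 * gaussian(x - 2, 0.5) + 0.5 * gaussian(x + 2, 0.5)

def heat_flow(p, t):
    """Convolve p with a centered Gaussian of variance t on the grid."""
    out = np.convolve(p, gaussian(x, t), mode="same") * dx
    return out / (out.sum() * dx)               # renormalize grid error

def entropy(p):
    q = np.clip(p, 1e-300, None)
    return -np.sum(q * np.log(q)) * dx

def fisher(p):
    q = np.clip(p, 1e-300, None)
    return np.sum(np.gradient(p, dx)**2 / q) * dx   # integral of rho'^2 / rho

t, h = 1.0, 1e-3
dH = (entropy(heat_flow(rho, t + h)) - entropy(heat_flow(rho, t - h))) / (2*h)
print(dH, 0.5 * fisher(heat_flow(rho, t)))   # the two numbers should agree
```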
Now, in the cases that we looked at, this ∂_j p, this derivation, was in fact given as a commutator, ∂_j p = [p, r_j]; that was the way we proved that it exists in the semicircular case. And this r_j, in our cases, was bounded; sometimes it can be unbounded. And notice that this equation actually defines r_j almost uniquely. I mean, you can perturb it by anything that commutes with everything in your algebra, so it's defined up to something in the commutant. And there are some choices. In the semicircular case, we often make the choice that r_j* kills 1, and then r_j 1 will be our ξ_j in that case. Or you can sometimes make the choice instead that r_j is self-adjoint, in which case r_j 1 gives you, I think, twice ξ_j or something; then the factor 2 sits on the right-hand side. And in the one-dimensional case, when d equals 1, I mentioned that the derivation is just the difference quotient. But this you can write, if you think about it, as a commutator between p and something that has the integral kernel 1/(s - t), and that, of course, is the Hilbert transform. So you can write it this way, and from that you recover that this ξ is just the Hilbert transform of the density; that's just that equation there. So this variable ξ_j has a name: it's called the conjugate variable. And more generally, we say that x_1, ..., x_d are, I just coined this for this lecture, by the way, freely differentiable, so there's some smoothness in the underlying law of these variables, if these partial derivatives, these noncommutative difference quotients, are actually closable. They're defined a priori on polynomials; but if, as unbounded operators, they're closable, then we will say that the d-tuple is differentiable. And it's not a very difficult exercise to check that, in fact, in the one-variable case, this differentiability amounts to saying that your law is non-atomic: if the law is non-atomic, then this difference quotient actually defines a closable map from its domain in L² to L² ⊗ L². All right. Now, in the case that we have this equation, we call these ξ_j's conjugate variables, and there's a little lemma which says that their existence is a sufficient condition for differentiability. So if these ξ_j's actually exist, then we're in the differentiable setting. And the reason for it is very simple. I mean, you know that if you have an unbounded operator T from some domain to a Hilbert space H, then T is closable if and only if its adjoint is densely defined, where the domain of the adjoint is simply all vectors ζ with the property that the map η ↦ ⟨Tη, ζ⟩ is bounded. So it's a classical story that closability is equivalent to dense definition of the adjoint. And now the fact that this ∂_i is a derivation allows you to actually write down a formula for what happens when you apply ∂_i* to an elementary tensor a ⊗ b, with a and b anything in the domain of ∂_i, so for instance polynomials. And the formula looks like ∂_i*(a ⊗ b) = a ξ_i b - (τ ⊗ 1)(∂_i a) b - a (1 ⊗ τ)(∂_i b). And because of that, you see that a ⊗ b, for any polynomials a and b, is in the domain of ∂_i*. And these things are dense in L² ⊗ L², so the adjoint is densely defined, so ∂_i is closable. Now, as it happens, semicircular perturbation is a good operation for these sorts of things, and the semicircular perturbation of any d-tuple is always differentiable: these conjugate variables immediately start existing.
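To make the one-variable identity concrete, here is a check I'm adding (not from the lecture; the quadrature grid and the test polynomial q are arbitrary) that τ(ξ q(x)) = (τ ⊗ τ)(∂q) for the standard semicircle law, whose conjugate variable is ξ = x:

```python
import numpy as np

# Check tau(xi * q(x)) = (tau tensor tau)(dq) for the standard
# semicircle law on [-2, 2], where the conjugate variable is xi = x
# and dq is the difference quotient (q(s) - q(t)) / (s - t).

s = np.linspace(-2, 2, 2001)
ds = s[1] - s[0]
rho = np.sqrt(np.clip(4 - s**2, 0, None)) / (2 * np.pi)  # semicircle density

q = lambda t: t**4 - 3 * t**2 + t        # arbitrary test polynomial

lhs = np.sum(s * q(s) * rho) * ds        # tau(x * q(x)), since xi = x

S, T = np.meshgrid(s, s)
with np.errstate(divide="ignore", invalid="ignore"):
    dq = np.where(np.isclose(S, T),
                  4 * S**3 - 6 * S + 1,  # on the diagonal: q'(s)
                  (q(S) - q(T)) / (S - T))
rhs = np.sum(dq * rho[None, :] * rho[:, None]) * ds * ds

print(lhs, rhs)   # both should be close to 1 for this q
```

Both sides come out as τ(x⁵) - 3τ(x³) + τ(x²) = 1, using the semicircle moments.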
All right, so I wanted to show just a quick example of how you apply these things in functional analysis, or in operator algebras. So I apologize in advance to those of you who don't care about this problem, but people who do von Neumann algebras kind of do. So I just wanted to show you, as a sample, a very simple proof, just to give you an example of what kinds of things we're interested in. So the setting is that I have this x_1, ..., x_d, and remember, they generate something which is called the von Neumann algebra of x_1, ..., x_d. This is, if you like, the noncommutative analog of all essentially bounded measurable functions of x_1, ..., x_d. And the question that you may want to ask is: what is the center of this guy? So what things are there that commute with everything? Well, of course, there are multiples of the identity; multiples of the identity commute with everything. But is there anything else? And so the statement is that if we have a freely differentiable d-tuple with d at least 2 (I forgot to write that, because otherwise it's wrong; of course, if you take only one variable, the algebra is abelian), then actually the whole thing is a factor. So here's the proof. Suppose that we have some z in the center. Well, if it's in the center, it's got to commute with everything in the algebra. In particular, it has to commute with x_1 and x_2; in fact, that's all we'll use. Now, what I want to do is differentiate one of these commutators. But the problem is that although x_1 and x_2 are in the domain of my derivative, it's not clear that z is, right? The operator is only closable, so its domain is not everything, a priori. But in this nice case, because of special properties of these operators, you can use the theory of Dirichlet forms to regularize things. So you write down a kind of Laplacian, Δ = ∂_1* ∂_1. And this Laplacian is a nice positive operator, so it has a canonical closure and all that. And using that, you can define a kind of smoothing operator: you look at η_α = (α/(α + Δ))^(1/2). Because Δ is positive, this is always bounded. But when α goes to infinity, it converges to the identity, right? And the nice thing is that the derivation composed with this smoothing operator is now globally defined; that's not a very, very hard fact to check. So you call this new thing ∂_1^(α); it's a smoothed version of ∂_1. Now ∂_1^(α) is no longer a derivation, but because ∂_1 kills x_2 (it's one of the things that we know about these difference quotients), x_2 is in the kernel of this Δ and so is untouched by the regularization. So this thing again will have the derivation property with respect to x_2, the Leibniz rule. Of course, keep in mind that ∂_1 of x_2 is 0. So now we apply this to the fact that x_2 commutes with z. We have the right to do this because now it's a bounded operator. So what do we get? We get that x_2 commutes with ∂_1^(α) of z. But where does this ∂_1^(α) z live? It's something in L² ⊗ L², which you can think of as the Hilbert-Schmidt operators. And because we know that free differentiability holds, we know that x_2 has diffuse spectrum. So suddenly we have a compact operator, namely this guy, commuting with an operator that has diffuse spectrum. Boom, the operator is dead.
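Here is a finite-dimensional toy of that smoothing operator (my own illustration; Δ here is just a random positive matrix standing in for the actual ∂_1* ∂_1), showing the two spectral facts used above: η_α always has norm at most 1, and η_α tends to the identity as α grows. In finite dimensions the convergence is even in norm; in the infinite-dimensional setting one only gets strong convergence.

```python
import numpy as np
from scipy.linalg import sqrtm

# Toy model of the regularization eta_alpha = (alpha/(alpha + Delta))^(1/2)
# for a positive "Laplacian" Delta; here Delta is a random positive
# semidefinite matrix, a stand-in for the free Laplacian d1* d1.

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))
Delta = A @ A.T                        # positive semidefinite

def eta(alpha):
    return np.real(sqrtm(alpha * np.linalg.inv(alpha * np.eye(n) + Delta)))

for alpha in (1.0, 1e2, 1e4, 1e6):
    e = eta(alpha)
    print(alpha, np.linalg.norm(e, 2), np.linalg.norm(e - np.eye(n), 2))
# the operator norm stays <= 1, and eta_alpha -> identity as alpha -> infinity
```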
So this ∂_1^(α) z has to be 0. But since ∂_1^(α) z is 0, we can remove the regularization and conclude by closability that z is actually in the domain of ∂_1 and, moreover, is killed by it. Excellent. So now we can apply ∂_1 to the first commutator. What do we get? Well, z is killed by ∂_1, and x_1 is differentiated into 1 ⊗ 1. So by Leibniz, we get 1 ⊗ z - z ⊗ 1 = 0. And now you apply τ ⊗ 1: one of these z's gets replaced by τ(z), the other z stays the same, and you get that τ(z) · 1 is z. So it's a scalar. So a very, very cute proof. OK. So if somebody asks you to prove that the von Neumann algebra of two free semicirculars is a factor, there you go. Of course, that's not the earliest proof of it. OK. Now that we know a little bit about this differentiability, let me also prove for you the subordination fact. (Is there a free eraser? Oh, this board may kill somebody by falling on them. But hopefully it won't be me.) So remember, we had the following. If I have x and y free, with law μ_1 and law μ_2, we looked at the Cauchy transforms, and we said that the Cauchy transform of the free convolution of the two things is just the Cauchy transform of one of them evaluated at a different place: G_{μ_1 ⊞ μ_2}(z) = G_{μ_1}(ω(z)). Now, just to decipher what this means: of course, G_{μ_1}(z) is just the trace of the resolvent of x, τ((z - x)^{-1}), and similarly for μ_2. So what this is saying is that the trace of the resolvent of x + y is actually the same thing as the trace of the resolvent of x, but evaluated at a different place. Hope I got the brackets correctly. No, I didn't. OK. All right, so this lemma of Voiculescu philosophically explains why on earth all these resolvents occur in free probability. What is so special about resolvents? And the statement is this. Suppose I have a variable which is freely differentiable. Then of course, when I apply my difference quotient to a resolvent, I get a tensor product of two resolvents: ∂(z - x)^{-1} = (z - x)^{-1} ⊗ (z - x)^{-1}. We actually used this once already. But conversely, suppose I have some function of my variable in L², and that function satisfies the same equation, that is, its difference quotient is just the function tensor itself. Then the function must be a resolvent. So it's a beautiful fact that characterizes resolvents. If you like, this is similar to saying that the differential equation df/dx = λf characterizes f = e^(λx); just as exponentials are the important functions for ordinary differentiation, this is saying that resolvents are the important functions for this kind of free difference quotient. Well, the proof of this is fairly easy. One direction is easy; it's just a very, very simple algebraic manipulation. Let's go the other way. Suppose I have some function f that satisfies this. So plugging in what the difference quotient is, it tells you that (f(s) - f(t))/(s - t) = f(s) f(t). So now I can multiply by s - t to conclude (I know differentiability, right? So my measure is non-atomic, it's not a problem) that f(s) - f(t) = f(s) f(t) (s - t). OK, now you do a little bit of work to prove that this actually implies that f is continuous. And moreover, it tells you that f at any s is determined by f at some fixed point t_0; this is just solving the equation.
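Here's a quick numerical illustration of the scalar subordination statement, which I'm adding (the test point z is arbitrary). For two standard semicircle laws, μ_1 ⊞ μ_2 is the semicircle of variance 2, and since the R-transform of a standard semicircular is R(w) = w, the subordination function is ω(z) = z - G(z):

```python
import numpy as np

# Subordination check for two free standard semicirculars:
#   G_{mu1 [+] mu2}(z) = G_{mu1}(omega(z)),  omega(z) = z - G_{mu1 [+] mu2}(z).

def G_semicircle(z, var):
    # Cauchy transform of the semicircle law of variance `var`.
    # The principal sqrt branch is fine for the test point used below
    # (upper half-plane, positive real part).
    return (z - np.sqrt(z**2 - 4 * var)) / (2 * var)

z = 0.7 + 0.9j
G = G_semicircle(z, 2.0)            # Cauchy transform of mu1 [+] mu2
omega = z - G                       # subordination function
print(G, G_semicircle(omega, 1.0))  # should agree: G_{mu1}(omega(z)) = G(z)
```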
And now (I think there's a missing equality here, but if you insert the missing equality) you see that f is indeed of this form, (w - s)^{-1}; it's just solving for what f(s) is. So from that, you see that this function has to be of this form. And of course, w cannot be in the spectrum, because we know, by some work, that f is continuous there; if w were in the spectrum, f would blow up at s = w. So the short point is that this equation characterizes resolvents. So now let's do the subordination. In fact, we'll prove something stronger. You know, any time you have a von Neumann algebra with one of these traces, if I have a von Neumann algebra generated by, say, x and y, and the von Neumann subalgebra generated by x, I always have a map from one to the other, a kind of projection map, which in the classical case corresponds to conditioning: it's a conditional expectation E. So the statement actually is this. If I take the resolvent of x + y and don't take the trace of it, but simply project it onto the von Neumann algebra generated by x, then a priori this is some operator in the algebra of x, but in fact I get a resolvent of x, evaluated at some other point: E[(z - x - y)^{-1}] = (ω(z) - x)^{-1}. And so, of course, I can apply τ on both sides, and it tells you that this is that, which is exactly the subordination statement that I have written here. You look puzzled; I may have written something wrong. I mean, I'm barely reading what I've written, and I'm kind of trusting my late-night writing. Anyway, so the idea is that, first of all, you check a little identity. This is trivial, actually: if you start with some function and you condition it onto x, so you project it onto the algebra of x, and then you differentiate it, that's the same as differentiating first and then conditioning the result. Not too bad, if you think about it. All right, the other thing that you notice is that when you restrict differentiation in x to the algebra generated by x + y, this is the same thing as differentiating with respect to x + y. This confused me for about half a year once, but I think I survived. The point is: what is the definition of ∂_{x+y}? Well, it's the unique derivation having the property that ∂_{x+y} applied to x + y is 1 ⊗ 1, plus Leibniz. Well, what happens with ∂_x? When I apply it to x + y, differentiating y gives 0, and differentiating x gives 1 ⊗ 1, plus Leibniz. So of course they're the same on the generator, so they're the same everywhere on that algebra. And now, using this: on the algebra of x + y, I have the right to replace ∂_x by ∂_{x+y}, because they're the same derivation there. And now you just apply all of this to the function f_z = (z - x - y)^{-1}. I know that ∂_{x+y} of f_z is f_z ⊗ f_z, right? Because, well, it's a resolvent, and we're differentiating with respect to x + y. So plug that in, and combining with the conditioning identity, you see immediately that ∂_x of E[f_z] is equal to E[f_z] ⊗ E[f_z]. Sorry, I was saying it wrong for a moment.
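To see this operator-level statement in action, here is a Monte Carlo sketch of my own, where averaging over independent GUE samples of Y stands in for the conditional expectation onto the algebra of X (matrix size, trial count, and the test point are all arbitrary):

```python
import numpy as np

# Monte Carlo sketch of E[(z - X - Y)^{-1}] ~= (omega(z) - X)^{-1}:
# the Y-average of the resolvent, a stand-in for the conditional
# expectation onto the algebra of X, against the subordinated resolvent.

n, trials = 200, 400
rng = np.random.default_rng(1)

def gue(n):
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (a + a.conj().T) / (2 * np.sqrt(n))   # semicircle on [-2, 2]

z = 0.5 + 1.0j
X = gue(n)
I = np.eye(n)

avg = np.zeros((n, n), dtype=complex)
for _ in range(trials):
    avg += np.linalg.inv(z * I - X - gue(n))
avg /= trials

G = (z - np.sqrt(z**2 - 8)) / 4     # Cauchy transform of X + Y (variance 2)
omega = z - G                       # same subordination formula as before
err = np.linalg.norm(avg - np.linalg.inv(omega * I - X), 2)
print(err)   # small, up to finite-n and Monte Carlo error
```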
What I know is that ∂_{x+y} of (z - (x + y))^{-1} is equal to (z - (x + y))^{-1} tensor itself. So now, because of that, when I apply everything to f_z, I get that ∂_x E[f_z] is equal to E[f_z] ⊗ E[f_z]. So from the characterization, I know that E[f_z] must be some (ω(z) - x)^{-1}, right? (Yes, you're right, yeah, OK: you have the equality without the traces first, and then, when you apply traces, you get the equality of the two Cauchy transforms.) So you get this. And then, because the whole thing obviously depends analytically on z, you get that this ω has to be analytic. So this proves it for the case where x and y are freely differentiable. But by this little regularization trick, you can actually extend it to an arbitrary pair x and y. So that's subordination. So again, the key thing is this fancy characterization of resolvents, because the image of this projection lives in the algebra of x. Oh, did I write it wrong? I know it's correct in the notes; I may have copied something wrong. What did I do wrong here? I'm too tired to think about what I'm doing. No, no, it's not that. Actually, what I think I need here is just x. Yeah, I think I need x here; then it's correct, right? So this is just a typo, this should just be x. And that actually is what I want to compute ultimately, because I want to compute ∂_x of f_z, which is that. Sorry about this. All right. Now, these were just two applications of our superpower, this derivation. I don't have so much time left in the whole course, and of course I cannot talk about everything, so I thought I would just concentrate on one story, which is the story of free entropy. It connects very well with random matrices, goes into a lot of current work and very interesting questions, and also connects to a number of lectures that happened here. So I'll just do that. So I told you before that there is a classical formula: the classical Fisher information can be written as the L² norm of the classical score function. So in one approach to free entropy, Voiculescu copied that, and he just defined the free Fisher information as that: Φ*(x_1, ..., x_d) is the sum of the ‖ξ_j‖₂², if these conjugate variables exist; otherwise, it's plus infinity. Now, there's a very nice formula related to semicircular perturbations that tells you that if you're interested in the conjugate variable of a semicircular perturbation of x_1, ..., x_d, then not only does this variable exist, but in principle you have a formula for it. Namely, you have to simply take the conditional expectation, onto the algebra generated by the perturbed tuple, of the perturbation itself: ξ_j = (1/√t) E[s_j], conditioning onto the algebra of x_1 + √t s_1, ..., x_d + √t s_d. It's related to the fact that s_j, being semicircular, is its own conjugate variable. Now, there's a price to pay, which is that factor 1/√t. So as t goes to 0, even though the conditional expectation remains bounded, the whole thing can blow up; and whether it does or does not is a question about your specific law. Anyway, so that's just a little remark for later. So one consequence of this is that you have an estimate on the Fisher information of such a semicircular perturbation: it's always finite. In fact, the Fisher information is at worst 1/t, or d/t, if you have d variables.
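In one variable everything is explicit, so here is a small check I'm adding (grid choices arbitrary, principal value done crudely): for the standard semicircle the conjugate variable is ξ(x) = 2 p.v. ∫ ρ(y)/(x - y) dy = x, so Φ* should come out as 1, which also matches the known identity Φ* = (4π²/3) ∫ ρ³.

```python
import numpy as np

# One-variable conjugate variable as a Hilbert transform:
#   xi(x) = 2 p.v. integral of rho(y)/(x - y) dy,
# and free Fisher information Phi* = integral of xi(x)^2 rho(x) dx.
# For the standard semicircle, xi(x) = x and Phi* = 1.

x = np.linspace(-2, 2, 2001)[1:-1]     # stay inside the support
dx = x[1] - x[0]
rho = np.sqrt(4 - x**2) / (2 * np.pi)

def xi_at(s):
    mask = np.abs(x - s) > dx / 2      # crude principal value: drop the
    return 2 * np.sum(rho[mask] / (s - x[mask])) * dx   # singular point

xi = np.array([xi_at(s) for s in x])
print(np.max(np.abs(xi - x)))          # xi ~= x, worst near the edges

phi = np.sum(xi**2 * rho) * dx
print(phi, (4 * np.pi**2 / 3) * np.sum(rho**3) * dx)    # both close to 1
```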
And so then Voiculescu could define free entropy by simply forcing the derivative of free entropy along a semicircular perturbation to be given by the Fisher information. And if you think a little bit about it, and notice that semicircular perturbations form a semigroup (if you perturb something by √t times a semicircular and then perturb further by √s times some other free semicircular, that's the same as perturbing by √(t + s) times one semicircular; it's just like Gaussians, right?), then you can compute that the derivative of this free entropy is indeed given by the Fisher information. And you have a number of nice properties. So, for example, if you look at a kind of relative entropy, χ* minus the quadratic term (1/2) Σ_j τ(x_j²), this is maximized precisely by the free semicircular d-tuple. You have a characterization of freeness: if all the entropies here are finite, then the entropy of a d-tuple is the same as the sum of the one-variable entropies precisely when the variables are free. And in the case that d is 1, so you have only one variable, there is actually an explicit formula for what happens. You use the fact that you have an expression for this conjugate variable in terms of the Hilbert transform, and then what you get is the logarithmic energy plus a universal constant: χ(x) = ∬ log|s - t| dμ(s) dμ(t) + 3/4 + (1/2) log 2π. That's very nice, because logarithmic energy, as you saw, is intimately involved with random matrix models in one dimension, where you have one self-adjoint random matrix. So all of these properties are not so hard to get from properties of the Fisher information, more or less this formula plus a little bit of stuff. Now, there is a competitor definition of free entropy, the so-called microstates free entropy. It's also due to Voiculescu; I should have written his name as well. And it proceeds in a different way. It's kind of rigged to look like a large deviation principle; it's not written as a large deviation principle, and it's not exactly equivalent to one, but it has the flavor of what a large deviation principle is supposed to say. So let's try to parse it. At the outset, you're given some d-tuple of operators, x_1 through x_d. And then what you want to do is try to find n by n matrices that look in law like your d-tuple. Now, what does it mean to look in law like that? You pick a certain weak neighborhood of your law: you specify a degree of approximation, m and ε, and what you want to say is that if you take any monomial of degree at most m, then the normalized trace of your monomial evaluated in your matrices is within ε of the trace of your monomial evaluated in your operators. So you're trying to model the noncommutative law of your operators x_1, ..., x_d by matrices of finite size, of size n by n. And then what you do is measure, at the rate of 1 over n², a logarithmic understanding of how likely this happens: you take the Lebesgue volume of this set of matrices, take the logarithm of this volume, and rescale it by 1/n². (dn², by the way, is the dimension of the space of all such tuples, because we're looking at d-tuples of self-adjoint, I've got to say self-adjoint, n by n matrices.)
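As a numerical illustration of the one-variable formula (my addition, arbitrary grid): the logarithmic energy of the standard semicircle is -1/4, so its free entropy should come out to (1/2) log(2πe), which is also the maximum of χ under τ(x²) = 1.

```python
import numpy as np

# chi(x) = double integral of log|s - t| dmu(s) dmu(t) + 3/4 + (1/2) log(2 pi)
# evaluated for the standard semicircle; expected value (1/2) log(2 pi e).

s = np.linspace(-2, 2, 1501)[1:-1]
ds = s[1] - s[0]
rho = np.sqrt(4 - s**2) / (2 * np.pi)

S, T = np.meshgrid(s, s)
K = np.log(np.abs(S - T) + np.eye(len(s)))   # log(1) = 0 kills the diagonal
log_energy = np.sum(K * rho[None, :] * rho[:, None]) * ds * ds

chi = log_energy + 0.75 + 0.5 * np.log(2 * np.pi)
print(log_energy)                            # close to -1/4
print(chi, 0.5 * np.log(2 * np.pi * np.e))   # close to each other
```

The diagonal cells are simply dropped; the log singularity is integrable, so this only costs a small discretization error.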
And this factor, (d/2) log n (I first said n over 2; it should be d over 2, sorry), is added on just to account for the fact that a ball of finite radius in this space has a volume which is not exactly the radius to the power of the dimension; there are some constants that have their own asymptotics, and to kill those constants, you have to add this (d/2) log n. OK, so that's the definition of microstates free entropy. Now, in the case of a single matrix, you actually have equality with the previous definition. In particular, you see that the leading term here is just the logarithmic energy. So why is that? Well, this is kind of the usual story: a single self-adjoint matrix x you can write as u times diag(λ_1, ..., λ_n) times u*.
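That change of variables to eigenvalues produces the Vandermonde factor, whose logarithm is exactly the empirical logarithmic energy; as a sketch (my addition, sizes arbitrary), one GUE sample already shows it concentrating near the semicircle value -1/4:

```python
import numpy as np

# Empirical logarithmic energy of one GUE sample:
#   (2 / n^2) * sum_{i<j} log|lambda_i - lambda_j|,
# the Vandermonde term from the eigenvalue change of variables; it should
# approach the logarithmic energy of the semicircle, -1/4, as n grows.

rng = np.random.default_rng(2)
n = 1000
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
h = (a + a.conj().T) / (2 * np.sqrt(n))      # GUE, semicircle on [-2, 2]
lam = np.linalg.eigvalsh(h)

i, j = np.triu_indices(n, k=1)
print((2 / n**2) * np.sum(np.log(np.abs(lam[i] - lam[j]))))   # ~ -0.25
```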