Let me just very briefly remind you what we concluded in the last lecture. In the lecture this morning, we came up with an ansatz to describe quantum many-body states, which was based on the area-law behavior of the entanglement. The idea of this ansatz was that we would take entangled pairs, group them up, and then on these groups of entangled pairs apply some linear maps, each of which takes one auxiliary system of dimension capital D and another one of dimension capital D and makes a physical system of some dimension small d. What we saw is that if we worked out the math behind that, we could rewrite this as a state whose coefficients in some computational basis are given by a product of matrices, where each of these matrices basically describes the action of one of these maps P. And this is why these states are termed matrix product states, or MPS for short.
So before going on to describe some of the properties of these states, let me briefly introduce a third way of describing them. The reason I introduce several notations is that, depending on the situation, one or the other turns out to be more useful. That's, in some sense, the mathematically clearest and most direct way to write them. This here, as I will argue later, can give us more insight from a quantum information intuition, because we can use these entangled pairs to do things like teleportation and in this way think about the structure of the state. Now the last notation is basically a graphical calculus for writing these big sums. This is a huge sum over all these indices. And actually there are even more sums; we are just hiding them, because each matrix product is itself a sum over the index we multiply over. So it's a very complicated sum, and in order to re-express it, we will use a graphical calculus.
So basically what we want to do is express this expansion coefficient, which is some complicated object C with indices i1 to in. This is basically a tensor, an object with n indices, i1 to in. In tensor network notation, what we will do is say: OK, this tensor with n indices we denote as a box with n legs, so each index corresponds to a leg which sticks out of that box. Then these A's here: for a given i, this is a matrix, so a two-index object. But i is also an index. So each of these objects at a certain site s we will also denote as a box, with an index alpha, an index beta, and an index i_s. Again, each of these legs just depicts one of the indices of the tensor.
The point is now the following. What we want to write is a summation over these indices. So if we multiply two of these matrices, say we take A1 with index i1 and A2 with index i2 and sum over this index beta here, the way we depict this graphically is that we take the box for A1 with indices alpha, beta and i1, and the box for A2 with indices beta, gamma and i2, and we connect this leg graphically. Connecting this leg means exactly what is being done here: we identify that index. So the right matrix index of this guy should be equal to the left matrix index of the next one, and we sum over it. So connecting legs is equivalent to identifying the index and summing over it. We will see in a moment why this is actually very powerful, but you can already see one of the uses: we don't have to write all these summations and all these indices. At some point we can even stop putting labels on the indices if they are just indices which are being summed over, similar to what we did here, because we don't really need to know how we label them.
So if we look at this whole expression, what are we doing here? Well, we have all these guys. If we want to know what this whole coefficient is, we have A1 with the left and right matrix index and the index i1. We have A2 with the two matrix indices and an index i2. And now we do the matrix multiplication, so we connect this index. Now comes A3, and again we do the same: we multiply the matrices, which just means we connect these indices. We continue like that until the last site, so we just connect all these indices. And then, because we take the trace, we have to identify the left index here and the right index there and sum again. So what we did is basically take this single tensor which has many indices and decompose it into a network of tensors which are much simpler, right? They have much lower complexity, far fewer parameters. And that's why this is termed a tensor network. What we will see later on is that, if we want to do calculations with these states, this graphical language is actually a very powerful formalism. But again, it's just a different way of writing this formula. If you prefer writing formulas, this might actually be a more useful way of working with it.
And again, this also looks a bit like this one. Indeed, that's exactly what's going on. The fact that we identify these matrix indices and sum over them, if you remember the derivation, comes exactly from the fact that there is a maximally entangled state here, which exactly means that we have the state here and the state here being equal, and we sum over all combinations of them being equal, right? That's just something like the sum over i of the state |i, i>. OK, so before I talk about how to do calculations with these states, let me just briefly highlight that they are actually a very good class of states for describing the kind of systems we want to describe. So they are very well suited.
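As a small numerical illustration of the statement above that the coefficient is a trace of a product of matrices, and that connecting legs means summing over a shared index, here is a minimal NumPy sketch. The sizes `n`, `d`, `D`, the random tensors, and the helper name `coefficient` are assumptions made for this example only; they are not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, D = 4, 2, 3  # sites, physical dimension, bond dimension (illustrative values)

# One tensor per site: A[s][i] is the D x D matrix for physical index i at site s.
A = [rng.normal(size=(d, D, D)) for _ in range(n)]

def coefficient(tensors, indices):
    """Return Tr(A1^{i1} A2^{i2} ... An^{in}) for one string of physical indices."""
    M = np.eye(tensors[0].shape[1])
    for A_s, i in zip(tensors, indices):
        M = M @ A_s[i]
    return np.trace(M)

# The same object written as a tensor network: a repeated letter is a connected
# leg (identified and summed over). The letter 'a' sits on the first and the last
# tensor, which closes the ring and implements the trace.
C = np.einsum("iab,jbc,kcd,lda->ijkl", *A)

print(coefficient(A, (0, 1, 1, 0)), C[0, 1, 1, 0])
```

The `einsum` string is essentially the diagram in text form: each operand is a box, each letter is a leg, and only the letters after the arrow remain as open legs.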
So given a specific state which doesn't have too much entanglement, say it shows up as a ground state of some local Hamiltonian, can we approximate it efficiently by matrix product states? The first observation is that, in principle, every state can be described in this form. So this construction, this class of states, has enough descriptive power to describe any state we want, if we are just willing to toss in enough parameters. I mean, obviously, parameter counting will tell us there are constraints, right? We had this last time: there are n of these guys. This index has small d values, the other two have capital D values. So this has n times small d times capital D squared parameters. This here has small d to the n parameters. So we see that if we want to describe any state, we have to choose this variable capital D, which is a degree of freedom we have, right, how big are the matrices we choose, to be very big, actually exponential in n. But at least in principle, we can get any state.
And there are at least two ways to understand how this might work. If you think in terms of this picture, imagine we have some wave function psi on a number of sites. Let's say 5, OK? We want to prepare some state of 5 spins. One way we could think about that is to first put all spins on the same site. So we build up our full spin system locally on a single site. I'm trying to think in some operational way. So now we have these five spins here, and we would like to apply a map which does the following: it promotes this spin here to a physical spin. So basically I would like to apply a map P. OK, sorry, let me start again. I have these five spins. Then I add entangled states, OK? It's just an operational prescription. So I add two pairs of entangled states, and now all these spaces have the same dimension, small d. Now what will I do?
Well, one thing is I will take the central spin and just say this is my physical spin at the central site. The second thing is I will do a projection of these two guys onto a maximally entangled state. It's like teleportation, only that in teleportation we would do a real measurement, so for qubits we have four possible outcomes. If we are free to do a projection, we can just project onto the measurement outcome where we don't have any Pauli errors in teleportation, right? There is one outcome in teleportation where you don't have to do any correction. If we can just project, we can project onto the state where no correction appears. If we do so, this state here will get teleported here. We can do the same projection here, and then this state will get teleported here. And the same on the right: this guy gets teleported here, this guy gets teleported here. Now, this might look like a mess, but it's just linear maps. It's linear operations: projections onto entangled states, and this one is just an identity map. So it's certainly a special case of writing some linear map P, which basically just teleports some of the stuff to the left and the right.
And now we can proceed in exactly the same way. We can add yet another entangled state here. We project these two guys onto a maximally entangled state, so this guy gets teleported even further and ends up being here. And this guy up here we just promote to our state at that site. We do the same thing on the left, and this last state we also just promote to a physical state. So overall, what you see is that the middle state ends up in the middle. This state here goes up here, goes down here, so it ends up here, as it should. This state goes here, gets teleported again, ends up here, and becomes physical. So now, of course, you can try to write this down in formulas, but the general idea is that you don't really have to think much about it.
You can just say, well, all I did was some pretty complex linear map here, and this is what I call P3, say, for the middle spin. I did some also fairly complex linear map here; this is what I call P4. This one is somewhat simpler; this is what I call P5. And the same to the left. So just by thinking in terms of teleportation, we can understand that we can build up any state like that, just by preparing it locally and teleporting its parts to the corresponding locations by using these entangled states. So we can use this entanglement to do teleportation. And because we don't need Pauli corrections, since we can project, that really corresponds to just applying a linear map at each site.
Another way is to understand the same procedure in the graphical language. What you say is: well, we have five sites, and we want to get some C with five indices. So let's just do the following. Let's put this C with the five indices in the middle. What we would like to have is that the first index goes to the very left, that one goes here, that one stays in the middle, and so on. And what we can do now is something very similar to here. We say: this complicated thing, which actually now has five indices, we declare to be our central A, with i3 here. Then it should have some alpha and beta, but now we have two indices on each side. But of course, we can bundle these two indices into a bigger index, a multi-index. So this has some value alpha 1 and alpha 2, and we can just build a new alpha which we define to have the components alpha 1 and alpha 2. And similarly here, so we just define a new, more complex object. Then we do the same here and the same here: we define a new object. And indeed, it's very close to what happens here. This is just something which teleports this part here. So it's really just a different way of phrasing the same fact.
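The graphical construction for five sites can be checked numerically. The sketch below is one possible reading of it, under stated assumptions: open boundary conditions, local dimension `d = 2`, a random five-index tensor `C`, and identity (delta) tensors on the outer sites that simply pass indices along. The tensor names `A1` to `A5` and the index ordering `(left bond, physical, right bond)` are choices made for this example.

```python
import numpy as np

d = 2
rng = np.random.default_rng(0)
C = rng.normal(size=(d,) * 5)  # arbitrary target coefficients C[i1, i2, i3, i4, i5]
I = np.eye(d)

# Central tensor: C itself, with (i1, i2) bundled into the left multi-index
# and (i4, i5) bundled into the right multi-index.
A3 = C.reshape(d * d, d, d * d)

# Outer tensors only route indices (delta tensors).
A1 = I.reshape(1, d, d)                                        # i1 -> right bond
A2 = np.einsum("ax,ib->aixb", I, I).reshape(d, d, d * d)       # pass i1, append i2
A4 = np.einsum("ci,ey->ceiy", I, I).reshape(d * d, d, d)       # emit i4, pass i5
A5 = I.reshape(d, d, 1)                                        # emit i5

# Contract the chain; the boundary bonds 'a' and 'f' have dimension 1.
C_rebuilt = np.einsum("aib,bjc,ckd,dle,emf->ijklm", A1, A2, A3, A4, A5)

print("bond dimensions:", [A.shape[2] for A in (A1, A2, A3, A4)])
print("exact reconstruction:", np.allclose(C_rebuilt, C))
```

The largest bond dimension here is `d**2`, the number of index values that have to be carried from the middle to one side, which is the exponential growth with the number of sites that the parameter counting predicts for a generic state.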
And of course, now you can try to write these A's explicitly, using the fact that this here is a delta tensor, where this index is equal to that one and so forth. But then, of course, you get lots of indices. It gets very messy, but it contains the same information. So the basic point is that really every state can be expressed in this form. That's a good feature. It wouldn't be good if we had a family of states which doesn't become complete at some point, which would essentially miss certain types of features. So that's a nice property.
What is more important is that when we have states which are ground states of gapped local Hamiltonians, or states which obey an area law, we can do this efficiently. We can get an efficient approximation of any such state. Here, of course, the problem is that, just by parameter counting, as I said earlier, we will get a scaling like this, an exponential scaling. Capital D squared basically has to scale like small d to the n. You can see that this happens here, because we have to take half of our system and teleport it to the left. We have to take n over 2 minus 1 sites and start teleporting them to the left, so we need that amount of entanglement. We won't get away with less for a generic state. You could, for instance, try to build up a maximally entangled state, where each spin on the very left is maximally entangled with one on the very right, and so on, so you have entangled pairs like that. You will just see there is no way to prepare this with less entanglement, because it has maximal entanglement if you cut through the middle.
No, you can change that. I think from a quantum information point of view, I find this a fairly nice and intuitive argument. But indeed, you can just do it in a formal way, so to say, without talking about teleportation. I think it depends on your way of approaching things whether this is more intuitive or not. I mean, obviously, formally, it's completely equivalent.
I just find that, coming from quantum information, thinking in a procedural and operational way is often more intuitive for me. But again, it's a matter of personal preference and of the way you think about these things.
So the second thing is, if we have ground states, we can actually approximate these states efficiently. What do I mean by efficient, or what is the general idea? Well, the general idea is: if I follow this procedure, I have some state psi, and I can write it in this form, where what happens is that the dimension in the middle gets bigger and bigger. Let me just depict this cartoon-wise by drawing thicker lines. But now I can try to take this middle dimension and reduce it to something smaller. And the point is that this is actually possible if the entanglement is not too big. So what I can try is to make a cut here and insert a projection, which projects out some of these degrees of freedom and only keeps part of them. And the point is, if there is little entanglement, the error which I get by discarding degrees of freedom will decay very rapidly.
In fact, what's happening is the following. This is a summation index, right here. All these connected legs are summation indices. Let me call this summation index alpha. Then for each value of alpha I have something on the left, let me call this left alpha, and something on the right. So basically this is a way of decomposing the state as a sum over alpha of a left half times a right half. Now, these states are not normalized, but you can see it resembles a Schmidt decomposition. Well, it's not really a Schmidt decomposition, it's a general decomposition. But you could go into the Schmidt basis by doing a basis transformation.
So in this basis, in the Schmidt basis, what will happen is that if you don't have much entanglement, these Schmidt coefficients lambda will decay fairly rapidly. Because if they didn't decay rapidly, you would have a lot of entanglement: the entropy of the lambda squared exactly measures the entanglement. If you don't have much entanglement, the lambdas have to decay quickly. And then basically what you can do is throw away exactly those directions in that space, right, those contributions which correspond to the small lambdas. This way, you will make a small error and still keep an accurate description. So by discarding the small-weight contributions, we can get a good approximation with a small error and a small capital D.
And so the basic statement is, let me see: if I want to cut to some value of capital D, the scaling I get for the error, in two-norm, is the following. It scales linearly in the system size. That's nice. It scales exponentially in the maximal entropy I have across any cut. But that's not surprising, right? Because entropy is a logarithmic quantity, so the number of degrees of freedom has to be exponential in the entropy. And then there is some polynomial scaling, with some constant, in the dimension. So if I want to get a certain accuracy, I have to scale my bond dimension here only at a polynomial rate in the accuracy. And I also only have to scale it polynomially with the size of the system, which is much better than the exponential scaling I would expect in general. So capital D, which basically sets the number of parameters, will go like some polynomial in n and 1 over epsilon. This means that the number of parameters, and as we will see in a moment also the time to do computations, scales polynomially rather than exponentially, which would be the case for a brute-force treatment of a many-body system.
And that's not the same C, I suspect. Actually it's a bit more subtle. In fact, the entropy is some Rényi entropy for alpha smaller than 1.
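To make the truncation argument concrete, here is a small NumPy sketch for a single cut. It assumes a synthetic bipartite state whose Schmidt coefficients decay geometrically (chosen only for illustration), obtains the Schmidt decomposition from a singular value decomposition, and keeps the `D` largest terms. The function names are hypothetical. The two-norm error of such a truncation is the square root of the discarded weight, which is a standard property of the SVD rather than something specific to this lecture.

```python
import numpy as np

def renyi_entropy(lam, alpha):
    """Renyi entropy of the probabilities lam**2; alpha = 1 gives von Neumann."""
    p = lam ** 2
    p = p[p > 1e-15]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log(p)))
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def truncate(psi, dim_left, D):
    """Keep the D largest Schmidt terms across the cut (left | right)."""
    U, lam, Vh = np.linalg.svd(psi.reshape(dim_left, -1), full_matrices=False)
    approx = (U[:, :D] * lam[:D]) @ Vh[:D]
    return approx.reshape(-1), lam

# Synthetic state on two halves of dimension 16 each (assumption for this demo).
rng = np.random.default_rng(1)
dim = 16
lam_true = 0.5 ** np.arange(dim)
lam_true /= np.linalg.norm(lam_true)
U0, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
V0, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
psi = ((U0 * lam_true) @ V0.T).reshape(-1)

D = 4
approx, lam = truncate(psi, dim, D)
error = np.linalg.norm(psi - approx)
discarded = np.sqrt(np.sum(lam[D:] ** 2))

print("two-norm error:", error, " sqrt(discarded weight):", discarded)
print("von Neumann entropy:", renyi_entropy(lam, 1.0))
print("Renyi entropy, alpha = 1/2:", renyi_entropy(lam, 0.5))
```

The Rényi entropy with index below 1 is never smaller than the von Neumann entropy, so a bound on it is a stronger condition on how fast the Schmidt coefficients decay.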
And the constant depends on the Rényi index. It's something like alpha over 1 minus alpha or so. So that's actually the divergence when you approach the von Neumann entropy. Oh, there are three. Of the two C's on the right, this C is different. I think these two are actually equal, yes. These two should be equal. And, to be accurate, they depend on the Rényi index. And that's the maximal Rényi entropy across any cut for some alpha smaller than 1. The statement actually breaks down as you approach the von Neumann entropy. It relates to how fast your Schmidt coefficients decay, given that you fix the entropy in a certain way, and you can see that this depends on the Rényi index.
So one thing we have seen is that we can get an efficient description of states with an area law, and thus of low-lying states of local Hamiltonians, in this framework. But of course that's not enough, right? If we want to, say, do some simulation with these systems, we would also like to be able to evaluate quantities efficiently, and not only have an efficient description of the state without any chance of computing anything. I mean, if we just wanted an efficient description of the ground state, we could use the Hamiltonian, right? It tells us everything about the ground state. It would just be extremely complicated, presumably, to extract any useful information we want. So we need two things. First of all, we would like to be able to efficiently compute physical quantities of interest, like the energy of the system, correlation functions in the system, something like that. Then, going even further, we would like to be able to find such a ground state efficiently, which is potentially yet another level of difficulty. So let me start with the simplest thing, which is normalization. We've talked about these wave functions, but we haven't even seen whether they are normalized.
In fact, generally, we don't expect them to be normalized if we don't impose any condition, right? We start with some state, we apply a projection, and the norm will be the probability to succeed with that projection. Computing that could be a non-trivial problem. So how do we compute the normalization? If we use this notation, psi is the sum over i1 to in of C with indices i1 to in times the basis state |i1 ... in>. Then for the normalization, well, we could just write the double sum, but we know that we get a delta function, right? We know that the normalization is the sum of this thing, absolute value squared. So we have C complex conjugate, let me put a star, times C with indices i1 to in, right?
So now let's try to write this down graphically. We could go into the formula, put in the trace of matrix products, and try to think about it. I mean, one thing we could do is just compute the C in our computer and then evaluate the sum. Now that's the worst thing we can do, right? Because C is an exponentially big object in the system size, and our whole point was to avoid describing exponentially big objects. So we'd better not compute C. We should find a better way of evaluating this quantity. So rather than doing formulas, let me draw a diagram, and I will look at open boundary conditions. So we have A1, A2, and so on, with indices i1, i2, and so on, right? Now let's write C star. Well, C star looks basically the same, but with complex-conjugated A's. Let me put bars for complex conjugates, because stars would look messy here. Now I draw the legs pointing down. You could think of it as a convention for conjugates, or just some convention I randomly choose. And so on. So now what do we have to do? Well, these guys have identical indices and we sum over these indices, right? Remember, the convention was that if we sum over an index, we have to connect these indices.
So what we are doing is actually connecting all these indices, because we are summing over all of them. What we get is graphically a tensor network like that, which doesn't have any open legs. And this makes a lot of sense, because the normalization is a number, right? It shouldn't depend on any index. All indices are being summed over, and that's exactly what we have here.
So now the question is: is there a way to evaluate this number efficiently? Basically, we have an object with many tensors and many summations, and we have to find some order of summation. We could start by summing all these guys first. Now this would correspond to computing this object here, which was just C, right? This is a bad idea because it's exponentially big. So we should do something different. What can we do? Well, we can slice the whole thing in the opposite direction. We could say: let's try to cut our system into slices like this. Now what is this slice? This slice here I can think of as some object with two legs sticking out. But as before, I will think of this as a double index, right? This is some index alpha prime and alpha double prime, but I will just gather the variables alpha prime and alpha double prime into some bigger variable alpha. So I can think of this as some D-squared-dimensional index. So it's something which only has a single leg, so it's a vector, a D-squared-dimensional vector. Let me call this vector E1. Now on the second site, I can do the same. Let me call this E2. This has a double index on the left and a double index on the right, so this is now a D squared times D squared dimensional matrix. And so on. So what you see, I mean, if I were to write a formula, what I would have is that E_s at site s is given as a sum over A at site s, tensor A at site s, conjugate.
Because this would have indices alpha and beta. This would have indices alpha prime and beta prime, sorry, prime and double prime. So this would have this double index alpha and a double index beta. But I don't have to put the indices, right? That's why I write a tensor product. I can just omit these indices and talk about the tensor product, and this will immediately give me this bigger space. That's what a tensor product does: it takes two vector spaces and makes one bigger vector space out of them. So these objects E are something which we can just compute efficiently. That's what this expression is telling us. This E is a matrix of size D squared times D squared, and it can be efficiently computed. The sum is over i here. It can be efficiently computed because we just have to sum over this one internal index, right? There is no scaling with the system size. It scales only with D, but D is anyway the size of the objects we're dealing with. We can't hope for anything much better than that.
So now what we have to do is evaluate this picture, where these indices are again all being contracted, right? Again, this is the normalization. What this total object corresponds to is a product of these guys. We have to take E1, multiply it with E2, multiply with E3, and so on. Now this is an object of size one times D squared, this is an object of size D squared times D squared, and so on. So we have to take a vector, multiply it with a matrix, and keep going. Now what's the complexity, the number of operations needed for that? If we have an A times B matrix and we multiply it with a B times C matrix, what do we have to do? Well, we do matrix multiplication. We have A rows, B columns, B rows, C columns. For each entry here, we have to take the corresponding row and take the scalar product with the corresponding column, which has B elements, right?
So we have to take B elements, multiply each with the other one and add them. So for each entry here, we need B operations, I mean order of B operations. And there are A times C entries. So the total complexity of matrix multiplication is that we need on the order of A times B times C operations. What happens here is that we have this vector and we multiply it with a matrix. So we need one times D squared times D squared operations, so on the order of D to the four operations. After that, we again have a vector, right? So again, we are doing the same thing. In every step, we are doing order D to the four operations. So the total cost for evaluating the whole normalization is on the order of N times D to the four operations. Now it turns out one can do it smarter, in a way where we only need on the order of N D to the three operations. I won't explain how, but it's actually not so hard to figure out; the question is in which order you do the sums. So if you want to think about it, you're very much encouraged. It's very instructive to try to figure out how to go down to D to the three. There's probably a factor of small d then appearing.
Okay, so we know how to compute the normalization. We saw it works efficiently in D. And again, D scaled nicely in the system size and in the accuracy; it scaled polynomially. So the overall scaling is polynomial in the accuracy we want and in the system size. Now we might want to go further and compute expectation values, correlation functions, maybe the energy of some interaction. If you want to compute an energy, it's very similar. We have some psi, some operator, and some psi. We can do this again with the graphical calculus. So we have the ket vector and the bra vector, but now we apply an operator, and say we apply the operator on these two sites. So it acts on two qubits and it returns two qubits. And it's easy to see what we have to do.
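The slicing of the normalization diagram into the objects E1, E2, ... can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: open boundary conditions, random real tensors stored with index order `(left bond, physical, right bond)`, and hypothetical helper names. The brute-force comparison builds the exponentially large coefficient vector, which is only possible here because the chain is tiny.

```python
import numpy as np

def transfer_matrix(A):
    """E[(a,a'),(b,b')] = sum_i A[a,i,b] * conj(A[a',i,b']), a D^2 x D^2 matrix."""
    Dl, d, Dr = A.shape
    E = np.einsum("aib,cid->acbd", A, A.conj())
    return E.reshape(Dl * Dl, Dr * Dr)

def norm_squared(mps):
    """Multiply a vector through the chain: about D^4 operations per site."""
    v = np.ones(1)  # boundary bond has dimension 1
    for A in mps:
        v = v @ transfer_matrix(A)
    return v.item()

def full_state(mps):
    """Brute-force coefficient vector C (size d^n); for checking only."""
    psi = np.ones((1, 1))
    for A in mps:
        psi = np.einsum("la,aib->lib", psi, A).reshape(-1, A.shape[2])
    return psi.reshape(-1)

rng = np.random.default_rng(0)
n, d, D = 6, 2, 3  # illustrative sizes
dims = [1] + [D] * (n - 1) + [1]
mps = [rng.normal(size=(dims[s], d, dims[s + 1])) for s in range(n)]

psi = full_state(mps)
print(norm_squared(mps), np.vdot(psi, psi).real)
```

Each step is a vector of length D squared times a D squared by D squared matrix, which is where the N times D to the four count comes from; the matrices `E` never depend on the chain length.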
What we have to do is take these indices here, multiply them with the components of the operator, and then connect that here. So if you wish, what we're doing here is writing this as a sum over C with indices i1 to in times C conjugate with indices j1 to jn. And then we have our basis element j1 to jn, the operator, and then i1 to in. Now since it's a local operator, everywhere where it doesn't act, so on all but these two sites, we really just have to connect these indices straight away. Only on these two sites do we get some cross terms. So we get the matrix element of O between j3, j4 and i3, i4. And that's exactly what we have here: that's i3, i4, j3, j4. So that's exactly what this tensor gives us. It gives these matrix elements of the operator. And except for that, it's exactly the same diagram, just that now on these two sites, i and j are not connected directly, but are related by these matrix elements. So we get this kind of tensor network, which we have to evaluate.
And now you can see that we can use pretty much exactly the same trick as before. We say this is an operator here, this is an operator: E1, E2. And here we define some new operator, which we could call E 3,4 with a subscript O. It's very similar to the original E, but it now contains this operator O. So it's just this object here. And again, you can see that's a finite-size object, right? We can compute it with a finite number of summations. There is no unfavorable scaling in N, because N doesn't enter in this object. So what we get is really exactly the same computation. If you want to compute this expectation value, you simply take E1 times E2 times E 3,4 with the operator, and so on. The expectation value is computed in exactly the same way, by first determining all these E's and multiplying them, and these E's are still D squared by D squared matrices. So the complexity of the whole procedure is exactly the same as before. Actually, if we move to periodic boundary conditions, not so much changes.
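The same slicing works with an operator inserted on two neighbouring sites. The sketch below assumes open boundary conditions, random real tensors with index order `(left bond, physical, right bond)`, and an arbitrary two-site operator (Pauli X on one site times Pauli Z on the next) chosen purely as an example. The function names and the index convention `O[k, l, i, j] = <k l|O|i j>` are assumptions of this illustration.

```python
import numpy as np

def transfer(A):
    Dl, d, Dr = A.shape
    return np.einsum("aib,cid->acbd", A, A.conj()).reshape(Dl * Dl, Dr * Dr)

def transfer_two_site(A1, A2, O):
    """Two-site block with the operator inserted; O[k, l, i, j] = <k l|O|i j>."""
    Dl, Dr = A1.shape[0], A2.shape[2]
    E = np.einsum("aib,bjc,klij,dke,elf->adcf", A1, A2, O, A1.conj(), A2.conj())
    return E.reshape(Dl * Dl, Dr * Dr)

def contract(blocks):
    v = np.ones(1)
    for E in blocks:
        v = v @ E
    return v.item()

def expectation(mps, O, site):
    """<psi|O|psi> / <psi|psi> for O acting on sites (site, site + 1)."""
    Es = [transfer(A) for A in mps]
    E_O = transfer_two_site(mps[site], mps[site + 1], O)
    return contract(Es[:site] + [E_O] + Es[site + 2:]) / contract(Es)

rng = np.random.default_rng(2)
n, d, D = 5, 2, 3  # illustrative sizes
dims = [1] + [D] * (n - 1) + [1]
mps = [rng.normal(size=(dims[s], d, dims[s + 1])) for s in range(n)]

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
O = np.kron(X, Z).reshape(d, d, d, d)
value = expectation(mps, O, site=2)  # third and fourth site of the chain

# Brute-force check with the full d^n vector (feasible only for tiny n).
psi = np.ones((1, 1))
for A in mps:
    psi = np.einsum("la,aib->lib", psi, A).reshape(-1, A.shape[2])
psi = psi.reshape(-1)
O_full = np.kron(np.kron(np.eye(d ** 2), np.kron(X, Z)), np.eye(d ** (n - 4)))
brute = (psi @ O_full @ psi) / (psi @ psi)

print(value, brute)
```

The block with the operator has the same D squared by D squared shape as an ordinary pair of E matrices, so the cost of the whole contraction is unchanged.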
With periodic boundary conditions, the only difference is that the very leftmost thing is also a matrix, and we take a trace. So it is indeed more complex, because now we have to multiply two D squared by D squared matrices. So we get D to the six rather than D to the four, but it's still polynomial, right? We're not losing the efficient scaling even with periodic boundaries.
Okay, so what we saw is that we can efficiently extract physical properties, right? In the last 10 minutes or so, let me give an idea of how to actually find such an optimal psi. So given a Hamiltonian, do we have a way of actually finding a psi in this matrix product form which is a good approximation to the ground state? The procedure I described here doesn't work, right? This procedure was saying: mathematically, we can think of it as building the exact state and then discarding information. But the exact state is again an exponentially complex object. We don't want to build that. So second was expectation values, and third is optimization over matrix product states. This is what is known as the density matrix renormalization group method, or DMRG method for short. And don't ask me why it's called like that.
Okay, so what is the idea of this density matrix renormalization group method? I will just give a sketch of how it works. There are of course lots of technical details once you start actually doing it. But the basic idea is based on a simple observation about these matrix product states. I take a matrix product state, and at a certain position, let me call the matrix X. So this is the same as A at site k with index i_k. So I take one site, look at it as a special site, and ask what happens if I change my matrix only for that site, X. Let me write this as psi sub k as a function of X. And if you just look at this expression, you immediately see that this is linear in X, right? If I take an X and an X prime and I add them, I will add the corresponding states psi.
So the state psi is a linear function of X. If I fix all tensors except the one at a single site, the state is a linear function of the tensor X at that site. So what does it mean? It means that if I try to compute the energy as a function of the tensor X at some site, and I have some Hamiltonian, some energy which I want to compute, then this is linear and this is linear, so I get a quadratic expression. Well, I will in a moment. The unnormalized state is linear in X. The normalized state is obviously not linear in X, so I prefer not to normalize the state; I will normalize the expectation value, yes. Exactly. So what this is, is a quadratic expression in X. If I think of X as just a huge vector, X is a three-index tensor, but I can just take all entries of the three-index tensor and put them in a huge vector, just by grouping all the indices if you wish, then this will be some quadratic expression in X. So X, scalar product with M times X. And now indeed I would like to normalize this thing. Again, the norm is a quadratic expression, X scalar product with N times X. So what we see is that if you want to compute the energy as a function of this X at site k, then it will be given by a ratio of quadratic functions.
So now the question is: do we have a way of finding the optimal energy as a function of X? And it turns out that if we rephrase this, it is basically something like an eigenvalue problem. What we have is that E times X dagger N X is equal to X dagger M X. So the optimum of this kind of expression is exactly given by an eigenvalue equation, right? That's some formulation of a variational principle. M and N are both Hermitian operators, because these are Hermitian expressions. So what we will see is that both the minimum and the maximum of this expression are given by the extremal solutions of this eigenvalue type of equation, M X equals E times N X. That's what's called a generalized eigenvalue equation.
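Written out as formulas, the argument above can be sketched as follows. The symbols M, N, X and E are those of the lecture; the explicit definitions in terms of matrix elements and the stationarity step are a reconstruction from the spoken description, under the assumption that X is treated as a vector and that M and N are Hermitian.

```latex
% Energy as a ratio of two quadratic forms in the single-site tensor X
E(X) \;=\; \frac{\langle \psi_k(X) | H | \psi_k(X) \rangle}
                {\langle \psi_k(X) | \psi_k(X) \rangle}
     \;=\; \frac{X^\dagger M X}{X^\dagger N X},
\qquad M = M^\dagger,\quad N = N^\dagger .

% Stationarity with respect to X^\dagger:
\frac{\partial E}{\partial X^\dagger}
  \;=\; \frac{M X \,(X^\dagger N X) - (X^\dagger M X)\, N X}{(X^\dagger N X)^2}
  \;=\; 0
\quad\Longrightarrow\quad
M X \;=\; E(X)\, N X .
```

So every stationary point of the ratio solves the generalized eigenvalue equation, and the value of the ratio at that point is the corresponding eigenvalue; the minimum of the energy is the smallest one.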
Or if you wish, we can move the N to the other side. You wouldn't do this in practice, because there are good solvers for generalized eigenvalue problems. But in principle, of course, you can think of this as an equation of the form N inverse times M times X equals E times X. So now it's just a normal eigenvalue equation. And if you look at this normal eigenvalue equation, the point is that the minimal eigenvalue E will be exactly the optimal solution you can find here. I mean, it's just a kind of generalization, with this N, of the variational principle, right? That the smallest eigenvalue is indeed what gives you the smallest expectation value of all quadratic expressions here. For instance, you could take the square root of N and redefine X by absorbing it. Then you just have a normal eigenvalue problem, and you can use what you know about the variational principle. Anyway, so once we have this, what we see is the following. If we fix all tensors except for the one at position K, and we want to minimize the energy, we can do so by solving an eigenvalue problem. So optimizing the energy where we change the tensor only at position K corresponds to solving an eigenvalue problem. So it's something which can be done efficiently in the size of the problem, which is again the size of these matrices. And with the same tricks as before, it also turns out that this M and this N can actually be determined. I started five minutes late, so. But I'll stop in a moment here. They can be computed efficiently. So what we can do now is the following algorithm, and that's the basic idea of DMRG. We take some initial value for all these A's and we start somewhere, say at the first site. We say this is our X and we optimize this guy. So we optimize X1 if you wish. So the energy as a function of the first site tensor. Then we move on to the next site. We optimize the second tensor to minimize the energy. And we continue like that. So we move through our chain.
We minimize the energy as a function of the tensor at each site. Then we move back, and we keep doing so, going back and forth. And the remarkable thing is that this actually converges very well and typically converges to the ground state. There are a few tricks which are used in practice to make sure it doesn't get stuck. But with a few simple tricks it actually rarely gets stuck. This is kind of remarkable, because from a quantum information, from a complexity theory point of view, you can actually show that there are hard instances and all kinds of things. So you wouldn't expect it to actually work, but it does work, like many computational methods which work very well in practice even though we can't formally justify why they work. And this also happens here. Now there are lots of tricks, right? There's the trick I mentioned to go down to D to the three instead of D to the four. But there are also a number of additional tricks, so one can also improve the computation, the optimization problem in each step, by kind of storing intermediate results. So if you want to actually do this numerically, lots of tricks should go in. But the basic idea is really just that we can efficiently compute expectation values, energies. The energy is a ratio of quadratic functions in each individual tensor. So we can efficiently optimize for that one. And then we can just sweep through our system, and the energy will go down, or at least not go up, in every step, necessarily by construction. And it turns out it actually goes to the ground state. And yeah, that's a great point to stop. Thanks for your attention.
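The sweeping procedure described above can be sketched on a very small chain. This is a brute-force illustration under loud assumptions: the Hamiltonian is an open transverse-field Ising chain picked only as an example, the full state vector is built explicitly (exponential cost, which a real DMRG code avoids), and the names `op_on`, `site_map` and `optimize_site` are hypothetical. Since N can be singular, the sketch solves the generalized eigenvalue problem on the support of N through a singular value decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, D = 6, 2, 4  # hypothetical chain length, physical and bond dimension
g = 1.0            # hypothetical transverse field strength

sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])

def op_on(ops):
    """Dense operator acting with ops[site] on the listed sites, identity elsewhere."""
    out = np.eye(1)
    for j in range(n):
        out = np.kron(out, ops.get(j, np.eye(d)))
    return out

# Assumed example Hamiltonian (dense, for illustration only).
H = sum(-op_on({j: sz, j + 1: sz}) for j in range(n - 1))
H = H + sum(-g * op_on({j: sx}) for j in range(n))

# Open-boundary MPS: the tensor at site s has shape (d, left bond, right bond).
bond = [min(D, d**s, d**(n - s)) for s in range(n + 1)]
A = [rng.normal(size=(d, bond[s], bond[s + 1])) for s in range(n)]

def mps_state(tensors):
    """Contract all tensors into the full coefficient vector (exponential cost)."""
    partial = tensors[0]
    for T in tensors[1:]:
        partial = np.einsum('...ab,ibc->...iac', partial, T)
    return partial.reshape(-1)  # both boundary bonds have dimension 1

def site_map(tensors, k):
    """Matrix B with psi = B @ x, where x is the flattened tensor at site k."""
    shape = tensors[k].shape
    size = int(np.prod(shape))
    cols = []
    for j in range(size):
        unit = np.zeros(size)
        unit[j] = 1.0
        trial = list(tensors)
        trial[k] = unit.reshape(shape)
        cols.append(mps_state(trial))
    return np.array(cols).T

def optimize_site(tensors, k, tol=1e-10):
    """Minimise (x, M x) / (x, N x) with M = B^T H B and N = B^T B on the support of N."""
    B = site_map(tensors, k)
    U, s, Vh = np.linalg.svd(B, full_matrices=False)
    keep = s > tol * s[0]
    U, s, Vh = U[:, keep], s[keep], Vh[keep]
    evals, evecs = np.linalg.eigh(U.T @ H @ U)
    x = Vh.T @ (evecs[:, 0] / s)  # chosen so that B @ x equals U @ evecs[:, 0]
    tensors[k] = x.reshape(tensors[k].shape)
    return evals[0]

energies = []
for sweep in range(4):
    order = list(range(n)) + list(range(n - 2, 0, -1))  # left to right, then back
    for k in order:
        energies.append(optimize_site(A, k))

E_exact = np.linalg.eigvalsh(H)[0]
print(energies[0], energies[-1], E_exact)
```

Each entry of `energies` is the energy after one single-site update. It cannot rise from one step to the next up to rounding, because the previous state is always among the candidates of the next local problem.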