So yesterday we spent two slots introducing exact diagonalization: the overall concept and the different parts you need to understand in order to implement an exact diagonalization code. We basically looked at how to generate Hilbert spaces and how to handle symmetries, spatial symmetries, and also the S^z conservation and so on. And we spent some time on the linear algebra part, discussing the so-called Lanczos algorithm and some of its advantages and challenges. Today, certainly for the first block this morning, I will continue with lectures. Then, depending on how far I get, in the afternoon there might be a little bit of lecture still and also some tutorial questions which you can work on here, or later at some point if you're interested.

OK, so yesterday we stopped precisely when starting to talk about observables. The idea is that we have an understanding of the Hilbert space and of the symmetries, we know how to formulate the Hamiltonian matrix and how the matrix elements are calculated, and with the linear algebra solver we are able to calculate energies, but also wave functions. And then the question is, what can you do with this? With the wave function you are obviously interested in calculating different types of correlation functions. The ones shown here are just slightly more complicated than usual. The one on the left is a dimer-dimer correlator. This is a four-point spin correlation function, where the spins on bond i, j form the first pair, contracted as S_i dot S_j, and those on bond k, l the second. And it's a connected correlation function, so you subtract the disconnected part. And that's what you get once you have calculated it. The blue is the reference bond, the bond i, j which is kept fixed. The other bond k, l is also a nearest-neighbor pair, and you sample different bonds which have no overlap with the reference bond. In this particular wave function you get that pattern. Probably not everybody among you has thought about such correlation functions, but I can tell you that a pattern like this indicates that the system has a strong tendency to break spatial symmetries. Think about what it means for a spin system: S dot S acts almost like a projector onto a singlet or a triplet state. The different colors mean a different sign, blue is a positive sign and red is a negative sign, and the width of these lines is the strength of the correlation. What this tells you is that if you put your reference bond here, which you can roughly think of as projecting onto a singlet, then there is a very high probability to find yet another singlet in the same column. But if you ask whether there is also an enhanced probability to find a singlet on that row, it turns out no, it's negative; they are anti-correlated. So you see that this is a wave function with strong correlations imprinted, and these correlations are of the type where the bond energy wants to become modulated. If you sum up this contribution over the whole cluster, you get something like an order parameter squared for this type of spatial symmetry breaking, a dimerization kind of state.
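To make the definition of the connected dimer-dimer correlator explicit, here is a minimal Python sketch for a tiny spin-1/2 ring; the bond operators are built as dense matrices purely for illustration (a real ED code would never build these dense operators, and the lattice, bond labels, and function names here are assumptions for the example, not the lecture's code).

```python
import numpy as np
from functools import reduce

# single-site spin-1/2 operators
Sx = np.array([[0, 0.5], [0.5, 0]], dtype=complex)
Sy = np.array([[0, -0.5j], [0.5j, 0]])
Sz = np.array([[0.5, 0], [0, -0.5]], dtype=complex)
I2 = np.eye(2)

def site_op(op, i, n):
    """Embed a single-site operator on site i of an n-site system."""
    ops = [I2] * n
    ops[i] = op
    return reduce(np.kron, ops)

def bond_op(i, j, n):
    """S_i . S_j as a dense matrix on the full 2^n-dimensional Hilbert space."""
    return sum(site_op(a, i, n) @ site_op(a, j, n) for a in (Sx, Sy, Sz))

def connected_dimer_corr(psi, bond1, bond2, n):
    """<(S_i.S_j)(S_k.S_l)> - <S_i.S_j><S_k.S_l> for non-overlapping bonds."""
    D1, D2 = bond_op(*bond1, n), bond_op(*bond2, n)
    ev = lambda O: (psi.conj() @ O @ psi).real
    return (psi.conj() @ D1 @ D2 @ psi).real - ev(D1) * ev(D2)

# toy usage: ground state of a 6-site Heisenberg ring, reference bond (0, 1)
n = 6
H = sum(bond_op(i, (i + 1) % n, n) for i in range(n))
psi = np.linalg.eigh(H)[1][:, 0]
print(connected_dimer_corr(psi, (0, 1), (3, 4), n))
```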
And here there is another correlator, which measures a particular component of a spin current. This is an oriented operator, which means that if you flip the two sites, the sign changes. But if you keep track of the orientation, and the reference bond also has an orientation, you see that this is a wave function which has the tendency to host spin currents circulating in alternating patterns on the square plaquettes of the square lattice. So this is an instance of a wave function which seems to show ordering with spontaneous spin currents circulating in an alternating way on these plaquettes. And that's something you can measure.

As a technical remark: in principle, since you have the full many-body wave function, you can calculate any correlator you might think of. That's an advantage compared to some other methods. For example, in quantum Monte Carlo there are very efficient algorithms for particular models, but it gets harder and harder to calculate more complicated correlation functions. It might be that the Green's function can be measured very efficiently, but if you want something like a three-body or four-body Green's function with various kinds of indices, it gets harder and harder to do that efficiently in Monte Carlo. Here we are limited in system size, but because we have the full wave function, we can basically measure any kind of correlation function. Something else you have to think about is that if you use a code which exploits spatial symmetries, you also need to symmetrize your measurement operator, your spin correlations, in the appropriate manner. If you calculate the nearest-neighbor correlator, the actual operator you have to evaluate is the nearest-neighbor correlator moved around by all the symmetries and then normalized by the number of symmetry operations, so that the operator whose expectation value you measure is itself a symmetric operator. Otherwise things might go wrong.

OK, so that was the simplest case of static, equal-time correlation functions, just ordinary expectation values. Something which is a bit more interesting, because it also links to the discussion we had about the Lanczos algorithm, is when you're interested in frequency dynamics. That can be phrased in terms of this generalized Green's function, G_A, where A is a certain type of operator, with some frequency dependence omega and some broadening eta. The general framework is that you have your ground-state wave function psi, which you have calculated, and you're interested in the dynamical response on top of that ground state. The excitation you're probing with is the operator A, and A depends on the physics of your system: whether you have a spin system or a fermionic model, and what you're going to probe your system with. A can be something like a spin component at a particular wave vector; it can be a fermionic annihilation or creation operator, in which case this is related to, say, photoemission experiments, while with spin components it is related to inelastic neutron scattering. So you can choose the scattering operator you want. But the general form of these response functions is that you take the ground state, apply this operator, and then you have this resolvent. These are all numbers:
E0 is the ground-state energy, omega is your excitation energy, eta is the broadening factor, which you can choose at will, and H is the Hamiltonian operator. So the whole expression has the structure of a resolvent. Now, given that expression, how would you proceed if you had no further knowledge about how one actually does this? You might think: OK, I need to calculate this state A applied to psi; that's something which is easy to do, you know how to apply an operator to a state. But then how do you deal with the resolvent? Do you really have to go ahead, calculate the complete spectrum of your Hamiltonian, and then invert the eigenvalues? Is that what you're going to do? It turns out that's not necessary; there are more clever ways, and this links to the ideas of the Krylov space you discussed yesterday. Now it becomes clear why the Krylov algorithm has a particular starting state. Let me recall: you have the Krylov space K_n, built with the Hamiltonian, or more generally some operator, and a starting state phi_0; as I said, this is the span of phi_0, H phi_0, up to H^n phi_0. And remember that yesterday, in the discussion of the plain Lanczos eigensolver, we saw that there is a Krylov space, and the Lanczos algorithm generates a basis in which the Hamiltonian, up to that number of iterations, is tridiagonal. But in those considerations the starting state phi_0 played no particular role: it was a random state in your Hilbert space, supposed to be non-orthogonal to the ground state, but otherwise its actual structure did not matter. In this application, however, the starting state becomes important. Imagine you choose phi_0 as the operator A applied to our ground state psi. This is a very particular state, the state relevant for the scattering experiment, and generically A applied to psi is not an eigenstate. And now you can see that this is very nice: we tridiagonalize the Hamiltonian H by a Lanczos process, but we start the Lanczos process with this particular starting vector, not with a random one. The Lanczos process, as discussed yesterday, then generates a tridiagonal matrix, with the alphas on the diagonal, the off-diagonals beta_n, and zeros elsewhere. And the specific point now is that our starting state, in the basis in which the Hamiltonian is made tridiagonal, is just the vector with first component 1 and the rest 0. So in the Lanczos basis of your Krylov space, this particular starting state is really just the first basis vector.
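To make this step concrete, here is a minimal sketch (not the lecture's actual code) of generating the Lanczos coefficients alpha_n and beta_n with the special starting vector phi_0 = A|psi>; H can be a dense numpy array or a scipy sparse matrix, and for brevity there is no re-orthogonalization, which a production code would have to worry about.

```python
import numpy as np

def lanczos_coeffs(H, phi0, m):
    """Run up to m Lanczos steps starting from phi0 = A|psi>.

    Returns the diagonal (alphas) and off-diagonal (betas) entries of the
    tridiagonal representation of H in the Krylov space built on phi0, and
    the norm of phi0, which enters the Green's function as a prefactor.
    """
    norm0 = np.linalg.norm(phi0)
    v, v_prev = phi0 / norm0, np.zeros_like(phi0)
    alphas, betas, beta = [], [], 0.0
    for j in range(m):
        w = H @ v - beta * v_prev
        alpha = np.vdot(v, w).real          # diagonal element alpha_j
        alphas.append(alpha)
        w = w - alpha * v
        beta = np.linalg.norm(w)
        if j == m - 1 or beta < 1e-12:      # done, or invariant subspace hit
            break
        betas.append(beta)                  # off-diagonal element beta_{j+1}
        v_prev, v = v, w / beta
    return np.array(alphas), np.array(betas), norm0
```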
Now, if you go back and look at this expression, what you really want to know is not the entire shifted and inverted H; you only want one element of this operator in that basis, namely the element (1, 1). You see, you want to invert that operator, but in the end you're sandwiching it between this state and its conjugate, and that amounts to just calculating the matrix element (1, 1) of this expression, of z minus H restricted to the Krylov space. What I mean is that the starting state, in the basis in which H is tridiagonal, is just the vector with first component 1 and the rest 0, and this is its bra. And the numbers, E0 plus omega plus i eta, I call z; it's just a complex number. So the operator here is z minus H constrained to the Krylov space, inverted, and the quantity we want is just its element (1, 1). Do you follow so far? So it's a particular choice. We take a very particular starting state, we build the Krylov space on top of that vector, and we get the tridiagonal matrix, because we can start a Lanczos process with any state, so we do it for this one. In the Krylov space generated on top of that starting state, the matrix is tridiagonal. Up to now we have not yet exploited the fact that H is tridiagonal there; we only know that in our Krylov basis, the particular Green's function we want amounts to calculating the inverse of z minus H, and that we are only interested in one particular matrix element of that big matrix, the top left corner if you wish. So that's one step. The next step is to recognize that z minus H is still tridiagonal, because it just amounts to negating H and putting z minus alpha_n everywhere on the diagonal. And now the goal is just to calculate that one element of the inverse of z minus H. I am not doing it explicitly here, but the key point is that since this matrix is tridiagonal, you can use the Laplace (cofactor) expansion for the inverse, where an individual entry of the inverse is obtained from the determinant of what remains of the matrix when you delete the first row and column, divided by the full determinant. If you do that and think about this particular structure, you realize that what you have to calculate is a continued fraction which involves these elements. I'm not working that out completely; it's documented in the literature. But the key point is that for a tridiagonal matrix, calculating the very first element of the inverse is something you can develop systematically, and you arrive at an expression with a continued fraction which involves the alphas, the betas, and z in a particular way. The continued fraction terminates after the finite dimension of the Krylov space, and then you get an expression for this Green's function at that particular size of the Krylov space. So you see that having these Krylov algorithms available is a very powerful tool.
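Here is a minimal sketch of evaluating this continued fraction, a hedged illustration rather than the lecture's code: it takes the alphas and betas from a Lanczos run (for example the lanczos_coeffs sketch above) and evaluates the fraction from the bottom up, for a whole grid of frequencies at once, with z = E0 + omega + i*eta as defined in the text.

```python
import numpy as np

def greens_continued_fraction(omega, E0, eta, alphas, betas, norm0):
    """norm0^2 * [(z - T)^{-1}]_{00} with z = E0 + omega + i*eta.

    alphas (length m) and betas (length m-1) are the Lanczos coefficients of
    the tridiagonal matrix T; omega may be a scalar or a numpy array.
    """
    z = E0 + np.asarray(omega) + 1j * eta
    g = z - alphas[-1]                          # deepest level of the fraction
    for a, b in zip(alphas[-2::-1], betas[::-1]):
        g = z - a - b**2 / g                    # work upwards level by level
    return norm0**2 / g

# usage: alphas/betas are computed once; re-evaluating the fraction on a fine
# frequency grid, or for another broadening eta, is then essentially free:
# omegas = np.linspace(0.0, 4.0, 801)
# G = greens_continued_fraction(omegas, E0, 0.05, alphas, betas, norm0)
# spectral_function = -G.imag / np.pi
```

The convergence check discussed below then simply amounts to increasing the number of Lanczos iterations at fixed eta and verifying that this curve stops changing.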
Because, as I said, if you go ahead naively, you would think: OK, I have a representation of A times psi, but now I really have to invert H, and for that I basically have to know all the excited energies exactly. You can also write down something like a Lehmann representation, where the spectral function is expressed through the overlaps between A times psi and all the excited states. That obviously becomes very expensive: you would really have to calculate all the excited states exactly and then compute overlap after overlap. Here you see that this is actually not required. Using this nice Krylov approach and exploiting this algorithm, you get a rather cheap expression for the spectral function. And that is really the key to why we can calculate a spectral function like the one shown here. This is an example for a particular Heisenberg model on a triangular lattice in zero magnetic field, and what is shown are dynamical spin structure factors, so my A is something like an S^z component at different momenta; the different individual plots are different momenta, and here you can see the spectral function you obtain. The point is that in your continued fraction you have your alphas and your betas; you have calculated them once and for all, they come from this Lanczos process. But once they are there, you can re-evaluate the continued fraction for different values of z, and by changing z you can change either the broadening eta or the frequency you want to sample. So in principle you just take a grid of frequencies, as fine as you want, and re-evaluate the continued fraction for all of them. Since the alphas and betas do not change, you only have to do your Krylov run once, and evaluating it for all the frequencies is very cheap.

Yes? [Question: do you need to know that the (1, 1) matrix element of the inverted z minus H is the same whether you compute it in the Krylov space or in the full Hilbert space?] Yeah, there is something which I have not yet mentioned but would have come to, namely: what is the size of the Krylov space? That's a free parameter, and it actually depends on the structure of your spectral function how big the Krylov space has to be in order to get the correct shape. The point is that in a finite system a spectral function is just a collection of delta functions, because the spectrum is discrete. But a collection can mean one or two important ones, it can mean 100, it can mean 1,000. So it depends on what the spectral function actually looks like, and the controlling parameter is that you have to make your Krylov space large enough to correctly reproduce it. That is something you have to check by looking at different sizes of your Krylov space, which is done here, and then checking whether your spectral function converges at the resolution you have chosen. If you choose a resolution eta, the broadening, that's like choosing a resolution; you can choose it freely, but once you set it, you look at different sizes of the Krylov space and check whether your spectral function at that resolution is converged or not. So that's something to keep in mind.
And something which is also interesting, and which has to do with the real-time evolution I'm going to discuss on the next slide, is that these techniques can also be used to calculate the overlaps of an initial state with the eigenstates of a Hamiltonian. If you're interested in time evolution, say some quench experiment, then your initial state is typically not an eigenstate, and you would like to know how that initial state decomposes onto the eigenspectrum of the Hamiltonian with which you're evolving the system in time. So you have some psi_0, some initial wave function, you evolve it with the Hamiltonian, and you would like to know the overlaps of psi_0 with the phi_n, where the phi_n are the eigenstates of the Hamiltonian by which you're propagating. The question is, how would you do that? The simplest way is to calculate the spectrum of H and its eigenfunctions, then loop over all eigenfunctions and calculate the overlaps, as written here. But there is again a way to do this by taking psi_0 as the starting state for a Krylov run. You take psi_0, you do a Krylov run with your new Hamiltonian on top of that state, and with no scattering operator, just psi_0 as the initial state, you get H in that Krylov space. If you then calculate the resulting spectral function, the resolvent on it, you get the spectral distribution: a curve with peaks at the eigenenergies whose weights correspond to the overlaps. So you get, resolved in energy, the decomposition of your initial state in the basis of the Hamiltonian by which you're propagating, and you can see how broadly your initial wave function is distributed over the eigenlevels of that Hamiltonian. That is actually quite useful because these Krylov-type methods are comparable in cost to Lanczos runs for ground states, even though you're addressing excited states. And that's quite nice, because you can use the sparse matrix techniques, and with a few hundred or perhaps a thousand iterations you already get a very good idea of how your starting state decomposes in the eigenbasis, and you can develop a physical picture out of that. It's clear that for Hilbert spaces of millions it is impossible to calculate all eigenstates and all overlaps state by state. That's not possible, but with this Krylov technique, doing the Krylov run with this non-eigenstate, you are able to learn something about these overlap distributions even for Hilbert spaces which are larger than what you can deal with using full exact diagonalization.
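For a small Hilbert space you can do the brute-force decomposition just mentioned, looping over all eigenstates; a minimal numpy sketch (with a toy random Hamiltonian, purely illustrative) is below, and it is exactly this full diagonalization that the Krylov run with starting vector psi_0 lets you avoid for large spaces.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 200                                     # small enough for full eigh
A = rng.normal(size=(dim, dim))
H = (A + A.T) / 2                             # toy Hermitian "Hamiltonian"
psi0 = rng.normal(size=dim)
psi0 /= np.linalg.norm(psi0)                  # normalized initial state

evals, evecs = np.linalg.eigh(H)              # full spectrum: the expensive step
weights = np.abs(evecs.T @ psi0) ** 2         # |<phi_n | psi0>|^2
print(weights.sum())                          # 1.0: the weights are a distribution
print(weights @ evals, psi0 @ H @ psi0)       # both give the mean energy of psi0
```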
Are there questions so far? And another closely linked... yes? [Question: how do you actually get the Green's function once you have constructed the Hamiltonian in the Krylov space?] If you have constructed the Krylov space, you have these alphas and betas, and then you have to write down a continued fraction. This has to do with the fact that there are linear algebra formulas for developing an element of the inverse of your matrix through consecutive determinant calculations where you delete rows and columns of the matrix. If you apply these ideas to this particular form of a tridiagonal matrix, you realize that you actually get a continued fraction: a fraction like 1 divided by something which itself contains another fraction, and these alphas and betas enter level by level. And it stops once you reach the level n corresponding to the size of the Krylov space. So there you're using the explicit tridiagonal form of your matrix and not the general case. You can then, quite formally, on paper, develop the form of that one particular matrix element of the inverse by a recursive evaluation of determinants, and you realize that it turns out to be a continued fraction involving the elements on the tridiagonals of your matrix. In the end a continued fraction gives you a number, because it involves these numbers alpha, beta, and z; if you evaluate it, you get a number, and that is the value of your Green's function at the value of z you have chosen. Then you choose another value of z, meaning another frequency, you re-evaluate the continued fraction, and you get another number; doing that for many frequencies gives you a continuous curve like the one plotted here. And formally it's the same whether you take A times psi as the starting state or some other state, an initial state psi_0; it amounts to the same thing, namely something like the spectral decomposition of that state over the excited states. But you can also take a wave function which does not come from a scattering operator but which is, say, the ground state of another Hamiltonian. So doing these spectral functions for time evolution amounts to the same thing, basically.

Another application which I find quite important, and where these Krylov algorithms come in very handy, is real-time evolution. It's again the same story. If you're interested in actually knowing the propagator, the matrix exponential of minus i t times H, where H is your Hamiltonian, so that you can jump to any time instantly, the obvious approach is to take your Hamiltonian and diagonalize it completely. Then the propagator is diagonal in the eigenbasis, with entries given by the exponentials of the energies times t, and you transform back into your basis and have the propagator for all times. But this requires being able to calculate the entire spectrum of the Hamiltonian, and as we have discussed, that limits you to matrix dimensions of, say, about 100,000, where you can still compute the complete spectrum. If you're working with a Hilbert space of millions or billions, this is clearly not possible.
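Just to pin down this brute-force route: for a small matrix the full propagator can be built directly from the spectrum; a minimal sketch with toy sizes and illustrative names follows, and it is exactly this step that becomes impossible for Hilbert spaces of millions.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 500                                       # feasible for full diagonalization
A = rng.normal(size=(dim, dim))
H = (A + A.T) / 2                               # toy Hermitian Hamiltonian

evals, evecs = np.linalg.eigh(H)                # the expensive step: full spectrum

def full_propagator(t):
    """U(t) = exp(-i H t), built in the eigenbasis and rotated back."""
    return (evecs * np.exp(-1j * evals * t)) @ evecs.conj().T

psi0 = rng.normal(size=dim)
psi0 /= np.linalg.norm(psi0)
psi_t = full_propagator(0.5) @ psi0             # state at time t = 0.5
```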
Then you might ask: is it still possible to do real-time evolution? And the answer is yes. If you give up wanting to know the complete propagator, into which you could put any time and any starting state whatsoever, and instead focus on the initial state you choose and on doing a time evolution over some time interval, then there are Krylov methods available with which you can propagate a wave function, chosen as the initial state, in time by consecutive time-interval boosts. And that also relies on Krylov techniques; you see that these Krylov ideas are really quite powerful. What I show here is just an application, but let me give you the short idea. The idea in this particular setting is again that we build a Krylov space, but phi_0 is now really our psi_0, the initial state which you want to propagate, the initial state of our quench dynamics. (psi_0 is our initial state, and it plays the role of phi_0, because phi_0 was my convention for labeling the Lanczos vectors.) So we choose a particular initial state, we build a Krylov space of a certain size, and we obtain the alphas and betas again; this is just the standard Lanczos tridiagonal form of the Hamiltonian in the Krylov space. And the idea is, again, quite simple. Our initial state in the Krylov basis has the form (1, 0, 0, ...); that is the component representation of phi_0. So the natural thing to do is to calculate the Krylov space of a certain size, which gives a rather small matrix, typically, and simply exponentiate that small matrix: calculate the matrix exponential of minus i t H, but with H constrained to the Krylov space, so this small matrix. We call that U(t) constrained to the Krylov space. If your Krylov space is 20 vectors, then this matrix, which lives in the Krylov space, is a 20 by 20 matrix, and calculating its matrix exponential is straightforward; that's no effort, so you can do it. Now, what you want to do in the Krylov space is to apply this propagator, U(t) constrained to the Krylov space, onto our starting state. But since our starting state in the Krylov basis is trivial, just the first unit vector (1, 0, ..., 0), you are basically picking out the first column of the propagator, U_11, U_21, and so on, and the corresponding linear combination of your Krylov vectors in the full Hilbert space gives you the time-propagated state.
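Here is a minimal Python sketch of this Krylov propagation step, under the usual caveats (no re-orthogonalization, dense expm from scipy, illustrative names only): build the Lanczos basis V and the tridiagonal matrix T on top of psi_0, exponentiate the small matrix, and take its first column as the coefficients of the propagated state in the Krylov basis. The magnitude of the last coefficient is returned as well, since it serves as the convergence indicator discussed below.

```python
import numpy as np
from scipy.linalg import expm

def krylov_propagate(H, psi0, t, m):
    """Approximate exp(-i t H) psi0 in an m-dimensional Krylov space."""
    dim = psi0.shape[0]
    V = np.zeros((dim, m), dtype=complex)        # Lanczos (Krylov) basis vectors
    T = np.zeros((m, m))                         # tridiagonal H in that basis
    V[:, 0] = psi0 / np.linalg.norm(psi0)
    beta = 0.0
    for j in range(m):
        w = H @ V[:, j] - (beta * V[:, j - 1] if j > 0 else 0.0)
        T[j, j] = alpha = np.vdot(V[:, j], w).real
        w -= alpha * V[:, j]
        beta = np.linalg.norm(w)
        if j + 1 == m or beta < 1e-12:           # done, or invariant subspace
            T, V = T[:j + 1, :j + 1], V[:, :j + 1]
            break
        T[j, j + 1] = T[j + 1, j] = beta
        V[:, j + 1] = w / beta
    coeffs = expm(-1j * t * T)[:, 0]             # propagated state, Krylov basis
    psi_t = np.linalg.norm(psi0) * (V @ coeffs)  # back to the full Hilbert space
    return psi_t, abs(coeffs[-1])                # last coefficient: error gauge
```

For a Hermitian H one could equally diagonalize the small T with numpy.linalg.eigh instead of calling expm; for a 20 by 20 matrix either is negligible effort.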
Now you can ask what the relation is between the size of your Krylov space and the time interval, which is something you might be wondering about. It turns out that mathematicians have really proved the convergence of these algorithms: if you take a starting state and fix a given time interval by which you want to propagate your wave function, then you can actually prove that the quality of the propagated wave function converges exponentially in the size of the Krylov space. This basically means that for each time interval you can converge the time integration by making the Krylov space sufficiently large. And now you might wonder what 'sufficiently large' means. This example is a many-body quench in a Hubbard-type model, where we're also comparing to time-dependent DMRG. The point here is that you want to sample the dynamics, so the time steps are rather short, perhaps a few times 10 to the minus 2 or so. For such short time intervals it turns out that the Krylov space you need in order to be extremely precise just amounts to a handful of Krylov vectors, perhaps 5 or 10 or so, and then you're already converged. If you make the interval larger, however, if you directly want to jump from 0 to 1, then it can require a lot of iterations, something like 50 or 100. But even that is in principle feasible; you just have to make your Krylov space large enough. And there is also a kind of truncation criterion, which I think can be made quite intuitive. The idea is that the time-propagated vector, in the Krylov space built on your starting state, has this representation, and what you check is the last component of that vector. If these components start to become smaller and smaller, it tells you that expanding your Krylov space is not adding anything substantial anymore: if your time-propagated wave function does not profit from the Krylov space getting larger, the result is converged for the time interval you have set. So there is an intuitive criterion: check what the last component of that time-propagated vector does as you make the Krylov space larger. Making the Krylov space larger means this vector gets longer and longer, and if the components down there become smaller and smaller, then enlarging the Krylov space further does not make your calculation more precise, because you are already converged for that time interval. If you make your time interval larger, you will probably have to blow up your Krylov space again. But the message is that you can formulate a convergence criterion for having a very accurate representation of the propagated wave function: you run your Krylov algorithm, doing matrix-vector multiplications, and after each multiplication you check, because this is inexpensive, what the last component of that vector is. If you see that this last component converges towards 0, you can stop once it is below your convergence criterion, and then you are guaranteed to have made a very accurate time integration.
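A sketch of using this criterion adaptively, reusing the krylov_propagate helper from above (so this fragment assumes that definition is in scope): simply grow the Krylov space until the last coefficient of the propagated vector drops below a tolerance. A real code would extend a single Lanczos recursion instead of restarting it at every m; this version just makes the logic explicit.

```python
def propagate_adaptive(H, psi0, t, tol=1e-10, m_max=200):
    """Grow the Krylov space until the last coefficient of the propagated
    vector (in the Krylov basis) is below tol, then return the state."""
    for m in range(2, m_max + 1):
        psi_t, last = krylov_propagate(H, psi0, t, m)   # helper sketched above
        if last < tol:
            return psi_t, m            # converged for this time interval
    raise RuntimeError("Krylov space of m_max vectors not enough for this t")

# consecutive time-interval boosts: the final state of one step becomes the
# starting state of the next step, as discussed in the question below
# psi = psi0
# for _ in range(n_steps):
#     psi, m_used = propagate_adaptive(H, psi, dt)
```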
Yes? Sorry, correct me: you just said you have many time steps, and at the next step you use the final state of the previous propagation as the starting state for the next one? Exactly, yeah. Is there a way to prove that there is no accumulation of errors if one propagates for a very long time? Yeah, I mean, we made some checks, because for small sizes, as explained, you can actually calculate the full propagator and compare. Or you can tighten the convergence criterion of this Lanczos propagation and check whether your results change. So at least numerically we checked that, and I guess it can also be shown mathematically. What is a convenient time step to use, given the matrices you have to multiply? Yeah, often it's actually dictated by the sampling frequency with which you want to sample your physics; it depends on what you want. But it's true: if you really want to make a long boost in time, I would be a bit careful. One should probably check, if you're taking a time step of 10 or so while the natural frequency in a Hubbard model is something like U, which is very large, whether everything really works fine. Here, though, we want to resolve some oscillation phenomena, so we have to measure sufficiently frequently to resolve the dynamics, essentially by the Nyquist criterion, meaning you sample densely enough. If you're not interested in resolving that, you can also take larger time steps. What is really nice is the comparison with other time-integration methods, like Trotter methods, which you might have heard of, or Runge-Kutta, which you can also formulate in your Hilbert space to do time evolution: those algorithms have an error which depends on the size of your time step. These Krylov methods do not have that, because you give a time step and the algorithm, by increasing the Krylov space to a sufficient size, makes sure that you make no detectable error on that time step. Of course, the larger your time step, the more work is required, but there is no time discretization error if your error criterion is tight enough. That's also nice because you don't have to make runs for different time steps and so on: you pick a time step, and if you follow this error criterion, you're guaranteed to have accurate results, with no Trotter error or anything like that.

Yes? Does this method depend on how highly excited my state is? Yeah, it has an impact. Something which is actually not so well known, but which I can tell you, is that these Lanczos values, the alphas and betas, the coefficients of the T matrix, have a physical interpretation; it's perhaps not known to a lot of people, although once you say it, it's actually obvious. Alpha_0, if you think about what it means in the algorithm, is just the expectation value of your operator H in your starting state. So alpha_0 is the mean energy of your starting state: if you're doing Green's functions, it's the expectation value of H in the excited state A times psi which you have generated; if you're doing quench dynamics, it's the mean energy of your starting state. And you can go further and figure out that beta_1 is related to the variance of your initial state: it tells you how broadly the energy of your state fluctuates, in terms of a second moment.
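A tiny numerical check of this interpretation, with a toy random Hamiltonian (all names illustrative): the first Lanczos diagonal element equals the mean energy of the starting state, and the first off-diagonal element equals its energy standard deviation, which in particular vanishes if the starting state is an eigenstate.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 300
A = rng.normal(size=(dim, dim))
H = (A + A.T) / 2                                  # toy Hermitian Hamiltonian
phi0 = rng.normal(size=dim)
phi0 /= np.linalg.norm(phi0)                       # starting state

alpha0 = phi0 @ H @ phi0                           # first Lanczos step
w = H @ phi0 - alpha0 * phi0
beta1 = np.linalg.norm(w)                          # first off-diagonal element

mean_E = phi0 @ H @ phi0                           # <H> in the starting state
var_E = phi0 @ H @ (H @ phi0) - mean_E**2          # <H^2> - <H>^2
print(np.isclose(alpha0, mean_E))                  # True
print(np.isclose(beta1, np.sqrt(var_E)))           # True
```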
And that leads to something else which is nice: you can ask what happens if you start a run with an eigenstate of your Hamiltonian, if your excited state or initial state is an eigenstate. It means that alpha_0 gives you the mean energy, but beta_1 will be zero, because an eigenstate has no variance. Then your Krylov space is exactly closed, because it is an invariant subspace. Generically you do not expect the Krylov space to close before reaching the full Hilbert space dimension, but if you have an eigenstate, or more generally an invariant subspace, the Lanczos algorithm will stop exactly after the dimension of that invariant subspace, and an eigenstate is obviously a one-dimensional invariant subspace of your operator. So then you're done. The algorithm actually tells you if you have by chance started with an eigenstate: it will stop, and there's nothing wrong with that. It will then just generate a phase, because you have a one-dimensional state, and you're just rotating your state by a phase, which is exactly what you should do. So that's interesting. Coming back to your question: if you have a highly excited state which is not an eigenstate, you can translate these coefficients, the alphas and betas, into higher and higher moments; you're effectively making a moment expansion of the energy distribution of your initial state, and that has an impact on how you're propagating. The question is not so much how highly excited your state is, but how broad it is in energy. The broader it is in energy, the more structure it has, the more Lanczos iterations you will probably need for a given time interval. For a state where only a few frequencies contribute, you can probably keep the Krylov space smaller. But the nice thing is that you don't have to worry about that: if you use the convergence criteria I have discussed, the algorithm finds out by itself whether it has to do a lot of iterations before converging for that given time step or whether it can stop earlier. That's something I really like: if you do all these things properly, this Krylov time evolution is really a workhorse which you can implement without worrying too much. You have to implement the convergence criterion, obviously, but that is the only knob you have to set, and then it works quite reliably. There are other methods, which I don't want to describe, that go under the name of Chebyshev methods; I think they are also very efficient and very powerful. But there, for example, you need to map your initial Hamiltonian, which has a certain bandwidth, onto an interval of energies between minus one and one, so you always need a first estimate of the bandwidth of your Hamiltonian. This can all be done, it's not impossible, but you first have to figure out your bandwidth and map it onto the interval, and if your mapping is not exact, it might induce some additional overhead which you perhaps don't want. So in that sense I find this Krylov time evolution method really quite flexible, really a workhorse tool to do all kinds of spectral decompositions, real-time evolution, and also spectral functions. I think this is a very powerful concept.

Yes? [Question about how large the Krylov dimension n gets, whether it stays below some bound.] No, no, n can be 100 or so. Full eigensolvers for matrices of dimension up to, in principle, 100,000 can be done. Obviously you don't want Krylov spaces of 100,000, but for Krylov spaces of 100 or a few hundred it is numerically immediate to calculate the spectrum of a 100 by 100 matrix. But this is really small:
there is nothing exponential in your physical Hilbert space here; this is just the size of the Krylov space. And just as a real-time example, here I'm showing some spreading dynamics which we have calculated using the Krylov method. What we're doing here is a Heisenberg Hamiltonian, but we're working in a sector where all spins are up apart from one spin in the center which we have flipped down. This is not an eigenstate; it is a superposition of the one-spin-flip eigenstates of the ferromagnet. The initial state is localized, then we let it evolve, and that's basically what you get. I don't want to discuss the physics, it's rather simple, but it's an example where you could in principle also use full diagonalization, yet you can scale it up. Because this is a 200 by 200 lattice, which gives something like 40,000 states, I think; that is still feasible with full diagonalization. But if you do it using this Krylov method, it's extremely cheap: for each time step you do a few sparse matrix-vector multiplications of dimension 40,000, which is quick, and then you can propagate in time, no problem. If you wanted to do a dimension of 40,000 with full diagonalization, you could, but it's more or less at the limit of what you can do. Here you could also do 1,000 by 1,000, where your Hilbert space is a million; it's only a single-particle problem, and doing Krylov with a million is no problem, whereas it is impossible to do the 1,000 by 1,000 problem with full diagonalization. One has to say this is an exactly solvable problem, so you don't actually need numerics for it, because it has a closed-form solution; it's just a product of two Bessel functions. But it's a simple example. You can also do a two-particle problem, and that actually turns out to have somewhat more structure. What's happening here: it's the same Heisenberg Hamiltonian, and the initial state is two flipped spins on a nearest-neighbor bond oriented in the x direction, so two flipped spins in the middle. The left plot is the S^z profile, basically the density of these two particles moving in time, and you can see it has a particular form. The other two panels are observables where you project the wave function onto two neighboring flipped spins at later times. What you can learn from the single-particle plot is that the single particle just spreads ballistically; you have some oscillations because these are Bessel functions, but there is no additional structure, the particle just spreads ballistically. Now you can ask what happens with the two particles. You see something like a ballistic spreading of two particles as well; if you just look at the density, it looks rather similar. But if you look at the projector onto nearest-neighbor pairs oriented along x, you see that there is actually something like a bound state of two particles, which propagates along that direction but does not spread much in the other one. You can see the bound state because the projector really requires two particles close to each other; otherwise you get nothing.
The intensity is mostly concentrated on that line; it's not spreading that much in the vertical direction, and the bound state mostly propagates, in a counter-propagating way, along the horizontal axis. So it seems there is something like a bound state: your initial dynamics has a part which flies out ballistically, essentially single-particle excitations which are not much correlated, and then there is a bound state of two particles, also a fraction of your wave function, which flies out directionally along one axis. This is another nice application of this Krylov method. Here you see this is a 60 by 60 lattice with two particles, so the Hilbert space is already significantly larger, but you can still do it with Krylov, whereas with full diagonalization you might reach your limits at some point; with the Krylov method you can go significantly larger.

OK, are there questions on this time integration? If not, I would like to tell you a little bit about parallelization strategies. Some of them are simple to implement, so you might want to hear about that, and then I should also show you a bit of what you have to think about if you're doing large-scale parallelization using distributed-memory techniques. Here I show schematically how a matrix-vector multiplication looks. As we know, the Lanczos algorithm just requires us to take some input vector u, apply H to it, and deliver back v, the product of the two. The question is, how do you actually parallelize that? Nowadays we have a lot of shared-memory machines, multi-core machines where the memory is shared and memory access is not the problem; you don't have to do special programming for that, and we have multiple cores at our disposal. So the question is how you parallelize such a loop. It's a rather simple consideration, but there is one way to get it wrong, and therefore I want to point it out. One option would be to say: I reserve one part of my source vector u for one core, this part for another core, and so on, and each core applies only the corresponding part of H. But it turns out that if you do that, if you distribute the parts of the u vector over different cores, you will interfere on the result side, because different parts of your source vector, when H is applied, get spread all over v. That's a problem, because two cores, two threads, might then write and update the same entry of the result vector at the same time, and you have to forbid that: if both write at the same time, there is a danger that one of the two results gets overwritten by the other. So you would have to synchronize access to the result vector, but that slows things down, because you have to guarantee exclusive access when updating these memory cells. That's not what you want, especially because there is a simple alternative: distribute the problem over the result vector, which means each core is responsible for calculating one part of the result vector. Then the work is clearly separated and there is no interference at that level.
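A minimal Python sketch of this "partition over the result vector" idea (purely illustrative; in a compiled ED code this would just be an OpenMP parallel for over the row index of the result): each worker owns a disjoint block of rows of H and writes only its own slice of v, so there are no write conflicts, while all workers read the shared input vector u, which is harmless. Whether the Python threads actually run concurrently depends on the interpreter and on scipy releasing the GIL, so treat this as a sketch of the partitioning logic, not a performance recipe.

```python
import numpy as np
import scipy.sparse as sp
from concurrent.futures import ThreadPoolExecutor

def parallel_matvec(H, u, n_workers=4):
    """v = H @ u with the result vector partitioned over the workers."""
    dim = H.shape[0]
    v = np.empty(dim)
    bounds = np.linspace(0, dim, n_workers + 1, dtype=int)

    def work(k):
        lo, hi = bounds[k], bounds[k + 1]
        # exclusive write to this slice of v only; u is shared read-only
        v[lo:hi] = H[lo:hi, :] @ u

    with ThreadPoolExecutor(n_workers) as pool:
        list(pool.map(work, range(n_workers)))
    return v

# toy check against the serial product
H = sp.random(20000, 20000, density=1e-4, format="csr")
H = (H + H.T).tocsr()                   # a symmetric sparse "Hamiltonian"
u = np.random.rand(20000)
assert np.allclose(parallel_matvec(H, u), H @ u)
```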
Core one then calculates the matrix elements in this block; it's a sparse matrix, but all the sparse matrix elements in that block of rows are handled by that core. All of the cores will read matrix elements, and entries of u, but that is not a problem, because reads can be made concurrent: it does not matter which thread reads an element of u first, since nobody is altering it, and it always gives the same result. It's a simple remark, but you do have to worry about it: if you parallelize your for loops the wrong way, your results may actually be wrong, whereas if you parallelize over the correct loop, you have no problems and you get a very good speedup. So the statement is that there are only uncritical concurrent reads, meaning different cores may simultaneously read elements of u, which is not a problem, while we make sure that each core only writes and updates results in the part of v it is responsible for, so there is no interference.

Here are some benchmark results from, by now, somewhat older machines, but they still give you a flavor of what happens. This is some problem where we report the time for three iterations of such a Lanczos run; they were fairly expensive at the time. These were different machines: an SGI Origin, an IBM p690 with Power4 processors, a Sun machine, an IBM with Power5, and an SGI Altix. It is interesting that this was always the same code with the same parallelization strategy, but there were machines, like the black one and also the brown one here, where the speedup was not that good. Although the code is nicely parallelized, the hardware is not able to deliver the memory traffic we're reading fast enough, so using more threads does not speed things up that much. On other machines, like the green one and also the blue and red ones, the speedup is quite nice: the dotted lines, which you may or may not be able to see, indicate a one-over-N improvement, and on these machines using four times as many threads gives you a four times lower wall-clock time for these matrix-vector multiplications. Nowadays, as I said, workstations are multi-core machines, so it's quite simple to update an ED code with this OpenMP-type parallelization to use multiple cores, and the matrix-vector multiplication really gets sped up a lot. This can be done whether you recalculate your matrix on the fly each time or whether you store it in memory; it applies to both settings.

Obviously, the approach described so far assumes that the whole Hilbert space and the Lanczos vectors fit in the memory of your workstation or of your shared-memory machine. Typical sizes nowadays are, say, a few hundred gigabytes, which is what a fat workstation or a compute node of a shared-memory machine typically has, and that puts a limit on what you can do. There is then a simple extension for the case where the problem still fits in memory but you have a very large number of matrix elements.
That is typically the case for quantum-chemistry-like problems, where the Hamiltonian contains four-fermion interactions, or also quantum Hall problems, which, as I explained yesterday, have a number of matrix elements that scales with the third power of the number of orbitals. Even though the Hilbert space is not huge, the number of matrix elements is substantially larger than in a nearest-neighbor spin model, so the actual number of matrix elements to process is quite large. In that case we can use a hybrid parallelization, which means that we now distribute the problem over different nodes. On each node we keep a copy of the u vector, the one the Lanczos algorithm gives us as input, but each node applies only a part of the matrix onto that vector, which it holds completely, and calculates only one part of the result vector. If the node itself is a shared-memory machine with multiple cores, we can obviously slice that further over the cores of the node, in the same way as I described on the previous slide. But between nodes there is no shared memory: node two is a different node with another chunk of memory, and it calculates another part of the result vector. These calculations are completely independent at that step. After each node has finished its task of calculating its part of the v vector, the result, you do a broadcast call with MPI, whose job is to take the result of node one and broadcast it to all the others, so that all other nodes know node one's part; then node two does the same, and three and four. In that way, after that step, all the nodes know the full v and have a local copy of it. Then, since u and v are both locally available, you can do the Lanczos step, the linear algebra of the Lanczos algorithm, on each node, and come up with the next u for the next Lanczos iteration. Again each node calculates its local contribution, then you broadcast the result, and so you replicate the problem over different nodes, with each one calculating only a part of it. That's also quite efficient. But again the limitation is that the entire Lanczos vectors have to fit into the memory of each node, so this is more for applications where you have a lot of matrix elements but not yet really huge Hilbert spaces.
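A minimal mpi4py sketch of this hybrid scheme, with toy sizes and illustrative names, and with a single Allgather standing in for the per-node broadcasts described above: every rank keeps the full input vector u, applies only its own block of rows of H, and afterwards every rank again holds the complete result v. For brevity the dimension is assumed to be divisible by the number of ranks; within a node, the local block multiplication could additionally be threaded as in the previous sketch.

```python
# run with e.g.:  mpirun -n 4 python matvec_hybrid.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

dim = 8 * size                            # toy dimension, divisible by `size`
block = dim // size
rows = slice(rank * block, (rank + 1) * block)

rng = np.random.default_rng(42)           # same seed: same toy H on every rank
A = rng.normal(size=(dim, dim))
H = (A + A.T) / 2                         # in a real code: only own rows, sparse
u = rng.normal(size=dim)                  # full input vector, replicated

v_local = H[rows] @ u                     # each rank computes its slice of v
v = np.empty(dim)
comm.Allgather(v_local, v)                # now every rank holds the full v

if rank == 0:
    print(np.allclose(v, H @ u))          # True: same as the serial product
```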
Here is a so-called strong scaling example. Strong scaling means we always take the same physical problem we want to solve but increase the number of processors, whereas weak scaling would mean increasing the physical problem size along with the number of processors; that's not what we're doing here. This is a particular Hubbard-type problem formulated in momentum space, which is why it has a large number of matrix elements. On this axis is the number of CPUs, and here the wall-clock time per iteration. You can see that for a very small processor count you spend a significant time in one iteration, and as you increase the number of CPUs, the time for the local multiplication really goes down as one over N, which is very good. The broadcast time also goes down initially, but then it levels off and stays more or less constant, or even increases slightly, going from 512 to 1024. So that starts to saturate. But since the broadcast time is small compared to the local multiplication, you get a very nice speedup from very small processor counts up to roughly 512; there you get a speedup of about 500. Only if you double again does the saturating broadcast time come in and degrade the speedup a bit. Still, in the range from a few cores up to about 500 you have almost perfect parallel scaling. So it shows that for this type of problem this is really a nice thing to do, because there is no really expensive communication apart from that broadcast call.

Yes? [Question about what this model actually is.] No, I mean, this particular problem is something like taking a Hubbard model and formulating it in momentum space, which by itself is not necessarily a good thing, but what you can then do is truncate the momenta you keep in reciprocal space. The application there was one where we look only at the strong-coupling physics at the so-called saddle points: if you think about the Hubbard model, there are saddle points in the dispersion where the density of states has peaks, so we put a lot of k-space points around those and delete some of the others. So it's not really the real-space Hubbard model written in momentum space anymore; it's motivated by the real-space Hubbard model, but then we truncate part of the momentum orbitals and put more orbitals around particular points. It's really more like discretizing a field theory, if you want, because you do something directly in momentum space.

OK, but now if you want to deal with problems which are very large, there are two kinds of challenges. On the one hand, there are problems which simply do not fit on current shared-memory machines. On the other hand, and probably Thomas Schultes told you about this in the last lecture as well, shared-memory machines do not grow arbitrarily: we have the workstations we all know, and compute nodes in Linux clusters, and they get larger and larger, but one terabyte on such a node is already a huge quantity, and really large shared-memory machines are not common. I think I have never heard of a shared-memory machine with more than 16 terabytes, but for some problems you might want, or have to, use something like 40 or 100 terabytes. In the foreseeable future there will be no shared-memory machine with that much memory, so one has to think about parallelizing these codes using other parallelization paradigms, because there are machines available with an aggregate memory of hundreds of terabytes, in the meantime perhaps even petabytes, but it will not be shared memory. So we have to develop ED codes which can also work on distributed-memory machines. As an example, here is the Blue Gene/P machine, which has rather small nodes with a few cores and perhaps a gigabyte or two of memory per core, or per node, depending on the generation of the machine.
However, you have a large number of them: node cards, then racks, and then the assembled machine, with a huge number of cores and, here, 224 terabytes, a huge amount of memory, but it is all very much distributed, so we have to think about how to get these things done on such machines. One problem is that we're trying to distribute a problem in Hilbert space, and in Hilbert space, as far as we have understood it, and I think many others struggle with this as well, there is no clear locality. It's not like a partial differential equation for some physical problem, where you have a spatial decomposition and, since the physics is local, you really know where your boundaries are and you only have to exchange information between nearest neighbors in space or space-time. In Hilbert space there is no locality as we know it, which means that matrix elements can potentially connect one subset of your Hilbert space to basically any other. The matrix is still sparse, so it does not have a full connection structure in the sense that all matrix elements are present, but in a coarse-grained view of the Hilbert space the connectivity is still potentially all-to-all. That makes it complicated, and it is more challenging for a machine to run a code with this coarse all-to-all communication structure. Another challenge, when you really come to exact diagonalization the way we have described it, is that we have seen that we use lookup tables, in various variants, for the rapid calculation of matrix elements, and we now have to worry that these lookup tables stay really small. Since we're now targeting spin systems of close to 50 spins, even tables which involve only half of the spins are still tables describing 24 spins, and those can become large, because 2 to the 24 is something like 16 million; with a lot of symmetries and so on, these tables can become quite sizable and use a few tens or hundreds of megabytes. But if you're working on a Blue Gene machine, where the memory per core is of the order of a gigabyte, you start to run into limitations.
So one has to think further about how to do that, and one can come up with something like lookup tables also for the case with translation symmetries. Here is an illustration for the Kagome lattice, that is a cut of a Kagome lattice, and what we do there is decompose the Kagome lattice into three different sublattices. That is not a physical choice, we are not constraining the physics; we simply consider the Kagome lattice as being composed of three sublattices, here the gray one, the red one and the violet one. What is nice about them is that under the symmetry operations, a sublattice of a given color is either mapped completely onto a sublattice of a different color or onto itself, but it never happens that applying a physical symmetry operation of the Kagome lattice mixes two sublattices. If you chose a sublattice decomposition at random, then under a physical symmetry there would be some chance that some sites exchange between colors while others stay the same, and you do not want that: you want the symmetries to act in a convenient way on your sublattices. So there is not an infinite number of choices for the sublattices, and this three-sublattice decomposition is a convenient one. We then generate lookup tables which know about the configurations of these sublattices, and if you think it through, you realize how to build lookup tables which use rather little memory but still do not require too much computation: the optimum of reducing memory as much as we need while keeping the lookup fast enough for a rapid matrix-vector multiplication time.

What such a code then does is the following. Each process in your MPI job holds a certain number of representatives, so it only sees a small part of the configurations, and it also holds the corresponding part of the large Lanczos vectors. On the configurations it knows, it applies the terms of the Hamiltonian and generates new states, and with this lookup table the code can figure out locally on which other process a given representative is stored. It then puts a request for a matrix element into a buffer, and when the buffers are full on all processes, there is a collective MPI call in which these requests, or the sending of information to other nodes, are carried out simultaneously. There is then a lot of traffic on the fast interconnect between the processors, and the MPI tasks which are connected by matrix elements exchange them. This can all be done.
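In code, that buffered exchange might look schematically like the following sketch. This is not the production code: mpi4py is assumed, owner() is a stand-in for the sublattice lookup tables, and random integers stand in for real spin configurations.

```python
# Schematic sketch of the buffered request exchange described above.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

def owner(state):
    # Placeholder: the real code answers this locally, in O(1), via the
    # sublattice lookup tables, without asking any other process.
    return int(state) % nprocs

# Toy local work: applying Hamiltonian terms to our representatives yields
# new states, most of which are stored on other processes.
rng = np.random.default_rng(seed=rank)
new_states = rng.integers(0, 10**6, size=1000)

# Sort the generated states into one buffer per destination process ...
buffers = [[] for _ in range(nprocs)]
for s in new_states:
    buffers[owner(s)].append(s)

# ... then exchange all buffers in one collective call instead of many
# small point-to-point messages.
send = np.concatenate([np.asarray(b, dtype=np.int64) for b in buffers])
send_counts = np.array([len(b) for b in buffers], dtype=np.int64)
recv_counts = np.empty(nprocs, dtype=np.int64)
comm.Alltoall(send_counts, recv_counts)

recv = np.empty(recv_counts.sum(), dtype=np.int64)
comm.Alltoallv((send, send_counts), (recv, recv_counts))
# 'recv' now holds the requests this process must serve; the matrix
# elements are then computed and shipped back the same way.
```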
Now you might worry: since I said there is this kind of all-to-all structure, might this turn out to be extremely inefficient? But using cutting-edge hardware, for example the Blue Gene P at the time when we did the first run of that size, it actually turns out to be reasonably fast. First we made some tests with a 42-site Kagome lattice, which is only slightly above 40 sites, and initially we used a code which does not handle spatial symmetries but only Sz conservation and spin inversion. The Hilbert space for that problem is something like 19 billion states, and using a thousand Intel Xeon cores with Infiniband, a standard kind of Intel cluster as you have in many computing centers, one iteration takes 74 minutes. But that was already about four years ago; that is not a current simulation.

Then we went to the 48-site Kagome lattice and used as many symmetries as we could for that code. The ground-state sector then has 250 billion states; one can actually count them all, so that is the number of states in that sector. We did several runs: one on a somewhat slower machine with 1600 cores but with NUMAlink 5 for the interconnect, where one iteration takes something like 20 minutes or a bit more, which is not that fast. Then we did a test on the same machine with 3000 cores, which gives 650 seconds, so roughly 11 minutes per iteration. And then we also ported the code to the Blue Gene P, which has a lot of processors, although an individual Blue Gene P core at the time was far less rapid than an Intel core, so its 16,000 cores are not like 16,000 Intel cores. But because that machine has a nice balance between the computing speed of the processors and the bandwidth of the interconnect, you actually get an iteration time of slightly less than 10 minutes. I do not have results for the Blue Gene Q, but I think the same run would be substantially faster there. So this shows that it actually works: you can solve for ground states of quite large eigenvalue problems this way.

But something you might worry about is convergence. We are working in such huge Hilbert spaces, so are precision and round-off errors under control, or should you be worried? Something we did there, just as a consistency test, and I guess you understand that this is important in computational physics, to make sure that everything works out fine, is very simple. As we discussed yesterday, the Lanczos algorithm converges to the extreme values of the spectrum, and it does so at both ends. We are typically interested in the ground state, meaning the most negative or smallest energies, but the Lanczos algorithm also converges at the other end of the spectrum. Here we are interested in the Heisenberg antiferromagnet on the Kagome lattice; the antiferromagnetic side of the spectrum is really complicated, and that is what we want to understand. But the other end of the spectrum is the ground state of the negative Hamiltonian, which is basically a ferromagnet, and the ferromagnet has no frustration, so the ground-state energy of a Heisenberg ferromagnet on the Kagome lattice is known exactly. So in the same run where you are actually interested in the antiferromagnetic physics, you can also check to what energy the other side of the spectrum converges. It is the same run, the same information, and you can clearly see that the energy at the top end of the spectrum goes exactly to the ground-state energy of the ferromagnet, with the opposite sign. So you can really check whether your complicated diagonalization gives the correct, exactly known energy for a ferromagnet at the other end of the spectrum, and our results satisfy that constraint. That is a nice consistency check.
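As a cartoon of this two-ends check, here is a minimal sketch showing that one and the same Lanczos run converges at both extremes of the spectrum. It uses a toy dense matrix with NumPy/SciPy; nothing here is taken from the actual Kagome run.

```python
# One Lanczos run converges at both ends of the spectrum.
import numpy as np
from scipy.linalg import eigh_tridiagonal

rng = np.random.default_rng(42)
H = rng.standard_normal((500, 500))
H = (H + H.T) / 2                     # toy Hermitian "Hamiltonian"

# Plain Lanczos recursion, no reorthogonalization (as discussed yesterday).
v = rng.standard_normal(500)
v /= np.linalg.norm(v)
v_prev, beta = np.zeros_like(v), 0.0
alphas, betas = [], []
for _ in range(80):
    w = H @ v - beta * v_prev
    alpha = v @ w
    w -= alpha * v
    beta = np.linalg.norm(w)
    alphas.append(alpha)
    betas.append(beta)
    v_prev, v = v, w / beta

# Ritz values of the tridiagonal matrix: the extreme ones converge first.
ritz = eigh_tridiagonal(alphas, betas[:-1], eigvals_only=True)
exact = np.linalg.eigvalsh(H)
print("bottom of spectrum:", ritz[0], "exact:", exact[0])
print("top of spectrum:   ", ritz[-1], "exact:", exact[-1])
```

In the real run, the analogue of the top Ritz value is what gets compared against minus the exactly known ferromagnetic ground-state energy.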
And another check we made: one particular sector of our Hilbert space is that large, and we did a run in double precision, which is what we typically do, where everything from the matrix elements to the Lanczos vectors is done in double precision. Since going to even higher precision is difficult, what we actually did was scale the precision down. In what we called mixed precision, the Lanczos vectors were stored as single-precision floating-point numbers, whose machine epsilon is only the square root of that of double precision, so more like 10^-7 or 10^-8, whereas the Hamiltonian matrix elements were still computed in double precision. Then we really did the same run, with basically the same starting vector, just in this mixed precision. And you see that, as long as the mixed-precision results are available, the convergence, that is the instantaneous energy as a function of the iteration, does exactly the same thing. So we really think that the double-precision run converges to the physically correct answer: reducing the precision does not make the two convergence histories fork and do different things; rather, one is simply a more accurate version of the other, and there is no evidence of an accumulation of round-off errors or of things going wrong with the precision. So that is actually good news. And then we also calculated the ground-state energy and compared it to DMRG, which is a valuable confirmation of the DMRG results, but I think that is not so important right now.

With this I would like to conclude the part on the technical aspects of implementing exact diagonalization codes. Here is a list of literature, just a few items that came to my mind, where you can find more about exact diagonalization: there are lecture notes by Nicolas Laflorencie and Didier Poilblanc; lecture notes by Reinhard Noack and Salvatore Manmana, which also have a part on diagonalization methods; lecture notes by Alexander Weiße and Holger Fehske on exact diagonalization techniques; and there is a chapter I wrote in a book on frustrated magnetism, where different numerical methods are described, but the part on exact diagonalization is the most comprehensive one, and it is complementary to the others. What I have not written down here: Anders Sandvik, who was also lecturing here, may have shown you his lecture notes, which are mostly on Monte Carlo and SSE, but they also contain a part on exact diagonalization which is actually quite pedagogical for beginners. It does not cover the very advanced techniques I discussed here, but as a basic introduction it is quite valuable as well.

I see it is now 12 o'clock, so should we make a break and talk about the applications later? I think I will stop here. Then we start on time: I will briefly talk about applications, there will be a little time to formulate the problem, and then you can work on it in the tutorial part. OK, thank you.