Then let's come to the last speaker of the session, Hannah Lange, who will talk about a neural network approach to quasi-particle dispersions in doped antiferromagnets. Please go ahead.

Yes, thanks. Can you hear me properly? Maybe I have to remove this. Okay. Yeah. My name is Hannah, and I'm going to talk about a neural network approach to doped antiferromagnets. In particular, I'm going to present a method that allows us to gain some insight into a very interesting part of the physics of these systems, the emergent excitations, by calculating the quasi-particle dispersions.

But very quickly, what do we actually want to do? In the long term, we would like to understand strongly correlated quantum many-body systems, and one very famous example are unconventional superconductors like the cuprate superconductors, whose phase diagram, with all these very interesting and not completely understood phases, you can see in this picture. The model that is believed to capture the physics of these materials is the Hubbard model, and if we go to the strong-interaction limit of large repulsion U, we can project out doubly occupied states and end up with the t-J model. It consists of a hopping term between two neighboring sites, where spins can hop to empty sites, and a spin-spin interaction term for neighboring spins with amplitude J. This model looks very simple, but as all of you know, it is quite hard to simulate its physics. There are established numerical tools like DMRG or quantum Monte Carlo, but all of them face some challenges, as I guess all of you know. Luckily, there are also experimental platforms which by now reach very nice accuracies in the preparation of quantum states. But we all know and acknowledge that engineering these systems is a highly complicated task. And at the end of the day, we do not only want to understand ground states, we also want to see what the emergent excitations in these systems are, and in order to do that, we need spectral functions or dispersion relations. So the goal of this slide is really to show that it is a very challenging task to get insight into this model and its physics.

And that's where neural quantum states enter the stage. We had quite some introductions to neural quantum states already, and all of them were very nice, but since we had them already, I decided to do a slightly different one and go a step back to a typical neural network task, which is, for example, image classification. I think with this very basic machine learning task, we can already see some of the strengths of neural networks, which can then be very useful when we want to use them to represent quantum states. The first one is the universal approximation theorem, which states that in principle neural networks can capture very high-dimensional and very complicated probability distributions. That is, of course, useful if we also think of the highly entangled states that we probably want to simulate. The second is that we can impose symmetry constraints; that's also a very standard thing people do in classical machine learning tasks. There are architectures which are very suitable for two-dimensional inputs, and hence also good for two-dimensional quantum systems. And usual machine learning tasks are usually performed with external data.
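[Editor's note: for reference, the t-J Hamiltonian mentioned above is the standard one; written out (my own rendering, with the Gutzwiller projector removing doubly occupied sites, and with the density-density term that is sometimes dropped):]

```latex
\hat{H}_{t\text{-}J} \;=\; -t \sum_{\langle i,j\rangle,\sigma}
\hat{\mathcal{P}} \left( \hat{c}^{\dagger}_{i\sigma}\hat{c}_{j\sigma} + \mathrm{h.c.} \right) \hat{\mathcal{P}}
\;+\; J \sum_{\langle i,j\rangle}
\left( \hat{\mathbf{S}}_{i}\cdot\hat{\mathbf{S}}_{j} - \tfrac{1}{4}\,\hat{n}_{i}\hat{n}_{j} \right)
```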
Here, in the context of NQS, we usually do variational Monte Carlo, but we can still extend the training to also use some external data to enhance it, and I'm going to comment on this later on. So, as I said, all of this points in the direction that neural quantum states can be very useful for representing quantum states. If we really want to use a neural quantum state, the typical strategy that we follow, which is basically also the outline of my talk, is as follows. We have our system of interest that we want to simulate. First of all, we have to decide on a neural network architecture. Then we need to train the architecture, which is typically very challenging, and that's why I will point towards routes for improvement: hybrid training strategies, where we can use experimental data, or maybe also unconverged numerical data from other numerical tools, to pre-train our neural quantum state. And at the end, which is why we do all of this, we want to get some physics out of the neural quantum state.

So let me start only very briefly with the architecture that I'm using. I'm using a recurrent neural network, but Jonas already introduced them very nicely in his talk, so I only highlight some of the main features. RNNs are autoregressive; that results from the recurrent structure, where we have a local input at each lattice site of the system and get a local phase and a local amplitude out of the RNN cell. What we then do is just multiply all the local parts of the wave function together to get the full wave function. It is easy to normalize each of the local ones, so in the end the full probability distribution and the full quantum state are also normalized, and that's why we have the perfect sampling property. There are two extensions that I'm additionally using just to make the RNN more suitable for two-dimensional systems: I'm using a gated recurrent unit (GRU), and I'm using a tensorized RNN, which means that I'm passing the information through the system in a two-dimensional manner and not only in a one-dimensional manner as, for example, an MPS would. The GRU makes the RNN a bit more suitable for longer-ranged correlations.

This architecture has already been applied to a range of different spin systems, and of the works that I have to mention, I hope I put most of them here, I want to highlight the one by Mohamed Hibat-Allah, who I think was one of the first to apply this RNN. He and also others used the RNN for spin systems on different lattice geometries, and recently Mohamed also put a work on arXiv using the RNN to simulate topologically ordered systems. But now, for our purpose, for the t-J model, we basically have to make two adaptations. The first is that we now have three local states instead of the two spin states, since we also have the hole, and the second is that we have fermionic particles. I won't say a lot about these modifications since, again, Jonas already introduced the Jordan-Wigner strings, which can be incorporated when we calculate expectation values, and that's a very easy way to incorporate fermionic statistics into the training.
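[Editor's note: to make the autoregressive structure concrete, here is a minimal sketch of an RNN wave function with a GRU cell and three local states (up, down, hole), as described above. This is illustrative code, not the speaker's implementation: class names, the hidden dimension, and the 1D site ordering are assumptions, and the tensorized 2D information flow and the Jordan-Wigner signs are omitted.]

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNWaveFunction(nn.Module):
    def __init__(self, n_sites, local_dim=3, hidden_dim=32):
        super().__init__()
        self.n_sites = n_sites
        self.local_dim = local_dim
        self.cell = nn.GRUCell(local_dim, hidden_dim)
        self.amp_head = nn.Linear(hidden_dim, local_dim)    # local (conditional) amplitudes
        self.phase_head = nn.Linear(hidden_dim, local_dim)  # local (conditional) phases

    def log_psi(self, configs):
        """configs: (batch, n_sites) integers in {0, 1, 2}; returns complex log psi."""
        batch = configs.shape[0]
        h = torch.zeros(batch, self.cell.hidden_size)
        x = torch.zeros(batch, self.local_dim)  # dummy input for the first site
        log_amp = torch.zeros(batch)
        phase = torch.zeros(batch)
        for i in range(self.n_sites):
            h = self.cell(x, h)
            # normalized conditionals -> normalized wave function and perfect sampling
            log_p = F.log_softmax(self.amp_head(h), dim=-1)
            idx = configs[:, i]
            log_amp = log_amp + 0.5 * log_p[torch.arange(batch), idx]
            phase = phase + self.phase_head(h)[torch.arange(batch), idx]
            x = F.one_hot(idx, self.local_dim).float()  # feed the chosen state to the next cell
        return log_amp + 1j * phase

# Example: evaluate log psi on a batch of random 4x4 = 16-site configurations.
model = RNNWaveFunction(n_sites=16)
sigma = torch.randint(0, 3, (8, 16))
print(model.log_psi(sigma).shape)  # torch.Size([8])
```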
Now we can already have a look at the first training results that I got out of this. For the full t-J model, the results don't look too good, I must admit. If we scan through different numbers of holes and calculate the resulting relative error, we get this kind of dome in the low-doping case, both for the fermionic and the bosonic case. I think there are two things that we directly learn from this: it is of course hard to simulate fermions, and it is harder to simulate fermions than bosons, but the bosonic case is quite hard as well. This is a very small system size, 16 sites, so a four-by-four system. I tested everything with this small system size just to get rid of the representability issue, because I can use quite many neural network parameters. So that doesn't look too good. A first test to find out where this issue comes from was to look at the exact amplitudes and the exact phases from exact diagonalization. The first thing we immediately see, if we compare a point where the method works very well and a point where it works very badly, is that we have a much higher spread of amplitudes of the exact quantum state when we get into the more difficult region, and in particular we have a lot of amplitudes that are very small, whereas in the regime where the method works well the spread of the amplitudes is much smaller. Additionally, things get more complicated, of course, if we use fermions, because then the phases also look more complicated, and it is harder for the RNN to learn them. We can also have a look at a slightly simpler problem, the t-Jz model, where we basically only have Ising interactions for the spin part, and here we can see that the RNN is able to get quite good results for the bosonic case, but the fermionic results are still not very good.

The big question in the room now is, of course: can we improve that? There are quite some routes we can take to improve it, but I actually don't want to go into detail here, because we already had an overview by Agnes in her talk, we discussed the influence of symmetries in Christopher's talk, and there will be two more talks on specialized optimization strategies, MinSR and another modification of SR, by Ao Chen and Riccardo Rende. So let me just skip this; I'm happy to discuss more after my talk, and I will directly jump to another route for improvement, which is the hybrid training that I mentioned before. The general idea of hybrid training is that we want to pre-train our NQS to get a good initialization for the VMC training. So instead of starting from some random initialization, which can be at a very bad point of the optimization landscape, where we have to go through a lot of local minima to end up in the global minimum, we would like to start somewhere where it is very easy to optimize the rest with VMC afterwards. There were some previous works on this, mostly in the group of Roger Melko, where they used measurements in the Z basis for a Rydberg system to enhance VMC with this hybrid training strategy. Of course, it is also often very important to get some information from bases other than the computational basis. This is what the work by Bennewitz was doing, but I need to highlight that it is very costly to rotate out of the computational basis, because you also need to rotate your NQS out of that basis, and that scales exponentially.
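[Editor's note: before the specific distance measures are discussed, here is a rough sketch of the two-stage idea just described, pre-training on external snapshot data and then switching on VMC. All helper names (nqs.log_prob, nqs.sample, local_energy, optimizer) are hypothetical placeholders, not an API from the talk, and real local energies are assumed for simplicity.]

```python
def hybrid_training(nqs, optimizer, snapshots, local_energy,
                    n_pretrain=1000, n_vmc=5000, batch_size=256):
    # Stage 1: maximize the likelihood of the measured Z-basis snapshots under
    # the NQS Born distribution |psi(sigma)|^2; for a normalized autoregressive
    # ansatz this corresponds to minimizing the forward KL divergence up to a constant.
    for step in range(n_pretrain):
        loss = -nqs.log_prob(snapshots).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Stage 2: standard VMC energy minimization, started from the pre-trained
    # parameters instead of a random initialization.
    for step in range(n_vmc):
        samples = nqs.sample(batch_size)        # perfect (autoregressive) sampling
        e_loc = local_energy(nqs, samples)      # local energies E_loc(sigma), assumed real
        log_p = nqs.log_prob(samples)
        # standard score-function (REINFORCE-style) VMC gradient estimator
        loss = (log_p * (e_loc - e_loc.mean()).detach()).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```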
Both of these works use the Kullback-Leibler divergence in this first part of the training, before the VMC, to make the two probability distributions, the one of the external samples that you train on and the one given by the NQS amplitudes, as close as possible. However, there is one drawback of the Kullback-Leibler divergence, which is that it doesn't incorporate any information on the physics of the system. One example where you can immediately see this is the very simple case of the 2x2 Heisenberg model. If we have two probability distributions, P and Q, which are basically the same for these two ferromagnetic configurations and different for the antiferromagnetic Néel-ordered patterns, the Kullback-Leibler divergence doesn't see that, physics-wise, this first configuration and the last one here are actually equivalent in the system that we are considering. So the Kullback-Leibler divergence would say that these two distributions are different. A probability measure that can improve on this is the Wasserstein distance. This distance basically tells us how much probability mass we have to shift from one probability distribution to the other. The nice thing is that we can give it a ground metric, so we can tell it what the distance between snapshots of our system is, and here I use the diagonal part of the energy, which tells the probability measure exactly that those two configurations and those two configurations are the same.

So, in the ongoing work that I'm pointing to here, I compare the Kullback-Leibler divergence and the Wasserstein distance, that is, I compare their performance when I train on measurements in the Z basis; I use them in this first part of the training. But I also use some information from the X basis. In this case, as I said, it is quite hard to rotate out of your computational basis, so I basically use spin-spin correlations in the X basis and train on them using a scaled mean square error between the target spin-spin correlations and the spin-spin correlations of my NQS. That also allows me to get rid of some experimental errors; for example, what I show here is an average over different translations that I applied to the snapshots. And then the second part is the variational Monte Carlo. Actually, I'm using snapshots from the same group that we were talking about in the last talk, but here they considered a somewhat different system, the dipolar XY model, since here we also have these different measurement configurations that they measured in. You can see this first part of the training, where I only train on experimental snapshots and correlations, and afterwards I switch on the variational Monte Carlo. At the end, I get better results compared to a training, shown in gray, where I only use variational Monte Carlo.
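[Editor's note: for completeness, the two distance measures compared in this pre-training stage are, in standard notation, as follows; the ground metric written here is only one possible choice consistent with what was said, namely the difference of diagonal energies.]

```latex
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{\sigma} P(\sigma)\,\log\frac{P(\sigma)}{Q(\sigma)},
\qquad
W(P, Q) \;=\; \min_{\gamma \in \Gamma(P,Q)} \sum_{\sigma,\sigma'} \gamma(\sigma,\sigma')\, d(\sigma,\sigma'),
```

where Γ(P, Q) is the set of couplings with marginals P and Q, and a ground metric such as d(σ, σ') = |E_diag(σ) - E_diag(σ')| assigns zero distance to configurations with the same diagonal energy.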
OK, somehow it doesn't show it from here, but now that I have shown you some ways to improve the results, we are ready to dive into the physics. Now I'm going back to the t-J model that I was talking about before. As I said, after the search for the ground state it is also very important to get some information on the emergent excitations in the system, and for that we need information on the dispersion relations and on the spectra, which are typically very hard to calculate. There are some methods for this, conventional methods but also methods for neural quantum states; we heard about one on Monday already. In the method that was presented on Monday, the momentum that we want to target to get the full dispersion relation is used directly in the definition of the wave function. Here, I want to take a slightly different perspective and note that it is possible to apply the global momentum operator to the neural quantum state wave function to get the expectation value of the momentum of our current neural quantum state representation. That enables us to use a usual variational Monte Carlo training but add an additional penalty to the cost function, which is just given by the mean square error between the targeted momentum and the NQS momentum. The NQS momentum can be calculated directly from the expectation value of the translation operator, so it is quite efficient to do this.

Let me very quickly show you some results that we obtained with this method. The first system that I had a look at was a one-dimensional system, and in the limit of t larger than J that I'm considering, we get spin-charge separation, which means that our system basically consists of charge excitations and spin excitations, and consequently the full dispersion is just given by a combination of both. If we do the not-so-complicated math here, we get the gray line; I'm also comparing to exact diagonalization results, and the method that I was presenting gives us the following results here. We see that the agreement is not perfect, but at least we can capture some of the main features, and in particular the bandwidth is captured correctly. For two-dimensional systems, we don't strictly get spin-charge separation, but again, in the respective regime of t larger than J, we can have a look at the system in the parton picture, so we can still kind of define spin excitations and charge excitations, but they are bound together. In this regime where t is larger than J, the charge excitation is much lighter compared to the spin excitation, so we can think of a very big and heavy spinon center and a light hole moving around it. That directly explains that the total motion of this system is governed by the heavier part, which is the spinon, and that's why the full dispersion is governed by J, with only some corrections from t. This is also what I observe when I apply the method that I was presenting to the t-Jz model, where we don't have the J-plus, J-minus spin-flip terms: I get a quite flat dispersion in very good agreement with the exact results. Then, for the full t-J model, I correctly capture the bandwidth, and I also see this minimum at (pi/2, pi/2). Here you saw that I was always comparing to ED, so this was again for a quite small system, but I can also go to larger systems, where I then have to compare to DMRG results.
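[Editor's note: schematically, the momentum-constrained optimization described above can be written as below; the notation is the editor's, the penalty weight λ and the precise estimator were not specified in the talk, and the relation between translation and momentum operators is the textbook one.]

```latex
\mathcal{L}(\theta) \;=\; \langle \hat{H} \rangle_{\psi_\theta}
\;+\; \lambda \left( \langle \hat{K} \rangle_{\psi_\theta} - K_{\mathrm{target}} \right)^{2},
\qquad
\langle \hat{T} \rangle_{\psi_\theta} \;=\; \sum_{\sigma} |\psi_\theta(\sigma)|^{2}\,
\frac{\psi_\theta(T^{-1}\sigma)}{\psi_\theta(\sigma)},
```

with the lattice translation operator T = exp(-iK), so the momentum expectation is accessible from the same Monte Carlo samples used for the energy.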
So with this I am at the end of my talk, and to conclude, I showed you how to use the RNN to represent bosonic and fermionic ground states of the t-J model. In principle, this tensorized RNN architecture doesn't have any problem with two-dimensional data, and it is also adaptable to any lattice geometry, so here I'm showing the results for a triangular ladder. At the end I showed you one way to calculate dispersion relations from NQS, and in between I had this interlude on hybrid training and showed that it can improve the performance, as I showed here for a transformer architecture and the dipolar XY model. Okay, so with this I want to thank all my supervisors and collaborators, the organizers, and the audience.

Thanks a lot for the very nice talk. Are there any questions?

Yeah, so just to make this clear, because I think maybe I didn't completely make this clear in the talk: I only used the pre-training for the dipolar XY model and not for the t-J results, and for the pre-training I'm using snapshots, but only from the Z basis. I also have snapshots from the X basis, but it would be very hard and costly to rotate all of the spins of the NQS to this X basis, because it scales like two to the hundred if I have a ten-by-ten system. So I decided to just look at spin-spin correlation functions, because there I only have to rotate two spins at a time to calculate the spin-spin expectation value. That's the type of data that I'm using. I think I also didn't mention this here: I'm comparing two red lines, the lighter one is for numerical data and the solid one is for experimental data, so I tested it on both.

Yes, yeah, there's a coefficient, you're right.

So the way I do this is that I actually first converge to the ground state without the penalty and then slowly turn on the penalty. But this is a bit of trial and error, to be honest, because if I go too fast, it might find the correct momentum but end up in a very high-energy state. At least starting from the ground state makes it much easier, because you then try to follow the low-energy state. When starting not from the ground state, it is very sensitive. For the results here, at least for all of these points that I'm showing, I used the same switching-on routine, so there it wasn't too sensitive.

The poor performance of the RNN for the t-J model, right? I would expect that it moves us somewhere in the optimization landscape that is already in the right area of the ground state and closer to the ground state. Of course, it's not exactly the ground state, especially if we pre-train with experimental data, but that's my intuition.

Last question. Four by ten. Okay, then let's thank the speaker again.