[Chair:] Welcome to this session. It is a pleasure to introduce Federico Becca from the University of Trieste, who will speak about transformer wave functions for quantum spin models. Please. [Federico Becca:] Good. Can you hear me? Yes. Good. Thank you very much. First of all, I would like to thank the organizers for the kind invitation; indeed, I'm coming from two floors above, so not far. I will talk about transformer wave functions — the pointer doesn't work very well, but let's say transformer wave functions for quantum spin models. And let me say that all credits go to Riccardo, who is in the audience — I saw him there — and to our chairman, Luciano, who really did the dirty work of improving the architecture of the neural network. So for any technical question, please ask them; they are really the persons who worked on the code, optimizing everything. I was mostly interested in understanding how these wave functions can be applied to challenging problems in frustrated magnetism. So I will start slowly, trying to motivate a bit and giving a very short introduction on previous, more physically motivated wave functions, and then I will describe transformer states. I already saw that in this workshop people discussed spin models and even Hubbard models, so you should be more or less already in the game, but let me just emphasize that here I'm interested in Mott insulators, in which the charge degrees of freedom are frozen — let's say, in real Mott insulators charges still move around, but here I'm interested in models in which I only have spin degrees of freedom. So spin fluctuations are present and they are modeled by Heisenberg Hamiltonians, and the simplest one is the one in which you have spin-1/2 degrees of freedom interacting on some lattice with nearest-neighbor interactions. And the funny story comes when frustration is present, meaning that you have loops in the lattice with an odd number of edges.
So the simplest one is a triangle: if the coupling is antiferromagnetic, then the system is frustrated, because you can anti-align two spins, but then the third one is indeed frustrated. And every time you have lattices in which the couplings form loops with an odd number of edges, you have frustration. Frustration is interesting because you can destroy antiferromagnetic order — or indeed any order — down to zero temperature and have exotic phases. It's already 30 years, and maybe they are no longer really exotic, but they are still interesting and we are still lacking a full understanding of their properties. The simplest one is the so-called resonating valence bond state proposed a long time ago — it's a little bit more than 50 years since Philip Anderson proposed this picture to describe a fully disordered state. You start from a singlet, in which you couple two nearest-neighbor spins, then you cover the whole lattice by these singlets and you make them resonate. If you take an exponentially large linear superposition of all possible patterns of singlets, you have a state which is called resonating valence bond, RVB — indeed, this is a short-range RVB because you only allow nearest-neighbor singlets — and this pictorially describes a state which is not ordered and may be stabilized in some realistic spin model. And indeed, if you want to do calculations with this kind of wave function, there is one very insightful way: starting from your spin model, you perform a fermionic decomposition of the spins by using so-called Abrikosov fermions, in which essentially you rewrite spins in terms of fermions. By doing this you enlarge the Hilbert space, and if you want to have a faithful representation of the real Hilbert space, in which you have only two states per site, you have to impose a constraint.
And why is this interesting? Because once you rewrite the Hamiltonian in terms of these fermions, you can perform a sort of mean-field decoupling, in which you have a quadratic Hamiltonian that can be easily diagonalized and from which you can extract the ground state. This ground state has nothing to do with the original spin model, because it lives in the fermionic Hilbert space, in which you have four states per site — empty, up, down, and doubly occupied — but then you can go back to the original Hilbert space of the spin model by applying a Gutzwiller projection that just kills all the configurations that are not allowed in the spin model, and so you obtain a correct wave function for the spin model. Of course this is a non-perturbative operator, but it can be treated within the variational Monte Carlo technique, so you can assess the properties of this kind of state by Monte Carlo sampling. And this gives a very transparent interpretation of the wave function, because you have a very limited set of variational parameters — this is the power and also the limitation, as we will see, of this kind of approach — which are essentially the hoppings and pairings of the mean-field state and, if you want, some fictitious magnetic field in order to generate magnetic order in the unprojected state. The transparency comes from the fact that the unprojected spectrum may suggest the actual properties of the spin state, as beautifully described in the original paper by Xiao-Gang Wen. Indeed, it turns out that gapped spin liquids can be obtained whenever the fermionic spectrum is gapped, and gapless spin liquids — like the U(1) state that has been used, for instance, for the kagome-lattice Heisenberg model — are obtained when you have a gapless spectrum, maybe with Dirac points. Using this approach, in the last years a few models have been studied, mainly this J1-J2 Heisenberg model on the square lattice that Nomura-san
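To make the construction concrete, here is a minimal toy sketch of how the amplitude of such a Gutzwiller-projected mean-field state can be evaluated in a variational Monte Carlo spirit. All the names and the simple hopping-only Hamiltonian on a ring are my illustrative assumptions, not the actual wave function used in the talk (which also contains pairings and further parameters):

```python
import numpy as np

def mean_field_orbitals(L, t=1.0):
    """Diagonalize a toy nearest-neighbor hopping Hamiltonian on a ring
    and return the L/2 lowest-energy orbitals for one spin species.
    (A real calculation would also include pairings and handle
    degenerate shells at half filling with care.)"""
    H = np.zeros((L, L))
    for i in range(L):
        H[i, (i + 1) % L] = H[(i + 1) % L, i] = -t
    _, U = np.linalg.eigh(H)
    return U[:, : L // 2]          # columns = occupied orbitals

def projected_amplitude(config, phi):
    """Amplitude <config|P_G|mean-field> for a spin configuration
    (+1/-1 per site, zero total magnetization).  The Gutzwiller
    projector is enforced implicitly: up fermions sit on the +1 sites,
    down fermions on the -1 sites, exactly one fermion per site, so the
    amplitude is a product of two Slater determinants."""
    up = np.flatnonzero(config == +1)
    dn = np.flatnonzero(config == -1)
    return np.linalg.det(phi[up, :]) * np.linalg.det(phi[dn, :])
```

In a Monte Carlo sampling one would propose spin flips and accept them with probabilities built from ratios of such amplitudes; the few variational parameters live inside the mean-field Hamiltonian.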
described in the first talk this week. The model was introduced in the late '80s by Chandra and Doucot, and eventually, after many years, we are reaching a final understanding of the zero-temperature phase diagram: you have a Néel phase, which is eventually destroyed by frustration, and you have a gapless spin liquid in a very small region. This picture has already been shown by Nomura-san on Monday — and indeed it comes from my talk at KITP, which is now for some reason used in many talks — and there are several works, including the one by Nomura, pointing out that the phase diagram is of this kind. By the way, this result can be obtained within a fermionic wave function with very few parameters — I will tell you later, order 10, even less. Of course, as I told you, this approach is limited, because eventually you cannot really improve the wave function much: you have a very limited number of parameters, essentially of the order of the number of sites. One way that we used — and I don't understand why this is always greenish, it should be black, anyway — one way to improve this wave function is to apply Lanczos steps to it, that is, to use this state as the starting state of a Lanczos procedure. Indeed this can be done on small systems, and moreover you can do a non-variational zero-variance extrapolation by computing the energy and the variance. These are examples in which I plot the energy as a function of the variance for a 6x6 cluster. This is the case in which you do a random initialization, exactly like in real Lanczos approaches, in which you start randomly and then eventually converge to the ground state — notice that the variance can increase during the Lanczos procedure, while of course the energy is monotonically decreasing. And if you start from a random state, you see you are very far — look at the scale — and eventually you need many Lanczos steps to reach the exact
ground state. Instead, if you start from the good state, which is the fermionic one, you are already very close — see, this scale is much smaller than that — and moreover, by performing a few Lanczos steps, here just two, you linearly extrapolate to the exact result and you are happy. Here one should be very careful: of course you can do many Lanczos steps on small systems, but if you want to increase the lattice size, which is important to understand the thermodynamic properties, you cannot do many Lanczos steps — in principle only a few; we did only two, because the algorithm becomes very complicated. Still, starting from the variational state you can go up to 18x18; on the 18x18 we did only one Lanczos step, but the extrapolation remains good enough to assess the exact ground-state energy. This is coming from a paper we did with Sandro Sorella a long time ago. You can even construct excitations and do the Lanczos-step procedure on excited states, and you see that in this approach the wave function is size consistent, in the sense that the variance per site remains almost constant when increasing the system size; this is very important if you really want to push the method to larger systems. If I remember correctly, one is the lowest spin-1 excitation and the other one is the lowest singlet with momentum (pi, 0). This is also very nice, because you can fix quantum numbers just by playing with the fermions, and this will give you the quantum numbers of the full state — in this sense it's transparent. Here is a table that I think has already been shown by Filippo last week; it shows the energy of the 10x10 square lattice at J2 = 0.5, which is the sort of challenging point, and these are the various energies obtained by different methods. There are references here, but I couldn't list the actual ones; if you are interested,
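The zero-variance extrapolation mentioned above amounts to a simple linear fit: for a state close to an eigenstate, the energy depends linearly on the energy variance near zero variance, so one can fit the (variance, energy) pairs from successive Lanczos steps and read off the intercept. The sketch below uses made-up illustrative numbers, not data from the talk:

```python
import numpy as np

# hypothetical (per-site energy variance, energy) pairs obtained after
# p = 0, 1, 2 Lanczos steps on the starting variational state
variance = np.array([0.030, 0.012, 0.005])
energy   = np.array([-0.4940, -0.4962, -0.4970])

# near sigma^2 = 0 the energy behaves as E(sigma^2) ~ E_exact + c * sigma^2,
# so a straight-line fit gives the zero-variance (exact) estimate
slope, intercept = np.polyfit(variance, energy, 1)
E_extrapolated = intercept   # estimate of the exact ground-state energy
```

The extrapolated value sits below every variational energy in the fit, which is why this estimate is non-variational.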
you can look at the paper by Riccardo and Luciano. Of course here we don't have the exact solution, but all the methods are variational — by the way, these calculations are done with periodic boundary conditions — so the lower, the better. Indeed our state with two Lanczos steps, which has 5 variational parameters, has quite a good energy, but since you have to fight with other methods you really need to push to the lowest possible energy, and so now I will consider this transformer wave function, which has many more parameters but can reach very accurate results. The idea is, in some sense, either to confirm the results obtained with the fermionic wave function or to really understand if there is something else beyond it. So the idea is to use neural-network wave functions, and of course everybody is happy to cite the work by Giuseppe Carleo and Matthias Troyer in Science, who really introduced the idea of using neural networks to parametrize quantum states. The idea is that you have input configurations, and then you have some hidden layer — some neural network in which essentially you take linear combinations and apply some nonlinear function — and at the end of the day you obtain a number, which in general must be complex, or real with possibly nontrivial signs if you know that the wave function is real; you use this to extract the logarithm of the wave function. The variational parameters are now inside the hidden units, so in principle, by increasing the number of neurons, you can increase the number of variational parameters and reach the exact solution. In principle — because indeed the minimization of the energy with respect to all these parameters may be very hard, since the landscape becomes very complicated. And this is the simplest possibility, which is the restricted Boltzmann machine, and even though in principle it can describe any state, in practice this is not true, especially when the system size becomes of order a
few hundred sites, or even less. So the idea proposed by Luciano and Riccardo was to consider transformer states: you start, as always, with the spin configuration on the lattice; you use a deep neural network, which here has the structure of a vision transformer, to obtain a hidden representation — another set of vectors z; and then you apply a shallow restricted Boltzmann machine to this hidden representation, from which you obtain the logarithm of the wave function. The idea is that you can reach a very good accuracy by applying the Boltzmann machine not to the original spins but to this hidden representation. Luciano was very proud to show this slide — again, for any technical question, refer to our chairman. In more detail, the idea is to construct patches of spins in the lattice: you take your original lattice, the square lattice for instance, you split it into small patches, 2x2 for instance, and then you project these patches into some d-dimensional space of vectors x. You use the transformation typical of the transformer, depending on this variable h, which is a hyperparameter — the number of heads — and the final set of vectors is mapped into these y vectors; eventually, the hidden representation is obtained by summing all these vectors, and then you obtain the wave function by taking a complex output. In order to see if this architecture works in actual non-trivial examples, the first exercise we did was to consider the J1-J2 model in 1D. Here you have a one-dimensional chain with spin-1/2 degrees of freedom interacting antiferromagnetically at both nearest and next-nearest neighbors, and also in this case J2 introduces frustration. The phase diagram of this model is very well known: you have a gapless state — of course here we are in 1D, so there is no magnetic order because of the Mermin-Wagner theorem — when J2 is small, and indeed the critical point is now known with very high accuracy. Then, if you
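The pipeline just described — patching, embedding, multi-head attention, summing into a hidden representation, and a shallow complex output layer — can be sketched in a few lines. This is only a toy numpy illustration of the data flow with random (untrained) weights and made-up shapes; the actual architecture from the talk has more layers and optimized variational parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_psi_vit(spins, L=8, patch=2, d=4, heads=2):
    """Toy vision-transformer wave function for an L x L spin lattice
    (spins = flat array of +/-1).  Weights here are random; in practice
    they are the variational parameters optimized by VMC."""
    s = spins.reshape(L, L)
    # 1) split the lattice into (patch x patch) patches -> tokens
    n = L // patch
    tokens = s.reshape(n, patch, n, patch).transpose(0, 2, 1, 3).reshape(n * n, patch * patch)
    # 2) linear embedding into a d-dimensional space
    W_emb = rng.normal(size=(patch * patch, d))
    x = tokens @ W_emb                              # shape (n*n, d)
    # 3) one multi-head self-attention layer (h = number of heads)
    dh = d // heads
    y = np.zeros_like(x)
    for h in range(heads):
        Wq, Wk, Wv = (rng.normal(size=(d, dh)) for _ in range(3))
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        a = q @ k.T / np.sqrt(dh)
        a = np.exp(a - a.max(axis=1, keepdims=True))
        a /= a.sum(axis=1, keepdims=True)           # attention weights
        y[:, h * dh:(h + 1) * dh] = a @ v
    # 4) hidden representation: sum over all token vectors
    z = y.sum(axis=0)
    # 5) shallow RBM-like complex output -> log of the wave function
    W_out = rng.normal(size=(d,)) + 1j * rng.normal(size=(d,))
    return np.sum(np.log(np.cosh(W_out * z)))

# example: log-amplitude of the fully polarized configuration
logpsi = log_psi_vit(np.ones(64))
```

The key design point is step 5: the nonlinear Boltzmann-machine layer acts on the hidden vectors z rather than directly on the spins.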
increase J2 more, you enter a gapped phase which is first commensurate and then becomes incommensurate; many works have clarified the phase diagram for this one-dimensional case, and you can even apply DMRG. Luciano was really choosy, because he wanted to compare the transformer with DMRG under the same conditions, so he pushed the DMRG calculations with periodic boundary conditions — which is indeed not really the best choice for DMRG — and the calculations have been done here for 100 sites. At the beginning, Luciano told me that the DMRG calculations with periodic boundary conditions gave an energy higher than the transformer, which also suggests the accuracy of the transformer state, but eventually he succeeded in getting the DMRG energy lower than the ViT. So this is the accuracy of the ViT with respect to DMRG — which is, let's say, exact, or hopefully so — as a function of the hyperparameters: h, the number of heads you have in the transformer, and d, the dimension of the space into which you project. You see that if you increase either d or h you improve the energy, and for the very simple case — which is sort of simple — you can eventually reach, with the best state, an accuracy of order 10^-7, or let's say, to be pessimistic, maybe 10^-6, which is crazy for such a large system. The frustrated case is more difficult, but still the relative accuracy is, let's say, of order 10^-3 percent — let's say unbelievable compared with other kinds of wave functions, like the ones constructed with fermions. And not only the energy, which in the end is, let's say, not so important, but also correlation functions are very accurate. These are the spin-spin correlation functions as a function of distance: the full red dots are the transformer results and the empty dots are the DMRG, and you see that they completely agree and they reproduce the critical behavior of the spin-spin
correlations in the gapless state at J2 = 0. Remarkably, also when you increase J2 you can compute correlation functions like dimer-dimer correlations, which go to zero in the gapless state, while in the gapped phase they converge to a finite value because you break translational symmetry and have dimer order; and again the agreement between DMRG and the transformer is really astonishing. You can even describe incommensurate states: in the structure factor — the Fourier transform of the spin-spin correlations — at J2 = 0 you have a peak at pi, diverging like the logarithm of the size; then the peak stays at pi and eventually moves towards pi/2. Again the accuracy is sort of embarrassing — even though you can distinguish here that you are not 100% correct, the main features are optimally reproduced. So let's go to 2D. This was a benchmark, because the model was not as simple as the Ising model in a transverse field, which in many cases is considered for benchmarks. The J1-J2 model in 1D is non-trivial, especially because the sign structure is not known for J2 larger than zero — in the Ising model the signs of the ground-state wave function are known, and this helps a lot in constructing the neural network — while here the Marshall sign rule is violated in the whole phase diagram except at J2 = 0 and at J2 = 0.5, which is the Majumdar-Ghosh point. Indeed, using this complex RBM output, at the end you can recover the exact signs, if you compare for instance with exact diagonalization on small systems; so it's a simple but not so trivial example. Now let's move to 2D. We didn't want to consider J1-J2 on the square lattice, because there are too many works on it, so we looked for a different, maybe less studied problem, which is the so-called Shastry-Sutherland model. It is still on the square lattice: you have nearest-neighbor interactions on
the square plaquettes and a diagonal term only on a few plaquettes — you see that in the J1-J2 model you have diagonal bonds everywhere, while here you have them only on one plaquette out of four — and again, when J' is finite, you introduce frustration, because in the Néel phase these two spins are parallel. Remarkably, this model — which is called Shastry-Sutherland because they introduced it in the '80s just as a simple example in which you can construct an exact ground state in some limiting case — gives a nice description of a compound, the strontium copper borate SrCu2(BO3)2, which has very interesting physical properties, especially when you apply an external magnetic field, because you have a sequence of magnetization plateaus. If you want to know about that, the best person is Frédéric Mila in Lausanne, who spent his life clarifying the properties of this material. Let's say we are not interested here in the physics with a magnetic field — this could be interesting to do one day or another with the transformer — but at the moment we studied the ground-state properties, and for that there are very few studies. One is indeed by Frédéric with Philippe Corboz, who is the guy who essentially promoted the iPEPS method, tensor networks, and according to them the phase diagram has a Néel state — of course, when j is large you have a Néel phase, because j' is relatively small and the frustration is small — and when j' is large you have essentially a dimer phase, in which the lattice decouples into independent dimers on the j' bonds. In between you have this green phase, which is a plaquette state in which you form plaquette correlations on the empty squares. Recently, Anders Sandvik proposed the possibility of a spin-liquid phase between the Néel and the plaquette phase, so we wanted to assess this, and indeed we confirm, using the transformer state, that this spin-liquid region can be there. This part is quite technical — it discusses the importance of using symmetries in this job — and indeed here
you have of course symmetries: the unit cell contains four sites, so you have only translations by two, and then you have rotations around the centers of the empty plaquettes and reflections about a diagonal. We implement these symmetries by hand, by summing over different states, applying the symmetry operations to the spin configurations; by the way, this sum is, let's say, not so expensive, because it is fixed and does not depend on the cluster size. So, the results: first of all some benchmarks, just to be sure, before plugging into the real problem. This is on the 6x6 cluster, for which you can do exact diagonalization; this is the relative error, so the energy compared to the exact energy, when you change the number of parameters, increasing either the number of heads or the dimension d, and you see that with the best state, with this number of parameters, you reach an accuracy of order 10^-4. But most importantly, the spin-spin correlations on this 6x6 essentially reproduce the exact results. Here, let's say, this is plotted along a snake path, but for a fixed spin I think you measure all the possible correlations, and you see that indeed the transformer state reproduces the exact ones. If you go to larger systems, then you don't have exact solutions; you can compare with DMRG, but DMRG in 2D doesn't work very well, so Anders Sandvik did it on cylinders — cylinders meaning L x 2L clusters, with periodic boundary conditions in one direction and open in the other. Of course, at fixed size the results will be different, because the setting is different, but if you perform a size scaling, increasing L — you can do it on an L x L torus with our wave function, or with cylinders in DMRG — you see that you converge essentially to the same energy within error bars, so we believe that on the torus we are fine. Having done this, we can go into the actual study of the problem. First of all, we like to
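The symmetrization "by hand" mentioned above — summing the amplitude over symmetry-transformed configurations, weighted by the characters of the chosen irreducible representation — can be sketched as follows. The group (90-degree rotations of a tiny 2x2 lattice), the stand-in amplitude, and all names are my toy assumptions:

```python
import numpy as np

def symmetrize(log_psi, config, ops, chars=None):
    """Quantum-number projection: psi_sym(s) = sum_g chi(g) * psi(g s).
    `ops` is a list of functions acting on spin configurations, `chars`
    the corresponding characters (all +1 for the trivial irrep).  The
    cost is fixed by the group size, independent of the cluster size."""
    if chars is None:
        chars = [1.0] * len(ops)
    return sum(c * np.exp(log_psi(op(config))) for c, op in zip(chars, ops))

# toy example: a 2x2 lattice with the group of 90-degree rotations
def rot90(s):
    return np.rot90(s.reshape(2, 2)).ravel()

ops = [lambda s: s,
       rot90,
       lambda s: rot90(rot90(s)),
       lambda s: rot90(rot90(rot90(s)))]

def toy_log_psi(s):
    # stand-in (non-symmetric) amplitude, NOT a real wave function
    return 0.1 * float(s[0] + 2 * s[1] - s[3])

s0 = np.array([1, -1, 1, -1])
psi_sym = symmetrize(toy_log_psi, s0, ops)
```

By construction, the projected amplitude is invariant under any rotation of the input configuration, even though the underlying toy amplitude is not.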
assess magnetic order, so we compute the structure factors — the spin-spin correlations in k-space — and we measure the order parameter for the Néel state by taking k = (pi, pi). These are the results for three different values of j/j', and you see that you go from a value that extrapolates to a finite number to a case that extrapolates to zero; so by changing the ratio j/j' you go from a Néel state, in which the magnetization is finite, to a state in which the magnetization is zero. Now, in order to understand whether zero magnetization means spin liquid or plaquette, you need to compute the plaquette order parameter: you take an operator that performs a permutation over the four sites of a square plaquette, you construct this operator, and you compute correlation functions at distance r — plaquette-plaquette correlations. If you have plaquette order, these will oscillate, but the oscillations will remain finite at the largest distances, so you can take as order parameter the difference between the correlation at the maximum distance and the one at the previous distance: in a phase without order this difference goes to zero, otherwise it stays finite. Here I show — not so clear with this kind of colors — the correlations: in the plaquette phase you see that they keep oscillating, while here they are melting. What you can do is the size scaling of the plaquette order parameter, and you clearly see that when j/j' is small this goes to a finite value, while if j/j' is larger it goes to zero. Combining these results with the previous ones on magnetic order, you can draw the phase diagram, in which indeed you have a small but still finite region, between more or less 0.72 and 0.78, in which both order parameters vanish in the thermodynamic limit; and then we can, in some sense, confirm the results
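The magnetic order parameter described above is extracted from the spin structure factor at the antiferromagnetic wave vector. A minimal sketch, using a toy perfect-Néel correlation pattern of my own as input (real data would come from Monte Carlo estimates of the correlations):

```python
import numpy as np

def structure_factor(spin_corr, L, k):
    """S(k) = (1/N) * sum_r cos(k . r) <S_0 . S_r> on an L x L torus
    (real correlations assumed).  With this 1/N normalization, S(pi,pi)
    tends to the squared sublattice magnetization m^2 for an ordered
    state as L grows."""
    N = L * L
    S = 0.0
    for x in range(L):
        for y in range(L):
            S += spin_corr[x, y] * np.cos(k[0] * x + k[1] * y)
    return S / N

# toy correlations for a perfect Neel pattern: <S_0 . S_r> = (-1)^(x+y) * 3/4
L = 6
corr = np.array([[0.75 * (-1) ** (x + y) for y in range(L)] for x in range(L)])
m2 = structure_factor(corr, L, (np.pi, np.pi))   # -> 3/4 for perfect order
```

In the actual analysis one computes m^2 on increasing L and extrapolates to the thermodynamic limit; a finite intercept signals Néel order, a vanishing one signals its absence.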
previously obtained by Sandvik and collaborators. This is my last slide — I still have maybe four minutes — so I would like just to say a few words about neural networks, because I think they are indeed promising tools for studying correlated systems, and in my opinion their success will be related to the possibility of really tackling important and unsolved problems, like the Hubbard model or the J1-J2 model — non-trivial spin or electronic models. Of course they are still challenging, but it's important to study this kind of model instead of insisting on the Ising model in a transverse field. The other important point is that we really need large clusters in order to distinguish between competing phases. I'm referring in particular to the Hubbard model: I grew up, let's say, with the Hubbard model, so before dying I would like to know what the exact ground state of the Hubbard model is, and to understand the competition, for instance, between stripes and superconductivity you really need to push to large systems. It's important to benchmark everything on small systems, but eventually the problems are defined in the thermodynamic limit, so we need to go to large clusters — even if we don't have an accuracy of 10^-7; an accuracy of 10^-2 may already be enough on very large systems, and this is, let's say, hard to obtain with DMRG. Another problem, to me, would be to clarify a bit the role of the different pieces of the architecture, to have some understanding: in the wave function that I showed before, we have a direct understanding — at least you can get an understanding by looking at the unprojected spectrum — while here these are still black boxes, in which you have a lot of parameters and it's not easy to interpret the physical properties. Even for DMRG people have done this kind of job, by rephrasing it, let's say, in terms of MPS and so on. The last point is to construct the excitation spectrum from
the ground state. Of course you can do it with a completely unrelated optimization — you change the quantum numbers and reoptimize everything — but it would be nice to construct excitations starting from the ground state, applying some operator to it, like sound waves are constructed by applying the density operator to the ground state. This would be important especially because elementary excitations are needed to fully characterize the phases we are looking at: for instance, we still don't know whether the spin liquid that we find is gapless or gapped — so don't ask me this question, because I don't know. And for the rest, I finish here. Thank you very much. Questions? [Q:] Nice talk. About the black-box issue: did you try to visualize or extract the weights — it's a more technical question — of the attention heads in your model? Because that would suggest which type of correlations they are actually extracting. [A:] Yes, and the result is that they decay with distance and they automatically recover the rotational symmetry of the model. Federico didn't show it, but of course they decay; the point, in some cases, is to understand whether they decay exponentially — then it's gapped — or as a power law — then it's not. This, I think, is still not done, but this is the kind of example. [Q:] And about the symmetries: is it a preprocessing step, or is it in the network itself? There are networks which are invariant by construction. [A:] We did this for the Shastry-Sutherland. It depends — maybe he can explain better — but you do these patches, so for instance in the J1-J2 model you construct 2x2 patches, and so you break translational symmetry by one but not by two: translations by two are included in the network. In the Shastry-Sutherland lattice you already start from a unit cell which is 2x2, so you don't break anything — it's fully translationally invariant directly in the network — but the rotations and reflections you impose a posteriori. [Q:] I think it's just a more
technical question, and I probably missed this information in your talk, but I would just like to have a comparison: how do the ViT and the DMRG compare to each other in terms of computational cost for your spin models? [A:] In the computational sense, this is not easy to compare in general — first of all because they work on different setups. Of course, if you want, he will be more precise, but if you do a really one-by-one correspondence, like 10x10 with periodic boundary conditions, then the ViT wave function is happy and the DMRG is less happy, so most probably the ViT will perform better — but because this is not the optimal setup, let's say, for the DMRG. So a direct and, let's say, faithful comparison is hard. [Q:] Since they are pretty much the same for most of your cases, I think it would be fair to give some estimate. [A:] Of course — listen, if you study 2D, the big problem in 2D is that the DMRG works mainly on cylinders, so you break not only the rotational but also the translational symmetry, because you have open boundary conditions; in that sense you put in a bias. And of course, if you use open boundary conditions with the ViT, maybe you will need many more parameters. So it's not easy to do a fair comparison, I would say. I don't know if you want to add something — this is the diplomatic answer. Other questions? [Q:] What would you say — now that you have been showing a lot of impressive results — what do you see as the key problem now? [A:] The key problem is to increase the size, because, let's say, for the Shastry-Sutherland model we were lucky: with only relatively small 14x14 clusters you can still understand the behavior, but for instance for the J1-J2 model I think it's much more difficult to really go to the thermodynamic limit. So for me the main motivation — I'm trying to push him to really do large systems — is that everything, let's
say, the computational power is increasing, but also the complexity is increasing, so it's like Achilles and the tortoise: we are always doing the same kind of clusters. Now we can do 6x6 and 8x8 perfectly, but let's say we really need to do large systems. [Q:] Could it also be that transformers are computationally relatively expensive? We know that they can be super powerful if we can train them, but they are also quite expensive in training — essentially much harder than a simple convolutional network. Is it more the computational time that is the limitation, or is it something else? [A:] Computational time, and also the memory, I think. In my opinion it would be nice to decrease the number of parameters. In this community, let's say, it seems that the larger the number of parameters, the better you are — you are more and more happy if you have more and more variational parameters — so everything becomes very complicated. The idea would be to reduce the number of parameters in order to simplify the optimization. [Q:] Great talk, thank you. Since RNNs have had great success in describing spin systems, have you tried, or played with, the idea of getting samples autoregressively from a transformer? Because they can be masked, and then you can get samples from the model like from an RNN, and kind of bypass the whole Metropolis story. [A:] The answer is, I would say, no. If you convince him, then it's fine. No, but we didn't. No, he didn't. [Chair:] Maybe we can stop — so you decide — so we can thank Federico again.