I will be talking about feature learning in neural network quantum states. Thanks to the organizers for this opportunity. In this talk we discuss how to define a powerful parameterization of a neural network quantum state based on feature learning, and then, as an application, a fine-tuning procedure. As in the previous talks, I will start with a brief introduction to variational neural network quantum states. We consider spins on a lattice described by some Hamiltonian. In principle one could represent the Hamiltonian as a matrix on a computer and diagonalize it to obtain the spectrum, but the problem is the size of the Hilbert space: for N spins with spin 1/2 the dimension is 2^N, so it grows exponentially with the number of spins, and an exact representation of the state very quickly becomes impossible. The idea of the variational approach is to write a parameterized quantum state and to optimize the parameters so as to minimize the variational energy, which by the variational principle gives an approximation of the ground state. In practice the state is expanded in the computational basis, the basis of the spin configurations on the lattice, and in this way we can define the many-body wavefunction that we try to approximate. So the point is how we choose this parameterization. In general, the wavefunction is a map between a spin configuration and a complex number. Over the years many parameterizations based on neural networks have been proposed; in particular, Carleo and Troyer proposed to use a restricted Boltzmann machine as the variational wavefunction, and this turned out to work very well. More generally, a neural network quantum state is a nonlinear map, defined as a sequential application of linear transformations and nonlinear functions to the input configuration. The universal approximation theorem tells us that, with enough parameters, such a network can approximate essentially any function, and this is the same structure that modern deep architectures are built on. The key concept I want to borrow from machine learning is feature learning, and the classic example is the following: a classification problem that is hard to solve in Cartesian coordinates can become very simple in polar coordinates. In the right coordinates, that is, in the right representation of the data, the problem becomes almost trivial. So the important step is not the final classifier but the change of representation: if you find the right features, the problem you have to solve on top of them is easy. This is why we need a deep network.
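Before moving on, here is a minimal sketch, not the speaker's code, of the variational setup just described: a restricted Boltzmann machine in the spirit of Carleo and Troyer, mapping a spin configuration to a log-amplitude. All sizes, the real (rather than complex) parameters, and the initialization are illustrative assumptions.

```python
import numpy as np

N, M = 16, 32                            # number of spins, number of hidden units (illustrative)
rng = np.random.default_rng(0)
a = 0.01 * rng.standard_normal(N)        # visible biases
b = 0.01 * rng.standard_normal(M)        # hidden biases
W = 0.01 * rng.standard_normal((M, N))   # visible-hidden couplings

def log_psi(sigma):
    """log psi(sigma) = a.sigma + sum_j log(2 cosh(b_j + W_j . sigma)) for sigma in {-1,+1}^N."""
    theta = b + W @ sigma
    return a @ sigma + np.sum(np.log(2.0 * np.cosh(theta)))

sigma = rng.choice([-1, 1], size=N)      # one spin configuration
print(log_psi(sigma))

# Variational principle in this setting: estimate E = <psi|H|psi>/<psi|psi> by Monte Carlo
# sampling from |psi(sigma)|^2 and averaging the local energy
#   E_loc(sigma) = sum_{sigma'} H_{sigma,sigma'} psi(sigma')/psi(sigma),
# then minimize E with respect to the parameters (a, b, W).
```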
The picture from machine learning is the following. We take the input, say an image, and we pass it through a deep neural network; the output of the deep part is a vector, the hidden representation of that input. The idea is that in this feature space the task becomes very simple. For example, if you train a network for image classification and then look at the feature space, you find clusters: all the images of trucks end up in one cluster, all the images of horses in another, and so on. At that point a simple classifier acting on this representation can do the job with very little effort. So we take this picture also for quantum spin models. Now that we know this framework of feature learning, we can design our neural network quantum state with the same idea. We start from the physical configuration and we use a deep neural network to project it into a feature space, producing a hidden representation, and then we act with a shallow network on this representation to obtain the amplitude of the wavefunction. The hope is that, as in machine learning, the deep part does the hard work of finding a good representation, so that the network acting on top of it only has to solve a simple problem. The shallow network at the end is nothing but a restricted Boltzmann machine, the same ansatz as Carleo and Troyer, except that here it does not act on the physical configuration but on the hidden representation produced by the deep network. The deep network therefore maps every configuration to a point in the feature space, and we can ask how the configurations are organized in this space. To check this, we considered the Heisenberg model on the square lattice, which can be benchmarked against quantum Monte Carlo. We sampled configurations, computed the hidden representation of each one, and projected the representations to two dimensions. What we observe is that the configurations organize into clusters, and this clustering emerges automatically during the optimization. We then tested the accuracy of the architecture on the hardest benchmark, the J1-J2 model on the 10x10 square lattice at J2/J1 = 0.5. This is the model everybody compares against: to give an idea of the difficulty, a Chinese group used something like 40 million GPU cores on a supercomputer in China to obtain the reference result, and it is very challenging also for established methods such as DMRG.
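A minimal sketch of the two-block ansatz just described, purely illustrative: the small MLP encoder here stands in for the vision transformer actually used, and all dimensions are made up. The point is only the structure, a deep map from the physical configuration to a feature space, followed by a shallow RBM-like head that acts on the representation instead of on the spins.

```python
import numpy as np

N, D, M = 36, 64, 16                     # spins, feature dimension, hidden units of the head
rng = np.random.default_rng(1)
W1 = 0.1 * rng.standard_normal((D, N))   # "deep" encoder weights (stand-in for a ViT)
W2 = 0.1 * rng.standard_normal((D, D))
V = 0.1 * rng.standard_normal((M, D))    # shallow RBM head acting on the representation
c = 0.1 * rng.standard_normal(M)

def representation(sigma):
    """Deep part: physical configuration -> point in feature space."""
    return np.tanh(W2 @ np.tanh(W1 @ sigma))

def log_psi(sigma):
    """Shallow part: RBM acting on the hidden representation, not on sigma itself."""
    h = representation(sigma)
    return np.sum(np.log(2.0 * np.cosh(c + V @ h)))

sigma = rng.choice([-1, 1], size=N)
print(log_psi(sigma))
```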
Before showing our results, let me mention what already exists for this specific model. There is the work of Sandro Sorella and Federico Becca and collaborators, who constructed a variational wavefunction with only about five parameters that reaches a very good accuracy. This is the equivalent of what I said at the beginning: they found a way to extract by hand the right features of the model. However, this parameterization is not general; it works very well for this problem but not for a generic one. Instead, we are looking for a very generic parameterization. There are also other approaches, as were described this morning, in which a network is designed and trained separately for each specific problem. Our idea is instead to borrow the pre-training and fine-tuning strategy from machine learning. We start with a model that has a phase transition, and we optimize the whole network close to the phase transition; this step is called pre-training. Then we freeze the parameters of the deep neural network part, so the representations are fixed and the clusters are fixed, and we only optimize the fully connected layer at the end. This step is called fine tuning, and it is performed at the other points of the phase diagram. Here are some example results. This is the Ising model on a chain of 100 sites: we pre-train at magnetic field equal to 1, which is where the phase transition of this model is, and then we fine tune at all the other points, and we are able to match both the squared magnetization, which is the order parameter, and the energy. We can also check what happens in the feature space. The clusters are fixed, and the only thing the fully connected network can do is change the relative weights of these clusters. At the transition we have two clusters, corresponding to all spins up and all spins down, and these two clusters have a larger amplitude than the others. Decreasing the magnetic field, the wavefunction breaks the symmetry: essentially one configuration dominates, and all the others have very small amplitudes. In the opposite direction, when the magnetic field is large, the amplitude of the all-spins-up and all-spins-down clusters decreases and all the other ones start to increase, and this is the right behavior, since in the limit of an infinite magnetic field all the configurations have the same amplitude. So these features of the ground state can be read off directly from the feature space.
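As a minimal sketch of the pre-training and fine-tuning split described above (variable and function names are illustrative assumptions, not the speaker's implementation): the parameters are divided into a deep part, frozen after pre-training near the transition, and a shallow part that is re-optimized at each new point of the phase diagram, so the learned representation and its clusters stay fixed.

```python
import numpy as np

rng = np.random.default_rng(3)
deep_params = {"encoder": rng.standard_normal((64, 36))}       # frozen after pre-training
shallow_params = {"V": rng.standard_normal((16, 64)),
                  "c": rng.standard_normal(16)}                 # re-optimized at each coupling

def fine_tune_step(shallow_params, grads, lr=1e-2):
    """One fine-tuning update: the deep parameters are never touched."""
    return {k: p - lr * grads[k] for k, p in shallow_params.items()}

# In practice `grads` would be the gradient of the variational energy with respect
# to the shallow parameters only, estimated by Monte Carlo sampling; zeros here
# just make the sketch runnable.
grads = {k: np.zeros_like(p) for k, p in shallow_params.items()}
shallow_params = fine_tune_step(shallow_params, grads)
```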
Then we move to a more complicated model, the J1-J2 Heisenberg model, on two geometries. The first is a chain of 100 sites. In this case there is a Berezinskii-Kosterlitz-Thouless transition; in the thermodynamic limit the transition is at about 0.24, but it is very difficult to detect by finite-size scaling, so we fix the size of the cluster and obtain the reference results with DMRG. We pre-train at 0.4 and fine tune at all the other points, and we are able to match these results. We also tested the accuracy of the same fully connected network in two setups: optimized on the hidden representation, which is the green line, and acting directly on the physical configuration, so just an RBM, and with the fine-tuning procedure we obtain results that are two orders of magnitude better. Then we also considered the square geometry, on the 6x6 lattice, to compare with exact diagonalization: we pre-train at 0.5 and then we compute the two order parameters, the magnetization of the Néel order and the magnetization of the striped order, at the other points. Now I want to focus on the one-dimensional chain. The first column is the energy obtained by the DMRG calculation; the second column is the energy obtained by optimizing each point from scratch with the vision transformer; and the last one is the result obtained by fine tuning the network pre-trained at 0.4. At every point we are able to obtain an accuracy of the order of 10^-3, which is good enough to study the physics of the model. Again we can look at the hidden representation. For the Ising model the ground state is positive definite in the computational basis; in the J1-J2 case, instead, we have frustration, so there is a sign structure. At 0.4 the Marshall sign rule is not exact, but on 100 sites it is a good approximation, so when we fine tune at 0.1 the colors of the clusters remain the same and they match the Marshall sign rule. However, when we fine tune at 0.6, the fully connected network has to change the signs of the configurations inside each cluster, since it cannot modify the shape of the clusters. It is therefore important to understand whether the fine-tuning procedure is still useful in this case, where the network has to change a lot of signs. So we performed three different optimizations with the same shallow network, in three different setups. First, the orange curve: the RBM trained on the physical configuration. Then the same RBM with the Marshall sign rule imposed as a prior, which is the blue curve, since we know that this is the sign structure the network would otherwise have to learn. And finally the same network applied to the hidden representation generated by the deep neural network, and we obtain an accuracy that is one order of magnitude better than with just the Marshall sign rule. So the idea is that the pre-trained network is not just providing the Marshall sign as a bias: it also provides some information about the structure of the amplitudes and about the symmetries of the model, which the network can use during the fine-tuning step to reach a better accuracy. Let me conclude. Neural network quantum states can be used in a pre-training plus fine-tuning procedure. I did not stress this in the presentation, but the fine-tuning step has a very low computational cost compared to optimizing from scratch at each point of the phase diagram, and with this approach we also get an interpretation, in the hidden space, of what happens during the training.
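Since the Marshall sign rule plays a central role in the comparison above, here is a minimal sketch of it in its standard textbook form for a bipartite lattice (this is not taken from the speaker's code): the sign of a computational-basis state is (-1) raised to the number of up spins on one sublattice.

```python
import numpy as np

def marshall_sign(sigma, sublattice_A):
    """sigma: array of +1/-1 spins; sublattice_A: boolean mask selecting the A sites."""
    n_up_A = np.sum((sigma[sublattice_A] + 1) // 2)   # number of up spins on sublattice A
    return (-1) ** n_up_A

# Example on a chain of 8 sites, with the even sites forming sublattice A.
sigma = np.array([1, -1, 1, -1, 1, 1, -1, -1])
A = np.arange(8) % 2 == 0
print(marshall_sign(sigma, A))   # -1 here: three up spins on sublattice A
```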
As future directions, we want to test what happens if we perform the fine tuning across different models, pre-training on one model and fine-tuning on another. For sure we need a clearer idea of how to perform the pre-training: the message is that the fine tuning works, so what we have to understand better is how to pre-train. There may also be applications to real-time dynamics: one could optimize the deep neural network at some point and then perform the dynamics by moving only the parameters acting on the hidden space. I just want to thank my collaborators, Riccardo and Federico, who is my supervisor, and Alberto and Sebastian, and thank you for your attention.

Thanks a lot, Luciano. We have time for questions. So what is the dimension of your feature space, is it always two? What? The dimension of the feature space. No, no, the feature space has a dimension of the order of one hundred; what you see is just a projection onto a two-dimensional space with some dimensionality-reduction technique. But can you determine an optimal value of this small d? In principle yes: for example, with PCA you can check whether there is a gap in the spectrum, and if there is a gap, that gives an optimal dimension. In some cases we checked this, and for a simple model there is a gap with PCA, but for more complicated models with PCA alone you do not see a gap in the spectrum; maybe with other techniques you can still reduce the dimension (a minimal sketch of this check is given at the end of the transcript). Thanks. So there was this plot that you showed on the J1-J2 chain, again referring to the clusters, exactly that one here on the lower left: in the unfrustrated regime you can clearly distinguish clusters between the zero and pi phases, whereas when you go into the frustrated regime these are now different clusters, so do you understand physically what the criterion there actually is? So the point is that if you optimize from scratch at 0.6, you will certainly find other clusters that are different. The network always separates zero and pi configurations, but it is not easy to understand the structure of the sign: you can check which configurations are mapped into which cluster and try to find a rule, but it is not easy. We know that the network has some rule to do it, and it could be useful to try to extract this rule in some way after the training. Yes, yes: if you perform different optimizations, the energy differences are compatible and the mapping of the configurations is the same. Maybe the clusters look different, but there is some mapping between the two, so the configurations in each cluster are the same in different simulations, yes. It would be better to see what is going on. What? It would be better to see what is going on. Yes, yes. I was wondering, would you expect that this also works for disordered configurations? I know that you cannot really compare with benchmarks. Yes, in principle there is no limitation to studying disordered systems. Okay, thanks. Thank you. In the first part of your talk, where you introduced the hidden representation, you showed that for the first part you are using a vision transformer and then an RBM. What is the motivation to use both, and not, for example, an RBM for both, or a transformer both times? So the choice is that the deep neural network is performing the mapping, and if the mapping is good, then you only need a small network to obtain the amplitude.
So, for this reason, we choose a simple shallow network: the idea is that all the work is done by the deep neural network, which finds a good representation, and then a simple network acting on it is enough to obtain the amplitude. The choice of the vision transformer for the deep part is just one possible choice; you can also use a CNN or whatever you want. In principle the idea is the same, and it is the basic idea of feature learning from the machine-learning community: ResNets and transformers are also based on this structure, where the classifier acts on the representation generated by the deep neural network. Okay, thanks. Maybe a last question, if there is one. If not, let's thank Luciano again.
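Regarding the question above about choosing the feature-space dimension, here is a minimal, illustrative sketch of the PCA check mentioned in the answer: compute the eigenvalue spectrum of the covariance of the hidden representations and look for a gap. The data here is random noise rather than actual hidden representations, so the "gap" it finds is meaningless; it only shows the procedure.

```python
import numpy as np

def pca_spectrum(H):
    """H: (n_samples, d) matrix of hidden representations; returns PCA eigenvalues, descending."""
    Hc = H - H.mean(axis=0)
    cov = Hc.T @ Hc / (len(H) - 1)
    return np.linalg.eigvalsh(cov)[::-1]

H = np.random.default_rng(2).standard_normal((1000, 128))   # placeholder for sampled representations
spectrum = pca_spectrum(H)
gaps = spectrum[:-1] - spectrum[1:]
print(int(np.argmax(gaps)) + 1)   # candidate effective dimension, where the largest gap occurs
```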