The numbers are still growing. OK, maybe we can start now. So hello, everyone, and welcome back to the Hitchhiker's Guide to Condensed Matter and Statistical Physics, the school dedicated to machine learning in condensed matter. This is our first day, and today is dedicated to machine learning for many-body quantum physics. The main lecture will be given later by Giuseppe Carleo from EPFL, but before that we start with the basic notions, which will be given by Filippo Vicentini, a collaborator of Giuseppe Carleo, also at EPFL. As usual, please put your questions in the question-and-answer box and we will try to answer all of them during the lecture. So I give the word to Filippo; enjoy today's lectures, everyone. Filippo, please.

Thank you, Asya. Let's see. OK, so thanks, everyone, for the opportunity of being here and giving this introductory lecture to machine learning for many-body quantum physics. You know the format of today's lecture: I will now give this introduction, covering two topics. First, what neural quantum states are, and how we can use neural networks to encode and represent the information of a quantum state; and then how we can use this tool to solve some interesting problems in quantum mechanics, such as finding the ground state or performing the time evolution. Soon after this introduction, Giuseppe will give a seminar on exciting new developments in the field. So the talk is divided into two parts: the first is about neural quantum states themselves, the latter about the problems we are trying to solve. I will take some questions in the break between the two parts, and then at the end.

I don't think I really need to motivate any of you about why we are interested in quantum physics and why we want to study quantum systems. You all know how exciting the developments are nowadays in quantum computing and in experiments on high-temperature superconductivity; there are experiments and theories studying how chlorophyll, and the mechanism that converts photonic energy into chemical energy, works by using quantum physics. Behind all those problems, all those fields, is the framework of quantum physics that we use to describe those systems. And maybe the most fundamental part of quantum physics is that we want to describe those states.

I'm sure you know this very well already: when we try to describe a quantum system and its state, we have a problem, because this is very complicated. Imagine we take a system composed of spins, particles that are either down or up, so a binary state, 0 or 1, and consider a system made of N different spins. If I wanted to describe such a system classically, with a classical state, I would need to describe the state of every single particle in the system individually: one bit of information for the first spin, since it can either be down or up; one bit for the second particle; and so on. So if I have N particles, I need N bits of information, and as I increase the size of my system the number of bits grows in the same way.
Therefore it is relatively easy to store in the memory of my computer the state of millions and billions of particles, because the growth in the memory requirement is linear. Instead, in quantum physics, if I want to describe this as a quantum system with its quantum state, I need to store a wave function, which as you know very well is basically a complex number associated with every possible configuration. So I need to store one complex number for the up, up, up configuration, one for the up, up, down configuration, and so on and so forth. This complex number is somehow a probability distribution together with some information about the phase. The problem is that the number of possible configurations increases exponentially with the number of particles in my system, and therefore the memory required to store the wave function increases exponentially too. This is a problem: nowadays on my laptop I can store the state of, I think, a bit more than 30 particles; beyond that I would soon need a cluster or a supercomputer, and there is no supercomputer on Earth that can store the state of more than, I think, 50 or 60 spins. And I doubt that in the coming years we will solve this issue just by technological innovation, because the increase is exponential: every time we add another spin, another qubit, we need to double the amount of memory.

This is just a formal description of what is going on. The wave function can be written as this state vector, and I need to store one complex number for every possible element of the basis; in this case I chose the up/down basis for every spin. When I do this, I can describe any possible wave function in the whole Hilbert space. But the Hilbert space contains all possible wave functions, and many of those are not interesting for what we are doing. Very often we are just interested in the ground state of the system; sometimes we are interested in highly correlated phases of matter, because we can build technological devices with them, and so on. Most of the states we are interested in are structured: they respect some symmetries, so there are actually fewer degrees of freedom within them, and they have some interesting correlations. Therefore I don't really care about describing all possible wave functions, but only a subset of them that are physically relevant.

One of the first proposals, put forward already at the beginning of quantum mechanics, was the following: since in general I don't care about describing the whole wave function and all its entries, I can write the wave function as some function that, instead of storing every possible entry, stores only its dependence on some parameters, and those parameters should be fewer than the number of degrees of freedom of the space. Essentially, I have some parameters, hopefully many fewer than the size of the Hilbert space; I feed them to a function, an arbitrary function; and by doing this I can compute any entry of my wave function on demand. Of course, this is interesting only if it can solve the memory problem I talked about before.
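As a minimal illustration of this idea, here is a sketch (mine, not from the lecture, assuming a simple product-state / mean-field form of the kind mentioned just below): with one complex parameter per spin we can return the amplitude of any of the 2^N configurations on demand, without ever storing the full vector.

```python
import numpy as np

# Hypothetical sketch: a product-state (mean-field) ansatz for N spins.
# One complex parameter per site -> O(N) memory, yet it assigns an
# (unnormalized) amplitude to every one of the 2^N configurations on demand.
N = 20
rng = np.random.default_rng(0)
theta = rng.normal(size=N) + 1j * rng.normal(size=N)   # variational parameters W

def psi(sigma, theta):
    """Amplitude <sigma|psi(W)> for sigma in {+1, -1}^N (unnormalized)."""
    # Each site contributes independently: exp(+theta_i) if up, exp(-theta_i) if down.
    return np.exp(np.sum(theta * sigma))

sigma = rng.choice([1, -1], size=N)   # one basis configuration (an up/down pattern)
print(psi(sigma, theta))              # a single complex amplitude, computed in O(N)
```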
Essentially, if the size of the space where those parameters live is polynomial in the size of my system instead of exponential, then I have addressed the issue, because now I only need a polynomial amount of memory to store my state instead of an exponentially large amount. At the same time, if I can do that, but to compute expectation values or any physically relevant quantity I still need to perform a sum over the whole Hilbert space, then I did not really address my problem, I just hid it, because I would still need to perform an exponential number of operations.

So there are actually two classes of variational states, variational in the sense that they depend on those variational parameters W. There are computationally tractable states, such as mean field, Gutzwiller mean field, or matrix product states, where I don't need to perform the whole sum over the Hilbert space but can recast it as an operation of polynomial complexity. For example, with matrix product states the sum is recast as a product of matrices of polynomial size; with mean field the function depends only on the local Hilbert space of each site and you just take a product, and so on. However, in general this function could be something for which we cannot do that: imagine this psi is a neural network, or an arbitrary non-linear function. It is not easy to recast this sum into something we can treat exactly in polynomial time. This second class is what we might call computationally efficient states.

For a variational ansatz, for a function, to be a computationally efficient state, it must satisfy two requirements. The first is that it must be efficiently evaluable, which means that I can compute the wave function, given the set of parameters and a certain element of my basis, in polynomial time. This is generally true if I have a polynomial number of parameters: the number of operations I need will in general also be polynomial. At the same time, I must also be able to sample configurations from the square modulus of the wave function efficiently. This is not so trivial, because in the probability distribution induced by this function you can see that the denominator is the norm of the wave function. So unless my ansatz is already normalized, and therefore I know this quantity, I would need to compute a sum over an exponentially large space. I believe Giuseppe might mention later a class of ansatzes where the denominator is easy to compute, but in general this is not true, so we need more advanced techniques.

If those two requirements are met, there is an interesting theorem, actually very easy to prove, that it is possible to compute any expectation value of a k-local operator with polynomial accuracy in polynomial time. A k-local operator is a notion that comes from quantum information theory, which corresponds, in more physical terms, to an operator with at most k-body interactions. Physical Hamiltonians and physical observables are in general k-local, because we usually only consider one- or two-particle interactions: think about the Ising Hamiltonian, which has two-body interactions. In some cases we have three- or four-body interactions, but usually there is a notion of locality in our models.
The proof of this is extremely simple. It just stems from the fact that, if I start from the definition of an expectation value and insert the identity, I can expand over the whole Hilbert space. Then, if I divide both sides of this equation by psi of sigma, I am essentially collecting one term, this term here, which corresponds to a probability distribution. It is very simple to see that p of sigma is a real number; it is positive because I took a modulus squared; it lies in the interval between 0 and 1; and it is normalized to 1 when summed over the Hilbert space, by definition. So here I have a probability distribution, and it multiplies this O_loc, a local term, and I claim we can compute this term in polynomial time. This is because, even if this sum over eta formally runs over the whole Hilbert space, so 2^N elements in theory, in practice, if my operator is k-local, I am fixing a row of this big matrix, the row corresponding to sigma, and looking at all the non-zero columns; and if the operator is k-local, most of its entries are zeros, so there are only polynomially few non-zero entries. And psi, we already assumed as a hypothesis, we must be able to compute efficiently, in polynomial time. So we can compute those entries in polynomial time.

Now, of course, I still have this sum over the whole Hilbert space, but we can also address that. The reason is that what I have written here is a sum, over some variable sigma, of the probability of that element times a quantity. This is exactly the definition of an expectation value, of a statistical average; some people might write it as E[O_loc]. So if I am able to extract elements of the Hilbert space distributed according to the probability p of sigma, I don't actually need to perform the whole sum: I can just average O_loc, this local estimator, over a smaller set of elements, hopefully polynomially many. So the question is: can I sample this efficiently? Because once I can, I compute the average as a mean, and the error goes down as the inverse of the square root of the number of samples I took. If I take infinitely many samples, my estimate is exact, and I can control the accuracy with the number of samples.

So, how can I sample the square modulus of a wave function efficiently? The problem, as I said before, is that the denominator, the normalization, cannot be computed easily, at least in general. There is a technique, which I believe someone already mentioned in last week's lecture, called Metropolis-Hastings Monte Carlo: if we can compute not the probability of an entry itself but a function proportional to it, in this case just the numerator of the probability distribution, we can generate a chain of elements, a succession sigma 0, sigma 1, sigma 2, and so on, that is asymptotically distributed according to the probability distribution we are trying to sample.
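Before the sampling steps themselves (described next), here is a minimal sketch of the local-estimator average just derived, assuming a batch of configurations already sampled from |psi(sigma)|^2; `connected_elements` and `log_psi` are hypothetical helper names, not code from the lecture.

```python
import numpy as np

def o_loc(sigma, connected_elements, log_psi):
    """Local estimator O_loc(sigma) = sum_eta <sigma|O|eta> psi(eta)/psi(sigma).

    `connected_elements(sigma)` is a hypothetical helper returning the few
    (polynomially many) pairs (eta, <sigma|O|eta>) with non-zero matrix element,
    which exist because the operator is k-local.
    """
    lp_sigma = log_psi(sigma)
    return sum(mel * np.exp(log_psi(eta) - lp_sigma)
               for eta, mel in connected_elements(sigma))

def expectation(samples, connected_elements, log_psi):
    """<O> estimated as the mean of O_loc over configurations drawn from |psi|^2;
    the statistical error decreases as 1/sqrt(n_samples)."""
    values = np.array([o_loc(s, connected_elements, log_psi) for s in samples])
    return values.mean(), values.std() / np.sqrt(len(values))
```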
The algorithm itself is very simple. You start from an initial configuration, which you can generate however you want; let's say you pick a configuration at random, say up, up, down. At every iteration you propose, according to some rule, a new configuration. Let's say my rule is: I pick one of those spins and I flip it. So this is my sigma 0, and my sigma prime will be up, down, down, where I flipped the second spin. Now I compute the probability corresponding to sigma 0 and the probability corresponding to sigma prime. If the probability of sigma prime is greater than the probability of sigma 0, it means I moved towards a region of higher probability, so I accept the new move and I repeat the algorithm starting from this new state. If instead the probability decreased, I don't reject the move outright: I accept it only with a probability given by the ratio of the two probabilities. If I repeat this operation many times, I obtain a succession of states, and, apart from the beginning where there is some correlation with the initial state, whatever comes after will be distributed according to the probability I am trying to sample from. Of course I need a sufficiently large sample size, and my transition rule must respect some properties of my system, but in general this method works and is very powerful.

I don't want to get into the details of why it works. Let me just say that it is derived from detailed balance, or microscopic reversibility, which is essentially the idea that if I am at equilibrium, with a certain distribution over the configurations of my system, then the probability of being in a state and going from that state to another must match the probability of the reverse process. The only thing you must keep in mind is that, if you define the probability to go from one state to another and split it into the probability of proposing a move times the probability of accepting it, you can derive the acceptance formula proposed by Metropolis and Hastings decades ago. What this algorithm buys us is that, even if I don't know the full probability distribution, but I can compute something proportional to it, let's say the numerator, I can compute ratios; and since I am computing a ratio, the normalization factor, which is a constant, drops out and I don't need to know it anymore. This is very powerful.

So imagine, for example, that I decide to use a neural network as my variational function. I can pick a very simple two-layer feed-forward network, a restricted Boltzmann machine, where the input of the machine is a bit string, say up, up, down, which would correspond to something like one, one, zero. I multiply it by a matrix W, I add some bias, and then I pass it through a non-linear activation function, usually a log cosh, though you can actually use other functions, and then I sum the outputs. This is just a neural network.
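Putting these two ingredients together, here is a sketch (mine, not the lecture's code) of the RBM-style log-amplitude just described and of the single-spin-flip Metropolis rule; note that only the ratio |psi(sigma')|^2 / |psi(sigma)|^2 enters, so the normalization is never needed. Sizes and initialization are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 10, 20                                    # visible spins, hidden units (arbitrary)
W = 0.01 * (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N)))
b = 0.01 * (rng.normal(size=M) + 1j * rng.normal(size=M))

def log_psi(sigma):
    """RBM-style log-amplitude: sum_j log cosh(b_j + sum_i W_ji sigma_i)."""
    return np.sum(np.log(np.cosh(W @ sigma + b)))

def metropolis_samples(n_samples, n_sweeps=10):
    """Single-spin-flip Metropolis-Hastings sampling of |psi(sigma)|^2."""
    sigma = rng.choice([1, -1], size=N)          # arbitrary initial configuration
    lp = log_psi(sigma)
    samples = []
    for _ in range(n_samples):
        for _ in range(n_sweeps):                # a few proposals per stored sample
            i = rng.integers(N)                  # propose: flip one random spin
            sigma_new = sigma.copy()
            sigma_new[i] *= -1
            lp_new = log_psi(sigma_new)
            # accept with probability min(1, |psi(sigma')|^2 / |psi(sigma)|^2)
            if rng.random() < np.exp(2 * (lp_new.real - lp.real)):
                sigma, lp = sigma_new, lp_new
        samples.append(sigma.copy())
    return samples

samples = metropolis_samples(500)                # asymptotically distributed as |psi|^2
```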
The reason why it is a good idea to use neural networks here, and there are by now several results in the literature showing that it is, is that neural networks are very good at capturing correlations in your system, at capturing hidden correlations in whatever input you are feeding them. Therefore they can efficiently compress the information, and instead of having to store the exponentially large state vector, the wave function, you can store far fewer parameters, still a lot, but far fewer: this W and this bias in this case. This function, as you can see, has a polynomial cost to evaluate: it is just a matrix-vector product, where the matrices have a size of the order of the square of the number of spins in your system, and the log cosh is applied element-wise, so it has a fixed cost per entry. So this satisfies the first condition we asked for, and I showed you that we can also sample from it efficiently. So this, which we call a neural quantum state, that is, using a neural network to describe a variational quantum state, satisfies all the requirements we asked for, and that means we can use it as a valid variational state.

With that, before moving on to the second part, where I will talk about how we can use this technique to actually do something interesting and solve some problems, I will take a few questions. Let me check the question-and-answer box. Yeah, just a moment.

Could you again say what you mean by sampling? By sampling I mean the following: I have shown that the operation of taking the expectation value of a quantum operator, which traditionally involves a sum over the whole Hilbert space, can be rewritten as a statistical average of a quantity O_loc, which depends on the basis elements. I am averaging this quantity, which you can think of as a random quantity, over some distribution, and this distribution is the square modulus of the wave function. So sampling means that I want to take configurations, elements of my Hilbert space, configurations like up, up, up, down and so on, according to their probability. Imagine this axis enumerates the Hilbert space: here I have up, up; up, down; down, up; down, down; and on the other axis I have the probability. Imagine my probability distribution looks something like this, just random numbers. I want to be able to extract a set of configurations distributed approximately according to this distribution. Sampling means exactly this: extracting this set.

So this is classical Monte Carlo; do you have any comments or ideas on quantum Monte Carlo? The point here is that this is classical sampling: I am classically sampling from a distribution.

In this Metropolis algorithm, do we assume no interaction? I'm not sure what you are asking, because here I am just trying to sample from a distribution. This is simply a technique to sample a probability distribution; I am not assuming an underlying model, I am not assuming anything. Maybe we can unmute Lavi Kumar, so you can ask the question directly. Yeah, hi, Filippo. Thanks a lot for this talk. So actually my question was: these underlying spins, do they interact?
So that was essentially the question: when you were sampling them with the probability, which you explained very nicely, these underlying spin configurations, the ones you flip, do they have some kind of local interaction among them, or do we just have a distribution over these spin configurations? So what I am doing now does not refer to any model in particular. I am just explaining how you would go about computing expectation values, given a set of parameters for your variational state. I am not making any assumption about what the model is, and the spin configurations over which I am sampling are just basis elements of my Hilbert space. I am sampling basis elements; this is completely unrelated to the physical system I am studying. Okay, thank you.

Another question: what is the difference between a neural network and a restricted Boltzmann machine? This is very simple: a restricted Boltzmann machine is just one particular type of neural network. I think it was already discussed a lot in the first lectures you had in this course two weeks ago, which is why I didn't really talk about it, but essentially it is a very simple neural network where this is the input layer, this is the output layer, and there is only one intermediate layer in between. The restricted Boltzmann machine is just the name for this particular kind of network. In general you can add many more layers, you can add particular structure to it, and so on.

By avoiding the exponential cost, are we losing any kind of information? Indeed, we are doing two things. The first is that we are parametrizing the Hilbert space with some function, and of course this function is not able to represent, in principle, every possible wave function; it will only represent a subset. So yes, I am cutting away parts of the Hilbert space that, in principle, I do not care about. A way to see this is, for example, the mean-field ansatz: the mean-field ansatz cuts away any state that has quantum correlations between different sites. Matrix product states bound those correlations. Neural networks remove parts of the Hilbert space in a much less transparent manner, which we don't totally understand, but they are doing exactly the same thing. That is for the ansatz. Then, for the sampling, I am losing some information about the expectation value: I no longer know its exact value, only an estimate with a certain error. This is where the exponential complexity has been hidden away.

In neural quantum states, is the algorithm able to choose the physically relevant part of the Hilbert space by some means, like probability, or do we already provide it with a wave function with variational parameters when it calculates the expectation? What I was talking about so far is just the neural quantum state itself: an efficient way, like matrix product states or mean field, to parameterize the Hilbert space. In the next part of the lecture I will talk about how we can determine the parameters that give us the state we are interested in, for example the parameters for the ground state, and so on. So basically we have an optimization problem.
Okay, next question: does the cost of calculating the wave function drop if we consider only symmetric or antisymmetric wave functions, in the case of identical-particle systems? Yes, indeed, if you insert information about the structure of your problem into your neural network. For example, imagine you want to describe a standard condensed-matter system on a lattice, and this system has some point symmetry or translational symmetry. Then you can constrain the W matrix so that the output is invariant under translations, and in 1D this actually reduces its size by a factor of N, because you have N possible translations. So it does reduce the cost, and in general it is a very good idea to use this information to further constrain your neural network, your variational ansatz.

What is meant by polynomial accuracy? I mean that the accuracy depends polynomially on the number of samples I take: if I have N samples, the error goes down as one over the square root of N, not exponentially. So, for example, if you are near a phase transition and you want to determine an observable with exponential accuracy, because that would allow you to tell which phase you are in, you would need an exponentially large number of samples.

Okay: is it true that the neural network is taking advantage of the fact that samples in real-world data sets actually live in a small subset of the Hilbert space, which we humans cannot recognize? I'm not sure I understand the question. The whole Hilbert space is a space of wave functions, and a wave function is a vector in this space. Maybe we can unmute Tim, so you can ask your question. Hello, can you hear me? Hi, yes. So what I mean is that those samples, those data in real-world data sets, for example data about water molecules, only live in a small subset of the Hilbert space; their wave function is in a small subset. But we don't know the correlations between those molecules, so we don't know what the actual subset is, and the neural network can help us capture that information; it is implicitly finding the subset. Is that true? With neural quantum states it is hard to say exactly what part of the Hilbert space we are parameterizing. There are studies about it, but it is still hard to say exactly. What we do see, in general, is that this parameterization is able to describe physically relevant states. Okay, so that means it makes it more efficient to compute, right? Yes. Okay, thank you.

You have 25 more minutes. I'll just go on with the second part of the presentation, and if there is time I will finish answering the questions at the end. Okay.

So in this second part I want to talk about some problems we can solve with neural quantum states. In general, I can think of two very broad classes of problems. One is when I want to determine, for example, the ground state.
That is, the weights corresponding to the ground state of a Hamiltonian: you give me a Hamiltonian and I have to find its ground state, or I need to determine how a state evolves under this Hamiltonian, something like that. The second category of problems is reconstructing a quantum state. Imagine you have an experiment and you don't know exactly the state of your system; you want to determine it. What an experimentalist can do is perform several measurements, local or otherwise, so that eventually he has a set of measurement outcomes and knows the bases in which they were performed, and he wants to train a neural network so that it describes the state of his system, which he does not know a priori. I think, though I'm not sure, that Juan Carrasquilla will talk about this second application next week. Today I will focus on the first category of problems.

So, the first thing I want to talk about is determining the ground state. You all know that the ground state is the eigenstate of the Hamiltonian with the lowest energy. Essentially, what I want to do is this: given a neural quantum state, a state described by a certain neural network with a fixed architecture, say a restricted Boltzmann machine, I want to find the set of weights W_GS that best approximates the ground state. To do this, I recast the problem of finding the ground state into an optimization problem, and this was done a long time ago with the formulation of the variational principle. You can notice that the energy is an observable, so it is real, and we know that, of all possible states in the Hilbert space, the one with the lowest energy is the ground state. Therefore, for any possible set of parameters W, the energy will be greater than or equal to the ground-state energy; and if I find the set of parameters that gives me the ground-state energy, then I know I have found the ground state. In general it is also true that the lower the energy a set of parameters gives me, the better the approximation of the ground state. So what we are doing is looking for the set of weights that gives the smallest energy: this is really an optimization problem.

It could in principle be addressed with global optimization techniques, where you evaluate the energy on every possible set of parameters and then pick the one with the lowest energy. But since the space where the parameters live is very high-dimensional, not exponentially large, but still hundreds or thousands of dimensions, I cannot do this in general. So what we do instead is use iterative optimization techniques. You start from an initial set of parameters W0, thrown at random or coming from an educated guess. For those parameters you can compute the energy, which we know is efficient. Then you compute the gradient of the energy with respect to the parameters, and you use this gradient to correct the parameters: at every iteration you take the parameters and subtract the gradient multiplied by some speed, a learning rate as we call it, or optimization rate if you want. That gives the new set of parameters, and you repeat this over and over.
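In code, the iterative optimization just described looks roughly like the sketch below (mine, hedged; `estimate_energy_and_gradient` is a hypothetical helper that would return the Monte Carlo estimates, whose explicit form is derived in the next paragraph).

```python
import numpy as np

def minimize_energy(W0, estimate_energy_and_gradient, eta=0.01, n_iter=500):
    """Plain (stochastic) gradient descent on the variational energy E(W).

    `estimate_energy_and_gradient(W)` is a hypothetical callback that samples
    |psi_W|^2 and returns Monte Carlo estimates of E(W) and of grad_W E(W).
    """
    W = np.array(W0, dtype=complex)
    for it in range(n_iter):
        energy, grad = estimate_energy_and_gradient(W)
        W = W - eta * grad               # W_{t+1} = W_t - eta * grad E(W_t)
        if it % 50 == 0:
            print(f"iteration {it}: E = {energy.real:.6f}")
    return W
```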
Now, it is interesting to notice that you can read this equation as delta W divided by some sort of discrete time delta t equals minus eta times the gradient of E with respect to W. So essentially we are solving some sort of discrete differential equation for the parameters W, and we are rolling down the potential well induced by this energy.

The first question you might have is: can we compute this gradient efficiently? So far I only told you that we can compute the energy, not its gradient. This is quite easy to show. Going back to what I showed you before, the energy is an observable, so we can rewrite it as a sum over the whole Hilbert space of the probability of an entry times a local estimator, this E_loc, which can be computed efficiently because the Hamiltonian has at most one-, two-, or let's say few-body interactions. Then I can rewrite the expectation value of the energy as the statistical average of this E_loc. The gradient of the energy is then the vector whose entries are the derivatives with respect to each parameter: the first entry is the derivative of E(W1, W2, ..., WN) with respect to W1, and so on. You have already seen the backpropagation rule, so you know there is a way to efficiently compute the gradient of a neural network with respect to its inputs or its parameters, with roughly the same cost as evaluating the network itself. So every entry of this vector can be computed efficiently. In particular, you can show, with some algebra that is not complicated but takes some time, that the gradient can also be written as a statistical average: a statistical average of E_loc times some O_k, where O_k is the log-derivative of our ansatz, the derivative of the logarithm of the neural network with respect to the k-th parameter. I told you we can compute derivatives of a neural network efficiently; here I am just taking a product and averaging. So again, these are statistical averages: in theory this is the sum over sigma of P(sigma) times E_loc(sigma) times O_k(sigma), but I am not doing the whole sum; I am sampling sigma and taking the statistical average over this small, polynomially large subset of basis elements.

This means, however, that the gradient I have estimated is not exact; it is not the exact gradient I had written here, but a noisy estimate. So in my equation of motion, the equation I use to update the parameters, I have to keep track of a random noise term. Let's assume that the error is Gaussian, distributed according to a normal distribution. Then I know the variance goes down as one over the number of samples I have taken: if I take infinitely many samples, this error goes to zero, there is no noise and the equation is exact; but if the number of samples is finite, this is not true. And this is actually very interesting, because this really starts to look like a Langevin process.
A Langevin process is basically the approximate equation governing the motion of a particle in a potential well; the potential well, for us, corresponds to our energy functional, while the noise term of a Langevin process depends on the temperature of the medium in which the particle moves. So what is a temperature in physical terms is, for us, essentially set by the number of samples; they are inversely proportional. If you think about it, if I take an infinite number of samples my temperature is zero and the optimization, this motion, is exact; if I have a finite number of samples, I am at a finite temperature. And this is very interesting. Imagine the potential we are trying to optimize looks like this: this axis is the energy and this is whatever parameter we have. Imagine I run my optimization and I fall into this local minimum. If my gradient were exact, this being a local minimum where the gradient is zero, I could never get out of it. But since I take a finite number of samples, I am at a finite temperature, so there is a certain probability that I will escape this well, continue my optimization, and eventually fall into, hopefully, the global minimum. This is what happens when you do not do plain gradient descent but stochastic, approximate gradient descent, and it is one case where doing things approximately actually helps us, because it helps us not get stuck in local minima. There are other problems: usually there are big regions where the gradient is almost zero, where it is very hard to optimize, but local minima, at least, are less of an issue, especially if at the beginning of your optimization you keep the number of samples not too high, exactly in order to avoid falling into local minima, and then increase it later. So with that, I hope I convinced you that we can find the ground state by recasting it into an optimization problem.

Another interesting problem is time evolution. If you can perform time evolution, then given a state and a Hamiltonian you can compute the state at all successive times. You can also think of performing some sort of imaginary-time evolution, where instead of evolving in real time you evolve in imaginary time, which allows you to converge exponentially fast towards the ground state. Let me just sketch the way we do it. Imagine you have a state, given by some parameters, and you want to evolve it. We know how to write the evolution analytically: it is e to the minus i H delta t applied to psi of W. And we want to find a state psi of W plus delta W, where delta W is some update of my parameters, that approximates this unitary time evolution. The way we do this is with linear approximations: if we assume the time step is small enough, we can linearize the exponential of the unitary operator and write it that way. At the same time, the right-hand side, this psi of W plus delta W, we can expand to first order in a Taylor series around W, so we see the linear effect of changing the parameters.
What we get is, again, psi of W, of course, because we are expanding around that point, plus the sum over k of delta W_k times the log-derivatives, where those O_k are the derivatives of log psi_W(sigma) with respect to W_k; there is a sigma dependence in there as well. Now what I want to do is match those two expressions. The way we match them, the way we find the set of delta W that solves this approximate requirement, is that we define an overlap, the overlap between the two states, which is also the Fubini-Study metric, and we try to minimize the distance between those two states. I will not do the full calculation because it takes a while, but you can actually find the solution: the updates delta W that solve it are given by this equation, where I have an S matrix, S_kk', on the left-hand side, and on the right-hand side the same gradient of the energy that I showed you before.

This S_kk' is known as the quantum geometric tensor. It has the structure of an expectation value, a statistical average, of the log-derivatives. In the context of variational Monte Carlo it was first proposed by Sandro Sorella for imaginary-time evolution, where he showed that this object lets you generate the imaginary-time evolution; you would just get rid of this i. What is also interesting is that the quantum geometric tensor carries information about the metric of our ansatz. Imagine the space where our parameters W live: I take a configuration W and another one W prime equal to W plus delta W, very close to each other. The distance between those two points in this space, which has a Euclidean metric, is just the norm of delta W. However, what we are really interested in is the Hilbert space: my neural network takes a parameter configuration and gives me a psi of W, and since the mapping is highly non-linear, neural networks being highly non-linear functions, it is entirely possible that a state which is very close to the initial one in parameter space is actually very far away in the Hilbert space, in the space of functions. What the quantum geometric tensor does is give a first-order estimate of the distance between the wave functions parameterized by W and by W plus delta W; it carries this sort of information.

In any case, we can recast this equation, at least symbolically, and solve it. However, solving it means we have to invert S. S is a matrix, and you can show that its spectrum is real and non-negative; however, it is often nearly singular, so it is not easy to invert. In practice we usually try not to invert it explicitly, and instead solve this linear problem with some iterative algorithm such as conjugate gradient, MINRES, and several others.
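A sketch of this linear-algebra step (mine, using the standard stochastic-reconfiguration estimators and assuming samples, log-derivatives and local energies are already available): from the samples one estimates the force vector F_k = <E_loc O_k*> - <E_loc><O_k*> and the quantum geometric tensor S_kk' = <O_k* O_k'> - <O_k*><O_k'>, and then solves the linear system, usually with a small diagonal shift because S is nearly singular.

```python
import numpy as np

def sr_update(O, E_loc, eta=0.01, diag_shift=0.01, real_time=False):
    """One stochastic-reconfiguration / TDVP parameter update.

    O     : (n_samples, n_params) array of log-derivatives O_k(sigma) = d log psi / d W_k
    E_loc : (n_samples,) array of local energies
    Returns delta_W solving S @ delta_W = -eta * (i if real_time else 1) * F.
    """
    O_mean = O.mean(axis=0)
    E_mean = E_loc.mean()
    Oc = O - O_mean                                   # centered log-derivatives
    # force vector F_k = <E_loc O_k*> - <E_loc><O_k*> (the energy gradient, up to conventions)
    F = (Oc.conj() * (E_loc - E_mean)[:, None]).mean(axis=0)
    # quantum geometric tensor S_kk' = <O_k* O_k'> - <O_k*><O_k'>
    S = Oc.conj().T @ Oc / O.shape[0]
    S += diag_shift * np.eye(S.shape[0])              # regularize: S is often nearly singular
    rhs = (-1j if real_time else -1.0) * eta * F
    return np.linalg.solve(S, rhs)                    # in practice: CG / MINRES instead
```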
In any case, once you can solve it and determine the set of delta W, you just feed them back into the equation we use to update the weights. So at every iteration, instead of computing just the gradient, you compute the gradient, you compute the quantum geometric tensor, you solve the linear system, and then you use the output to update your weights, and you repeat. This can be used to perform the time evolution of the system, or the imaginary-time evolution in order to find the ground state more efficiently, and so on.

With that, I conclude. Just to sum up: I have shown you that we can use neural networks to variationally encode an arbitrary quantum state, or at least a physically relevant quantum state; that we can estimate expectation values efficiently by Markov chain Monte Carlo sampling; and that we can also compute the gradient of those expectation values, in particular of the energy, efficiently, so that we can optimize it. I have shown you that we can recast the problem of finding the ground state, and also the problem of doing the time evolution, into some sort of optimization problem thanks to the variational principle, which allows us to solve them with iterative methods. And to end, I would like to point out that if you're interested in all of this, we have a Python package that we are developing, called NetKet, which you can find at netket.org, where we implement most of those methods. There are several tutorials that teach you how to use it, and it's very easy: you usually just need to define your Hamiltonian, the variational ansatz, that is the neural network you wish to use, and then the technique you want to use to optimize for the ground state or the time evolution, and these kinds of things. So with that, I'm done. I don't know how much time I have; I will try to answer some questions. There were a couple of them regarding the second part of the lecture, actually three now. Do you see the question-and-answer box? Yeah, just give me a moment.

You have mentioned gradient descent: can we use stochastic gradient descent to find the saddle points of an energy landscape? In general, stochastic gradient descent goes downhill, so it will not stop at a saddle point, and it is not so easy to tell whether you are at a saddle point or not. Usually saddle points are something you want to avoid, because they slow down the optimization unless you use good optimizers, so they are a problem. But in general, no, we are trying to find the ground state. I would also add that this energy function we are optimizing is defined on the space of the variational parameters, and a saddle point in this space does not really have a physical meaning for the system you are describing.

Is there a way to estimate whether the subspace W we consider contains, or at least is close enough to, the real ground state? Yes. I guess by subspace W you mean the subspace of the Hilbert space that our variational ansatz is describing, not the variational manifold, which is just a tool we are using.
First of all, the lower the energy we can reach, the better the approximation; this is already an indication, and it is very useful if we want to benchmark against other techniques. But when we go into unexplored realms, for example two- or three-dimensional systems where there are fewer reference results, there is something else we can do, which I didn't talk about. Let me see... yes. When we estimate E_loc, here is the definition: if my wave function psi_W is exactly the ground state, it is easy to prove that E_loc(sigma) is the same for every possible input sigma, and equal to the ground-state energy. So the variance of this distribution, and therefore the statistical error in your estimate, goes down to zero. This is called the zero-variance principle, and it makes it quite easy to see that you have converged. This actually holds not only for the ground state but for any eigenstate of the Hamiltonian, but since in general we are looking for the ground state, unless something very pathological is going on, if you hit this condition you know you are really at the ground state. We can also use this technique, with some tricks based on symmetries, to target excited states.

Are there particular classes of Hamiltonians that can or cannot be treated using the neural-quantum-state technique? In general, there are Hamiltonians that are harder to train for, but we have essentially two tools. Given a Hamiltonian for which we try to find the ground state, we can first try to cook up a good neural network that should be able to represent the ground state: if I know that the ground state should respect some symmetries, or must represent fermions or bosons or whatever, I will change the architecture; I will not always use the same one. So this already changes the tool I am using to solve the problem. Beyond that, I am not aware of any particular class of Hamiltonians that cannot be treated; there are Hamiltonians for which it is harder to solve the optimization problem, but that is also related to the ansatz. We are still trying to understand completely what makes the optimization procedure hard; it is an open research question to know exactly what prevents us from solving the optimization problem, whether it is the ansatz we are choosing or the Hamiltonian itself.

Can we extend this to finite-temperature calculations of expectation values? Yes, indeed we can. We can also extend it to determine the steady states or the time evolution of open quantum systems, of dissipative systems. It's not particularly hard, but in just 45 minutes it's hard to cover all the generalizations of this technique.

Can you comment a little more on why the neural-network variational ansatz is better than other kinds of ansatz? Is it related to non-linearity? I am not saying that neural networks are necessarily better than other ansatzes; this is still a question we are researching. For example, in one-dimensional systems we know that matrix product states are extremely efficient and would be very hard to beat, and their optimization is also quite simple.
For two-, three-, or four-dimensional systems, neural-network variational ansatzes are very general and can work very well. We can also take years of research done by the giants of machine learning, such as Google, IBM and several others, and exploit it, because there are interesting architectures that allow us to address some problems and to encode symmetries of the system directly into the network. In general, there are theorems telling us that neural networks are able to capture arbitrary correlations, even volume-law entanglement in the system, which is something that, for example, MPS in 2D cannot do. We are still trying to understand exactly the limits of those techniques, but there are theorems telling us that neural networks are very good, universal function approximators. So they perform very well, but I am not saying it is the ultimate technique.

Okay, I think now it's maybe time to stop and make a break. I don't know if there are a couple more questions; I think we should take a break now. We are back at 1:45, so in 10 minutes we start the next lecture. Is that fine? You can check the questions and maybe answer them directly. Yeah, okay. Let's write down the answers, or I can go on answering, as you want. Okay, for me it's fine; I just don't know whether the participants are fine with it. We would prefer to answer them like that if you can do it quickly, and then we break. However, we would like to start at 1:45. Okay, so let's just stop now and Giuseppe will give his talk afterwards. Okay, good. So we are back in 10 minutes. Thanks everyone, we'll see you soon. See you in 10 minutes.

So I think we can start. I'm happy to announce the lecture of our speaker, Giuseppe Carleo. Giuseppe finished his PhD here in Trieste, at SISSA, some time ago, after which he did his postdoctoral studies at the Institut d'Optique in France and at ETH Zurich in Switzerland. He then joined the Flatiron Institute in New York, and since recently he is a professor at EPFL, where he has his research group. So we're happy to have you here, Giuseppe; please go on with your lecture.

Thank you, Asya, for the introduction, and of course thanks to all the organizers for having me here today. I'm always very happy to be at ICTP, even though this time only virtually, and I will have to skip the nice view of the sea, but I'm really happy that this school could still be done online. Today I will tell you more about, mostly, the applications of what Filippo started introducing in his lecture. As you have understood, these are essentially applications of machine learning ideas to many-body, interacting quantum systems. And as you are seeing during this school, this is part of a much broader development happening in physics: the application of machine learning to several realms, from particle physics to chemistry, statistical physics, and, what we are going to talk about today, quantum physics. You will also see more, I guess, from Juan Carrasquilla in the next lecture.
This is a review where you can find an overview of what's going on in the field and of all the explosive developments that have happened in the past couple of years. Now, as a short one-slide summary of what you've seen during the last talk: you've seen this idea of neural-network quantum states. These are a parameterization of your variational wave function, of your quantum state, the state describing a complex quantum system. You have a non-linear function that, given an arbitrary set of quantum numbers, for example spin quantum numbers or electronic positions, whatever you have in your quantum system, will output the amplitude psi(s), where s is the ensemble of these quantum numbers. These amplitudes of the wave function, which as you know are complex in general, depend parametrically, and this is why this is called a variational approach, on some parameters, for example those of a deep neural network. You've seen this form, maybe not written exactly this way, during the initial lecture. This is a deep network that you read from right to left. The first variable you see here is a vector s, the ensemble of your quantum numbers, for example plus or minus one for a spin system. Then you have a linear transformation: you apply a matrix W, and these are your parameters, the things you can change. Then you apply component-wise, to all the entries of this vector, a non-linear function g, for example a ReLU or any other function you've seen in the other lectures. You repeat this operation several times until you reach the final layer of the network, where in this specific case you have only one output. This single output is, for a given choice of the input quantum numbers, the amplitude of the wave function, this bracket value, if you want. So this is the main idea of neural quantum states.

Then, as Filippo was anticipating and as you have maybe seen in the previous lectures, one of the reasons why we want to use these kinds of approximations is that they are very powerful: non-linear functions are very good at describing high-dimensional objects, high-dimensional functions. This is based on some theorems, which are modern reformulations, written in terms of neural networks, of a famous theorem by Kolmogorov and Arnold from the last century. Essentially, what these theorems say is that if you have a sufficiently large neural network, with sufficiently many neurons, then you can represent an arbitrarily high-dimensional function, provided it is sufficiently regular. Regular means not crazily divergent at some point, plus continuity and a few other things; and these conditions are typically met by wave functions, which is why we also use neural networks to describe them. But as you've seen in other applications, they're used in many other contexts, for example to recognize images.
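As a sketch of the layered form just described (mine, with arbitrary layer sizes and an arbitrary choice of nonlinearity): read from right to left, the input is the vector s of quantum numbers, each layer applies a matrix W and an element-wise non-linear function g, and the single final output is the (log-)amplitude of the wave function.

```python
import numpy as np

rng = np.random.default_rng(2)
layer_sizes = [16, 32, 32, 1]                      # input spins -> hidden layers -> one output
weights = [0.1 * (rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n)))
           for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]

def g(x):
    return np.log(np.cosh(x))                      # element-wise nonlinearity (any g works)

def log_amplitude(s):
    """Deep feed-forward network s -> log psi(s): alternate linear maps and nonlinearities."""
    x = s.astype(complex)
    for W in weights:
        x = g(W @ x)
    return x[0]                                    # single output: the (log-)amplitude <s|psi>

s = rng.choice([1.0, -1.0], size=layer_sizes[0])   # e.g. a spin configuration of +/- 1
print(log_amplitude(s))
```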
Now, from a physical point of view, you might also have heard about entanglement. Entanglement is this property of wave functions of quantum systems whereby, if I make a measurement on some part of the system that is far away from another part, let's call them part A and part B, the outcome of the measurement on A will directly influence a subsequent measurement that I do on B, the other part. And it is possible to show that if you have a parameterization of your state in terms of a neural network, you can actually describe a quantum system that is entangled even when these two parts A and B are very far apart. This is what is sometimes called a volume law: the entanglement scales like the volume of the system, and not like the surface as in some other cases. For example, if you have a deep network like a convolutional network, which I'm sure you've seen in the previous lectures, it has been shown that if you want to encode these long-range correlations, this long-range entanglement, the depth of the network should scale at most polynomially with the number of spins, the number of degrees of freedom of the system. This is a very important property, because it tells you that you don't need exponentially large neural networks to describe this fundamental property of quantum systems which is entanglement.

Now, as Filippo was also mentioning, there are mainly two applications. One is about simulating quantum systems, and it is what both Filippo and I will focus on today. But there is also another part of the story, which is about characterizing quantum hardware, learning, if you want, wave functions from experiments. If I have an experiment that contains, so to speak, a certain wave function, I can try to represent that wave function on my computer using these representations. This is not what I'm going to talk about today, because we don't have enough time; I will concentrate on applications in the first realm, simulating quantum systems. It should be stressed, and this is very important, that these kinds of applications are rather different from the applications you've already seen, I guess, during this school, where you have data sets. In standard machine-learning applications you have a lot of data, for example images of cats and dogs, and you try, say, to classify these images using very large databases. In the applications we do here, however, we don't use external databases; in a sense, we generate them ourselves. This is the sampling step that Filippo discussed for a long part of his talk, which is essential to compute, for example, expectation values for quantum systems. In this sense this kind of application is self-learning: we don't have an external, pre-solved solution of our problem, but we try to find it on the fly. It is similar to teaching yourself how to walk without having somebody show you how.

Okay, now, concerning the simulation of quantum systems, there are again several applications. One is finding the ground state of a given Hamiltonian H, an approximation of this ground state or of some excited states; another is simulating the dynamics.
So, solving, if you want, the time-dependent Schrödinger equation; there are even cases where we approximately solve for the dynamics of the density matrix of the system, so this is for open systems, or, in some cases, for finite temperature, as somebody was asking. So let's focus first on this part of the story, which is the ground-state search. You've seen it from Filippo; here is a one-slide reminder. What we do is use the variational principle: the expectation value of the Hamiltonian, for a given Hamiltonian H describing the interactions in my system, is strictly larger than the exact ground-state energy. So what we do is try to minimize this quantity E(w) as a function of the w, which are the parameters of the neural network. And a very important point is that you can rephrase this as an expectation-minimization problem: I have a probability distribution, which is this |psi|² that Filippo was outlining, and we minimize, over this probability distribution, the expectation value of a quantity which is the local energy that Filippo defined. This is the main step we do in this optimization, in this variational learning of the neural network, if you want. Now, I won't go too much into the details of the theory, because Filippo has already given you some; I will give you some applications and the flavor of what we can do. So the first kind of application that we do, for example in condensed matter, is about studying interacting spin models. You know that in some cases you can have an effective Hamiltonian, even for an electronic system, for a system of interacting electrons, that reduces to spin degrees of freedom only. We somehow freeze the translational degrees of freedom, the fact that the electrons can move around; we imagine that they sit on a square lattice, for example, like in this case, and then what is left are only the spin up or down degrees of freedom on each site, okay? So one famous model, which is the one I will discuss today, is this family of models where you have what is called an exchange, or Heisenberg, interaction. Essentially you have an interaction between spins, S_i · S_j, where these are vectors of Pauli matrices, on two sites of this two-dimensional square lattice: for example, only between nearest neighbors of the lattice, this is the first term here, the J1 term; and then you can also have interactions between second nearest neighbors, this is the J2 term, on the diagonals of the square lattice. One reason why we use this model as a benchmark is, first of all, that it's easy to write down, and also that we don't know its phases exactly, because it's very hard to solve: we don't have any other technique that can solve it, either analytically or computationally, in a controlled way. So, for example, one question that we would like to understand with this model is whether, when J2, this interaction, is comparable to this other one, J1, you can have a phase of... Sorry, I think we lost sound. We lost it, okay. Sorry, what? We lost the sound for a while. Okay, can you hear me now? Yes.
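Going back to the expectation-minimization step described above, here is a minimal sketch — plain NumPy, with hypothetical helper names, not the actual code used in these works — of how the variational energy is estimated as a Monte Carlo average of the local energy:

```python
# A minimal sketch (hypothetical helper names) of the expectation-minimization
# reformulation: the variational energy is the average of the "local energy"
# E_loc(s) = sum_s' <s|H|s'> psi(s')/psi(s) over samples s drawn from |psi(s)|^2.
import numpy as np

def local_energy(s, connected_configs, matrix_elements, log_psi):
    """connected_configs: the few s' with <s|H|s'> != 0 (the Hamiltonian is local);
    matrix_elements: the corresponding <s|H|s'> values."""
    ratios = np.exp([log_psi(sp) - log_psi(s) for sp in connected_configs])
    return np.dot(matrix_elements, ratios)

def energy_estimate(samples, hamiltonian_connections, log_psi):
    """samples: configurations drawn (e.g. by Metropolis) from |psi|^2;
    hamiltonian_connections(s) returns (connected_configs, matrix_elements)."""
    e_locs = [local_energy(s, *hamiltonian_connections(s), log_psi) for s in samples]
    return np.mean(e_locs)   # E(w) ~ <E_loc>; it stays above the exact ground-state energy
```

Minimizing this estimate with respect to the parameters w is the "variational learning" step mentioned above.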
Okay, so I was saying that this spin-liquid phase would be a disordered phase of these spins, which contrasts with the ordered cases: for example, when J1 dominates, so when this part dominates, you have this kind of ordering, and when J2 dominates, you have this other kind of ordering, stripe ordering along the vertical lines. So the question we want to answer is essentially to find good approximations of the ground state of this very challenging model. One way to do this, which we started doing in these works, is to take a neural network which is a convolutional neural network, this very successful architecture that people use in image-recognition problems to recognize cats and dogs, and use it as a wave function, okay? A two-dimensional wave function. So we take this kind of architecture, which I guess you've already seen during these lectures, and we use this kind of representation. Here the weights are essentially the entries of these square matrices that are called filters. Then what we can do is look at the accuracy that we get on the energy of the ground state for some values of J2 and J1, so for some values of the parameters of the problem. If we take, first of all, J2 equal to zero, so when in this Hamiltonian we do not include this interaction and keep only this one here, this is the standard Heisenberg model. What I plot here is the error that you make on the energy, which in this case you can compute exactly with other techniques, as a function of the size of the network. Here we use an RBM, which Filippo was also talking about, and alpha is essentially the width of the neural network: the larger alpha, the wider the neural network. So this is a shallow, very shallow, just one-layer neural network, but you can make it more expressive by enlarging it in this horizontal direction. And you see that, already at that time, we were able to improve on some of the then-best variational results that people were obtaining on this model with other variational ansätze. And if you start using networks that are more expressive because they are deep — here we are not using deep networks, just this simple, one-layer, very simple-minded kind of neural network — if you play a bit more, increase the depth and use networks closer to the state of the art, you can systematically improve on this, and I will show you later how you can go even beyond these results. You see that here, for example, the error that you make on the energy is already very small. Now, things become challenging when you start turning up this J2 interaction, the interaction on the diagonals of the square lattice, right? So J1 is the interaction between nearest neighbors on the lattice: I have a spin here and a spin here, this is my J1 interaction between these two spins. And J2 is instead the interaction on the diagonal of the square lattice, okay? And things become challenging, and essentially unsolved, when you turn on this J2: there is no way we can solve the problem exactly when J2 is different from zero.
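As an illustration of how such a ground-state search is set up in practice, here is a rough sketch using NetKet, the software mentioned later in this talk. The API names follow what I believe is the NetKet 3 interface and may differ between versions; the J2 term and the exact hyperparameters used in the works discussed here are not reproduced, so treat this as illustrative only:

```python
# A minimal sketch of a variational ground-state search for a Heisenberg model
# on a square lattice with NetKet (API names assumed from NetKet 3; may vary).
import netket as nk

L = 4                                                # 4x4 square lattice
g = nk.graph.Hypercube(length=L, n_dim=2, pbc=True)
hi = nk.hilbert.Spin(s=1/2, N=g.n_nodes)

# Nearest-neighbour Heisenberg (the J1 term); the J2 term on the diagonal bonds
# would be added analogously with a graph carrying the extra edges.
H = nk.operator.Heisenberg(hilbert=hi, graph=g, J=1.0)

model = nk.models.RBM(alpha=4)                        # shallow RBM, width alpha
sampler = nk.sampler.MetropolisExchange(hi, graph=g)  # Markov-chain sampler
vstate = nk.vqs.MCState(sampler, model, n_samples=1024)

opt = nk.optimizer.Sgd(learning_rate=0.01)
gs = nk.driver.VMC(H, opt, variational_state=vstate,
                   preconditioner=nk.optimizer.SR(diag_shift=0.01))
gs.run(n_iter=300)                                    # iteratively minimise E(w)
print(vstate.expect(H))                               # Monte Carlo estimate of the energy
```

Increasing `alpha`, or replacing the RBM with a deeper (e.g. convolutional) model, is the "making it more expressive" step described above.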
So in this case, the only thing that we can do — and this is typical of variational methods — is to compare the energy that we get with the energies that other people have obtained in the past, or are obtaining as we speak, on this model using other approaches. For example, people have used DMRG and matrix product states. Quantum Monte Carlo can be used, again, only at this specific point J2 equals zero, where you don't have the sign problem. And then there were also other calculations based on other variational wave functions, et cetera. So what you see here is essentially the difference between the energies obtained with all these techniques and our energies. When these points are up in this upper plane, it means that we have lower energies and, in principle, a better approximation for the ground state. Otherwise, when you see these kinds of points, it means, for example, that in this specific case that technique would have a slightly better energy than our best approximation as of 2019 — this is the state of the art in 2019 for this problem. And you can see that, apart from a small part of the phase diagram, this neural network can really help you find better approximations for the ground state and possibly also settle some open points. I have to say that nowadays we also know how to improve this region around here, and hopefully you will soon see some new works where, over the whole phase diagram, the neural network gives essentially the best approximation for the ground state. Now, there are of course challenges and room for improvement, as in all techniques, in all approaches, and these are things that I think are very important to discuss and understand in this context. So one reason for the challenge: if you go to this 0.5 point — maybe somebody has already noticed it — why would this point be more challenging than others? One of the reasons, which has been pointed out in this paper, is the number of samples that you need to generate, so essentially how many times you have to run this Markov chain that Filippo was talking about during his lecture in order to get a good estimate, a good hint if you want, of what the ground-state properties look like. For example, in this paper, what they did is study the overlap — essentially how good the approximation of your ground state with the neural network is — as a function of the number of samples, or of a quantity which is proportional to the number of samples. And what you see is actually a very nice phase transition: you need a certain number of samples in order to get a good accuracy. And what they found is that when you have strong frustration, so essentially when J2 is comparable to J1 — for a similar problem, not exactly the same, but closely related — it can happen that the number of samples that you need is pretty large, because the kind of states that you are trying to learn are very disordered. So it might be that to learn these kinds of properties you also need to see a lot of different configurations. This is one of the limitations, if you want, of these learning-based approaches: the number of samples. However, and this is quite crucial and important:
it seems that the critical number of samples that you need to approximate these wave functions with a reasonable accuracy does not scale too badly, even though we only have small systems, because these studies are very hard to do for larger ones. In the sense that if you have 20 spins you need of the order of 10 to the 3 samples, even for very challenging models; if you have 36 spins, it seems that you need maybe 10 times more, but not a million or 20 million times more samples. So this hints at the fact that, hopefully, the idea that Filippo was also showing — that of this large Hilbert space, this vector space of quantum systems, we only want to describe a small portion, parameterized by a neural network or a series of neural networks — is somehow the correct picture, in the sense that we can hope that, for ground states of physical systems, this portion can be addressed using a number of parameters and samples which is not exponentially large. This is the hope. It is hard to prove analytically, and there are artificial counterexamples, but in practical systems we see that this typically works. There is an improvement over the older things that I was discussing until now, and that Filippo also told you about, namely this idea of running Markov chains to generate these many samples from the wave function: this can be made even faster. I will not go too much into the details, but I will just tell you that there is a family, if you want, of neural networks called autoregressive neural networks, which can be generalized, as we've done in this work, to quantum systems. This family of neural networks satisfies the property that you can sample from these quantum states without doing Markov chain Monte Carlo, in a completely efficient way. So these are really the incarnation of the definition of computationally tractable quantum states that Van den Nest introduced and that Filippo mentioned in his presentation. Just to give you a flavor of how these things work: essentially you have to make sure that the filters in your neural network are such that the correlations here depend only on the previous spins and not on the successive ones. You do a sort of decomposition of your wave function in terms of conditionals — even though these are not probabilities but complex objects — and then you can efficiently impose that these quantum conditionals, if you want, are normalized to one. Again, I will not go too much into the details, but this allows you to do exact sampling, and you don't need to do this Markov chain Metropolis sampling; a small sketch of this spin-by-spin sampling follows below. Just to give you an idea of how good these things are: if you take again the case of the Heisenberg model, which is an important benchmark, you see that if you use this exact sampling approach, and also a much deeper network, you can bring down your inaccuracy, improve your accuracy, by a factor of 10 or so compared to standard deep neural networks, down to a point where, to all practical purposes, this problem is essentially solved exactly, okay. So these kinds of new ideas, which are really influenced by machine learning, are also making an impact on the techniques that are now used to study many-body quantum systems variationally. Now, maybe I will take just one or two questions at this point before moving to the next topic.
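Here is the promised sketch of that exact, spin-by-spin sampling — plain NumPy with a toy stand-in for the conditional network, so the model itself is hypothetical, but the sampling logic is the one described above:

```python
# A minimal sketch (plain NumPy, hypothetical conditional model) of the exact,
# Markov-chain-free sampling allowed by autoregressive quantum states:
# psi(s) = prod_i psi_i(s_i | s_1..s_{i-1}), with each conditional normalised,
# so a configuration is drawn spin by spin ("ancestral" sampling).
import numpy as np

def conditional_amplitudes(prefix, params):
    """Toy stand-in for the network output: complex amplitudes for s_i = -1, +1
    given the previous spins, normalised so that |a(-1)|^2 + |a(+1)|^2 = 1."""
    h = np.tanh(params[:len(prefix)] @ np.array(prefix)) if prefix else 0.0
    a = np.array([np.exp(1j * h), 1.0 + 0.5 * h])
    return a / np.linalg.norm(a)

def sample_configuration(n_spins, params, rng):
    s = []
    for i in range(n_spins):
        a = conditional_amplitudes(s, params)
        probs = np.abs(a) ** 2                  # exactly normalised probabilities
        s.append(rng.choice([-1, +1], p=probs))
    return np.array(s)

rng = np.random.default_rng(0)
params = rng.normal(size=20)
sample = sample_configuration(10, params, rng)   # one exact, independent sample
```

Every call produces an independent sample from |psi|², which is why no Metropolis chain, and no equilibration, is needed.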
So there is a question from Giancarlo Franzese, asking: what about spin glasses, and what about finite-temperature results? So, I'm not sure what you mean by spin glasses — you mean classical spin glasses? Yes, so what about that? I'm not sure I got the question, maybe you can unmute yourself? Hello, can you hear me? Yes. Yes, thank you. So yeah, I was asking about spin glasses, so when you not only have frustration but also disorder. Ah, okay, yes. That's a case that we have not studied here, but it can be addressed. This family of models, for example, has been used classically to study spin glasses: there, instead of the wave function psi, you try to approximate the Boltzmann distribution with a family of autoregressive classical models. I don't want to go too much into it now, but yes, you can do that too, there's no problem in doing that. Again, though, the issue will be the number of samples that you need to learn a spin glass. That's my question: are there any results about that, did anybody do something about that? No, not in the quantum case. Okay, and about finite temperature: so in the quantum case you only consider zero temperature? You are not considering any finite-temperature case, right? So in this talk I'm only discussing ground states, but I will show you later an example of excited states as well. Thank you. Thank you. Okay, so there is this other question: is there a conceptual reason why neural quantum states don't work when J2 — I wouldn't say it's larger than 0.5, it's actually around 0.5, in this region here — because for larger values of J2 you see that our approximation is good. I remind you that if these points are up, it means that we are doing better. So you see that it's around 0.5 where, in this paper from two years ago, we were not performing as well in terms of energy. The reason, as I was telling you, is this sampling issue: essentially the number of samples that you need to learn in these disordered phases. Okay, so now I will move on to the second part of my talk and discuss something which is quite important, I guess, for those of you who will be interested, in their future research activity, in studying electronic properties or other things related to interacting electrons. What I will discuss now is how we address one of the most fundamental symmetries of nature, which is the exchange symmetry: the fact that when you exchange two fermions, the wave function should change sign. This is one of the first things that you learn when you take a higher-level course in quantum mechanics. Now, one of the things that you can do — not the only one, but the one that I will discuss today — is to use the fact that, if you have fermions on a lattice, there is a way to map these fermions onto a spin problem. There is, for example, a very famous mapping devised by Jordan and Wigner that allows you to map a generic fermionic Hamiltonian, so a Hamiltonian where you have electrons, onto a Hamiltonian of spins. I will tell you in a second about this mapping, but there are also other mappings.
For example, there is this maybe less well-known mapping devised by Bravyi and Kitaev, which is another way to map fermions to spins that is, in some cases, better suited especially for simulations with quantum computers, but in some cases also for classical ones. So let me give you just a rough idea of what the Jordan-Wigner mapping is. You may know that if you have fermions, you can describe them with the so-called raising and lowering fermionic operators C and C dagger; these are anti-commuting operators. The idea of this mapping is that you can turn, for example, the destruction operator for the fermion on site j, C_j, into a spin operator: a lowering Pauli matrix, sigma minus, on the same site, times a so-called string operator, a product of sigma z on the previous sites. This string, if you want, encodes the sign of the configuration. The same thing can be done for C dagger. Using this very simple rule, you can transform any Hamiltonian containing fermionic degrees of freedom into a spin Hamiltonian: you can turn a generic lattice problem for fermions into a problem of spins, and then we can use all the machinery we've developed so far for spins. There is an alternative mapping, by Bravyi and Kitaev; unfortunately I won't have time to go into how it works, because it's a bit involved. The main reason why this mapping is interesting is the following. The Jordan-Wigner rule maps these operators onto objects that, like in this case, have n-body interactions — you see that here, through this product, I'm essentially making n spins interact, so this is an n-body interaction, which is very unphysical in a sense. The main advantage of the other mapping by Bravyi and Kitaev is that the interactions that arise in the spin Hamiltonian are quasi-local: not exactly local, but quasi-local, so they involve at most of the order of log n spins. Okay, so this is something that can also be used in practical classical simulations. Now, just to give you one example, we've applied this approach to some small molecules, to these fermionic problems, to benchmark against other approaches that people use, for example, in quantum chemistry. This is the case of two molecules, C2 and N2, two dimers of carbon and nitrogen. What I'm showing here is the energy of the ground state as a function of the nuclear separation: I have two atoms, for example one carbon and another carbon, I can separate them by a certain distance, and I can predict the ground-state energy for that specific geometry. The red line is the exact solution, which you can still compute for these small molecules, and these green points are the results of the neural-network calculations. You can see that these are pretty close to the exact ones, as you can see in this region. They predict the correct dissociation energy, as it's called, and in some cases they also get better results than existing approaches that people have used for years in quantum chemistry, for example the coupled-cluster approach. This is true especially for more correlated molecules, like N2, where you can maybe see from this plot that these green points are pretty much below the curves that you can obtain with the other methods.
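Going back to the Jordan-Wigner rule quoted above, here is a minimal sketch — plain NumPy, dense matrices, small chain only, and with one common sign convention assumed — that builds the mapped fermionic operator and checks the anticommutation relation:

```python
# A minimal sketch (plain NumPy) of the Jordan-Wigner rule quoted above:
# c_j  ->  (prod_{k<j} sigma^z_k) * sigma^-_j , acting on a small spin chain.
import numpy as np

I2 = np.eye(2)
sz = np.diag([1.0, -1.0])
sm = np.array([[0.0, 0.0], [1.0, 0.0]])   # sigma^- , the lowering operator

def kron_chain(ops):
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

def c_op(j, n):
    """Annihilation operator for the fermion on site j of an n-site chain."""
    ops = [sz] * j + [sm] + [I2] * (n - j - 1)   # string of sigma^z, then sigma^-
    return kron_chain(ops)

# sanity check of the fermionic anticommutator {c_i, c_j^dagger} = delta_ij
n = 4
c0, c2 = c_op(0, n), c_op(2, n)
acomm = c0 @ c2.conj().T + c2.conj().T @ c0
print(np.allclose(acomm, np.zeros_like(acomm)))   # True: operators on different sites anticommute
```

The string of sigma^z operators in `c_op` is exactly the sign-carrying string mentioned above, and it is what makes the mapped interactions n-body in the Jordan-Wigner case.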
Of course, these are small molecules and there is a lot to be done in the future, but this gives you a flavor, again, of how you can apply these techniques to realistic, or close to realistic, systems that are also relevant, in some cases, for chemistry. I will skip the large table, if you want; you can find it in the paper. And here, as for the other cases, the main problem that we encountered in improving the accuracy is, again, the number of samples that you need to learn the wave function. Even though here the challenge is different: it stems essentially from how the correlations of the system are structured, and from the fact that there is a single dominating configuration that somehow overwhelms all the others. But, again, also in this case we found that one bottleneck is the sample size, and there are ways to improve on this. And this issue here, if you want, is not directly related to what I presented before for the J1-J2 model. Just as a matter of reference to other works: there have been other approaches that are not based on this Jordan-Wigner or Bravyi-Kitaev mapping on the lattice, but are instead based directly on the continuous-space degrees of freedom of the electrons. Most notably, there has been some work in the group of Frank Noé here in Berlin, work that was later published in Nature Chemistry, where they use this kind of neural-network architecture, and also this paper done by people at DeepMind, Google DeepMind, who are now also interested in fermions. This is an alternative approach that is a bit different in spirit, in the sense that they don't work on a lattice but in real space. But the essence of this kind of approach is also that, if you take networks that are relatively large, also for systems somewhat larger than what we studied with our approach, you can get competitive energies and start essentially obtaining results for more challenging systems that cannot be addressed with other techniques. Now, I believe I have 10 minutes — please correct me if I'm wrong. Well, we can go until 2:45, questions included. Okay, yeah, okay. I wanted to reserve a few minutes for questions at the end, so let me actually already take a few questions before going into my final part. Okay, so: beyond computing ground-state energies, is it possible to characterize phase transitions, determine critical exponents? Yes. Essentially, once you have the wave function, you can compute, as Filippo was showing you, expectation values of arbitrary operators on these wave functions. So if you wanted to characterize, for example, a phase transition, you would typically want to measure correlation functions of spins, or any other quantity that is important to characterize the phase transition. You can do that for different distances, and with that you can extract critical exponents with standard techniques and determine precisely where the phase transition is. Then: okay, so we need to know certain parameters to do a computation for a system — what are the parameters we approximate for machine-learning codes? What are the inputs used in general for various systems? Is that okay? Yeah, I'm not sure I understand this question in detail. If you can, please ask it again, clarifying what you mean; "what are the parameters we approximate for ML codes" is the part of the question I don't understand.
NetKet, yeah, it's a software that — it's not like VASP, if you've been using that; it's more focused, if you want, on discrete systems, lattice quantum systems, and it typically deals with smaller systems, because we try to solve the Schrödinger equation without a DFT approximation. So we try to solve for the correlated ground state. But the spirit of all these things is always to try to find the best approximation for the ground state. When you do DFT, you use a different approximation, which in general is not variational. Here instead we use an approximation which is variational and better suited to characterize quantum systems where you have strong interactions and correlations among the different degrees of freedom. Then there's another question: you showed the case where the coupling is between nearest or next-nearest neighbors, but what if we have long-range couplings — do we then need to find new networks? Well, one of the main advantages of working with this kind of architecture is that, typically, if you modify the Hamiltonian a little, the wave-function architecture does not need to change too much. In the case of long-range couplings, if you allow your wave function to also have long-range correlations — and, as I showed you at the beginning, this can happen if you have a sufficiently deep neural network — then typically you can take the same kind of architecture. So you don't need to change the neural network too much. Then there's a question about the Bravyi-Kitaev method — I assume you mean whether the Bravyi-Kitaev mapping gives the same results. Yeah, unfortunately I didn't show this, and in practice this was a bit disappointing, if you want: if you use the Bravyi-Kitaev mapping, for all these models the variational results that you get with our techniques are more or less equivalent to what you get with Jordan-Wigner. But this is not general: I believe there are other cases where this mapping can be superior, where you can represent the wave function better or more easily. Okay, so now let me move on to the final part of my presentation, and then I will take more questions. In the final part, I will tell you something related, again, to going beyond ground-state properties. Until now, I've told you that we were interested in finding approximations for this very important and famous eigenvalue equation for the ground state psi_0 of my Hamiltonian H, okay? But what if I want to do something that goes beyond the ground state? One example that I will discuss — which is, strictly speaking, not in the realm of dynamics, or not yet for that matter, but is closely related — is what people call quantum circuits. The idea is that, in that case, what you want to do is simulate, you want to generate, a state, let's call it psi_K, which is the result of the application of a sequence of unitaries, so it goes like U_K, U_{K-1}, U_{K-2}, down to U_1, on some initial state psi_0. Sorry, this is not very clear because I don't have much space here, but let me rewrite this in a better way.
So what a quantum circuit is — for those of you who have never heard about quantum circuits — is a very simple statement: I want to generate a state psi_K which is the result of the action of a sequence of unitary operators U onto some initial, possibly trivial, state psi_0. The output of the circuit is then psi_K, the state that you generate at the end after applying this sequence of unitary operators. Now, this is of course very relevant because, for example, when you do time dynamics with the Schrödinger equation, you know that the unitary you approximate in that case, assuming the Hamiltonian is time independent, is just the exponential of minus i H t. So in the case of, let's say, standard unitary dynamics with a time-independent Hamiltonian, this would be your unitary U, okay? You can generalize this to cases where you take other kinds of unitaries, and this is what a quantum circuit is, okay? Just to give you an idea, there is then this notion of gates: a gate is essentially the unit that you put here in the circuit, and there is a universal set of gates — a set of local unitaries, where local means, as Filippo was saying, that they act only on one or two spins, for example, in this case. You can show that you can generate an arbitrary quantum circuit by just using, for example, a set of three so-called universal gates. One, for example, that I will consider is the Hadamard gate; this is a local rotation that amounts essentially to putting you in the basis of the sigma_x operator. Or you can do a rotation around the z axis by some angle phi, or you can do a so-called controlled-Z rotation on two qubits. These are just, if you want, building blocks that you can use to build a more complicated quantum circuit. Now, what you can show is that if you have a wave function that is represented by a neural network — a very simple neural network like an RBM, so again a one-layer, very simple-minded network — you can apply these gates to this neural network and typically get out another neural network. For example, if you apply this gate here on this qubit — so this is now my set of qubits, and I imagine that I apply my gate, my unitary, only on this qubit here — you can show that the neural network that results from this operation is another neural network, with some of the weights modified accordingly. The same thing is true if you do this controlled-Z rotation, which is applied in this case on two qubits, so this is a so-called two-qubit operator. You can show that also in this case the resulting neural network, which is just slightly larger than the previous one, can be written down exactly. So the application of these two gates can be done in an exact way. There is, however, the Hadamard gate, which is the only one remaining to implement all possible universal circuits, and which cannot be applied exactly. It is known from this paper that if you apply a Hadamard gate to a generic neural network, you cannot in general obtain another neural network with a simple form like in this case.
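To make the exact case concrete, here is a minimal sketch — plain NumPy, with the spin convention s_i = ±1 and the gate convention that Rz(phi) multiplies the amplitude of configuration s by exp(-i phi s_k / 2) assumed, so not necessarily the exact convention of the paper — showing how a single-qubit Z rotation is absorbed into the RBM parameters without any optimization:

```python
# A minimal sketch (plain NumPy, conventions assumed as stated in the lead-in)
# of how a single-qubit Z rotation is applied *exactly* to an RBM wave function:
# it is absorbed into the visible bias of that qubit.
import numpy as np

def rbm_amplitude(s, a, b, W):
    """psi(s) = exp(sum_i a_i s_i) * prod_j 2 cosh(b_j + sum_i W_ji s_i)."""
    return np.exp(a @ s) * np.prod(2 * np.cosh(b + W @ s))

def apply_rz(a, k, phi):
    """Exact application of Rz(phi) on qubit k: shift the visible bias a_k."""
    a_new = a.copy()
    a_new[k] = a_new[k] - 1j * phi / 2
    return a_new

# check on one configuration: the updated RBM equals gate factor times old RBM
rng = np.random.default_rng(0)
n, m = 5, 10
a = rng.normal(size=n) + 0j
b, W = rng.normal(size=m) + 0j, rng.normal(size=(m, n)) + 0j
s = rng.choice([-1, 1], size=n)
phi, k = 0.7, 2
lhs = rbm_amplitude(s, apply_rz(a, k, phi), b, W)
rhs = np.exp(-1j * phi * s[k] / 2) * rbm_amplitude(s, a, b, W)
print(np.allclose(lhs, rhs))   # True: the gate only modified one weight
```

The controlled-Z case works in the same spirit but requires slightly enlarging the network; the Hadamard gate, as said above, has no such exact rewriting.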
So what you can do in this case, however, is to use another variational principle, which is, if you want, more general than the one we've seen so far for ground states. The idea is the following. Imagine that I have my neural-network state, psi_w — an arbitrary neural network that depends on some parameters w, okay? Then I act with a unitary, which in this case is just this Hadamard gate. Beware that now H is not the Hamiltonian anymore but the Hadamard gate. So this is a local unitary that acts on some qubit, and it acts on this quantum state. In general, the output state will be another quantum state, phi. Now, what we know is that in general this quantum state phi is not another neural network — otherwise we would just solve the problem exactly. However, what we can try to do is approximate this state phi, which in general is an arbitrary quantum state, with another neural network that has this time some parameters w prime, okay? So you see the problem: I have a generic state phi and I want to approximate it with another neural network that has some parameters w prime, right? So this is now an approximation problem, because I want to match these two quantum states as closely as possible. What you can do is, instead of minimizing the energy as you do for the ground state, define a cost function — the thing that you want to minimize in a machine-learning approach — which is the overlap, or in this case minus the logarithm of the overlap, but this is not very important: you can maximize the overlap or minimize minus the log of the overlap. Essentially you see that when psi_w' is equal to phi, this overlap here is equal to one, and the log of one is zero, so this loss function is equal to zero. Otherwise, if psi is not close enough to phi, this loss function will be non-zero. So this is a sort of energy, if you want, for your system: this L depends on the parameters w prime and is exactly zero only when the two quantum states are identical. So we can play the same game that we played before for the ground state, but this time minimizing not the energy but this loss function, which is related to the infidelity (it is minus the log of the fidelity). So why do we care about this? Well, we care about this because we want to see, for example, how hard it is to classically simulate a quantum computer: whether we actually need to run a certain quantum algorithm on a quantum computer, or whether we can hope to approximate that quantum algorithm on a classical computer. Most of the hardness results that we know — most of the things that people will tell you — say that this is a desperate task, because the quantum computer is much more expressive than a classical computer: you can encode exponentially many states, which you cannot do classically. This is a valid argument, but in most practical cases it is not entirely correct, because there are several quantum algorithms that can be efficiently approximated classically. So understanding where the limit between quantum computing and classical computing lies is very important, also for the development of quantum computing itself.
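Before the physical example, here is a minimal numerical sketch of this overlap-based loss — plain NumPy, with a toy product-state parameterization standing in for the neural network and a crude finite-difference optimizer, so none of it is the actual method beyond the loss itself:

```python
# A minimal sketch (plain NumPy, toy parameterisation, small systems only) of the
# overlap-based variational principle: minimise L(w') = -log of the normalised
# overlap between the target state phi = gate |psi_w> and a new ansatz psi_w'.
import numpy as np

def ansatz(w, n):
    """Toy product-state ansatz over n qubits: amplitudes prod_i (cos w_i, sin w_i)."""
    psi = np.array([1.0 + 0j])
    for wi in w:
        psi = np.kron(psi, np.array([np.cos(wi), np.sin(wi)], dtype=complex))
    return psi

def neg_log_overlap(w_new, phi, n):
    psi = ansatz(w_new, n)
    fid = np.abs(phi.conj() @ psi) ** 2 / ((phi.conj() @ phi).real * (psi.conj() @ psi).real)
    return -np.log(fid)          # zero exactly when the two states coincide

# target: a Hadamard gate applied to qubit 0 of the current ansatz state
n = 3
H1 = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
gate = np.kron(H1, np.eye(2 ** (n - 1)))
w = np.full(n, 0.3)
phi = gate @ ansatz(w, n)

# crude optimisation of the loss by finite-difference gradient descent
w_new, eps, lr = w.copy(), 1e-6, 0.5
for _ in range(200):
    grad = np.array([(neg_log_overlap(w_new + eps * np.eye(n)[i], phi, n)
                      - neg_log_overlap(w_new, phi, n)) / eps for i in range(n)])
    w_new -= lr * grad
print(neg_log_overlap(w_new, phi, n))   # close to zero: psi_w' now approximates phi
```

In the real setting the loss is estimated by sampling rather than from full state vectors, but the structure of the optimization is the same.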
So, for example, let's take a very simple example: we apply a sequence of Hadamard gates, one on each spin, on each qubit, to an initial state which is the ground state of the transverse-field Ising model — one of these spin models that I was telling you about before. This is the overlap that you get out of this variational approach, and you can see that you can get pretty high overlaps at the end of the circuit; an overlap of one in these units would correspond to the exact final state. And you see that the final overlaps that you can get, for example on systems as large as 60 qubits — which is about what you can do these days experimentally — are around 98% or so in this kind of application, even for two-dimensional systems. So, for example, what you can do is compare the error that you make in this approach — again, we are trying to approximate this quantum state with a classical neural network, so we are going to make an error in this approximation — with what happens if you run the same circuit on a quantum computer: also the final state that you get on the quantum computer will not be exact. It will be affected by noise: you will have decoherence, the fact that your qubits are interacting with the external environment, and that these qubits are talking to one another. There are all sorts of noise that can disrupt your quantum computation. This is an exercise that we did in this paper and that has also been taken up later by other people. Essentially, you can compare this variational noise — the effective error that you make in the variational simulation — to the noise that you have on the quantum computer. You can see, for example, that this is the kind of accuracy that you get for these simulations with the neural network — this is the overlap that we get, the straight line here — and this is instead the error that you get on the quantum computer as you change the noise; this is a simulation of the noise, not an actual device, but a realistic simulation. And you can see that, essentially, to achieve the same accuracy that you have with the neural network, you need a single-qubit noise level that is relatively small — comparable to, actually below, what people can do these days even with state-of-the-art qubits. Again, this tells you that classical computing is very competitive, even for the simulation of complex quantum circuits. One should also always keep in mind that one source of noise, which in the classical case is due to your limited approximation power, can be comparable to another source of noise in the actual hardware, which is due to decoherence. So this is the main message. I will just flash one of the more recent results that we have on another, more complicated, quantum algorithm, which is called QAOA, the Quantum Approximate Optimization Algorithm. I won't go too much into the details, but it's described in this paper. This is what Google has also popularized recently, with the work where they implemented this algorithm on quantum hardware, on this famous quantum-supremacy architecture, if you want, that they used last year; they have also run these kinds of circuits on the hardware. And we've shown in this more recent work that, again, with an RBM, a very simple-minded RBM, you can do a very good job at describing also the output of these kinds of circuits.
And the kind of values that you can achieve — again, the accuracy of these approximations in the regions of interest for the algorithm — are very good compared to what you can get on the actual hardware. The important point is that you can also scale this up to a relatively large number of qubits, 54 qubits; this is our simulation for 54 qubits. And for example, for this amount of gates, for this large sequence that we studied, we get an accuracy which we believe at the moment is not achievable in the actual experiment. Okay, this was a comparison with tensor networks; I will skip it, but in essence you cannot easily simulate these circuits with tensor networks — I can tell you why later. There is our software, NetKet, which you've already seen from Filippo. I reiterate that this software is completely open source: you can download it, work on it, contribute to it if you want. We have a GitHub repository where you can also submit your issues if you have problems or if there is something that you don't understand, and we will try to do our best to reply to your questions. There's a release 3.2 coming soon, and you will see that it is based on JAX, which is a very nice framework developed by Google. Okay, so I will then leave you with my last slide, which is about challenges and open problems, mostly related to fermions or to optimizing the signs and phases, for example, in the circuits. But the main message that I want to give you today is that if you are interested in studying quantum systems, there is a good chance — it's not a guarantee, of course — that if you represent the wave function with a neural network, you can typically find a good approximation for the kind of properties that you are interested in. Of course, there is a lot of research going on, and there are problems where we don't manage to do this as well as we would like; one example was this infamous 0.5 point in the J1-J2 model, but there are other examples. This is why this is research, and that's why we are here; and it's only thanks to the new generations that we will extend these applications and find more interesting results. So thank you, and I will now try to answer some of your questions. Okay, thanks for this. I'm not sure, since last time, whether some new questions have arrived, or you can check the box yourself, yeah. So, there is a general question on codes in condensed matter — I guess this is not directly related to what I was saying today. Another question is: can these ideas be used in high-temperature superconductivity? Yes. There are some simplified models of high-temperature superconductors, like the famous Hubbard model, which is not realistic but simplified, and in this kind of application what you deal with is a fermionic Hamiltonian, written in terms of C and C dagger operators, and you can try to find the ground state of these Hamiltonians and see if they support superconductivity — superconductivity in that case being something that can be measured, if you want, on the wave function. Has anybody done this yet? The answer is no, because this is a challenging calculation. We did something related to this, but not exactly the Hubbard model, and I forecast that this year there will be a lot of people working on this topic, so you can stay tuned.
Next question: can I suggest some interesting problems to work on for a PhD? Yeah, so there are several lines of research. One that I was mentioning is related to understanding how many samples we need to learn, to approximate, a given wave function; this is something that should be explored more. Another would be, for example, a better understanding of how to impose symmetries, or actually imposing symmetries in a better way than we are doing now. There are some families of symmetries that are harder to encode — this is a bit more technical, but SU(2) symmetry, for example, is not so easy to enforce using neural networks. So, as a general flavor, one of the things that people are working on a lot these days is really understanding how to enforce symmetries in neural networks used for physics; also beyond physics, this is one of the main hot topics, symmetries in neural networks. Next: when considering a specific Hamiltonian, how do you choose the neural network, in terms of depth and other architectural aspects? Yeah, okay, so this is the one-million-dollar question. You don't have a general recipe, because otherwise you would have solved essentially all problems affecting humankind. What you do is that you typically start — it's the same strategy that people adopt in machine learning, essentially — from the knowledge that a certain architecture, for example the convolutional neural network, has been shown to be very effective at identifying images in two dimensions, and is very effective for a lot of applications on two-dimensional objects. Indeed we started from this idea: we took these convolutional neural networks and used them also for two-dimensional quantum problems, and found out that, apart from some issues found in the last year, for let's say standard two-dimensional quantum problems, this works. So the idea is that you typically have to start from what somebody has already investigated — unless you are the first to do it, in which case you have to do a bit more work — but if you are starting something that is related, for example if you take a two-dimensional model, you would know at this point that a good architecture is a convolutional neural network. So this is the idea: you look at a certain class of architectures that people have already worked on and try to improve them. Then there's a question: NetKet is based on machine learning, but it requires a few steps to optimize the ground-state energy; how would it be different from the standard DFT method? So the DFT method is not based on the wave function, as you know; it's based essentially on a density functional, on another quantity, the density, and not on the wave function, which is a much more complicated object. So the two approaches are very different. Again, density functional theory is not, in its practical incarnations, a variational theory, so the approximation you get can be uncontrolled, in the sense that the energy you get can be lower than the exact energy. In our case this is not possible, in the sense that the energy is strictly larger than the exact energy. Then: is it possible to apply these ideas to simulate ground states or phase transitions of skyrmions in a 2D magnet? Why not — you can try, yes. If you have a Hamiltonian, you can write it in terms of local operators, I mean local Pauli matrices.
For example, you can try to use NetKet and see how well you can approximate its ground state. Can it happen that the entanglement or coherence is destroyed by these neural networks? I'm not sure what you mean by destroyed by these neural networks. These are intrinsically classical objects: the wave function, if you want, is a classical description of the quantum system, so there is no intrinsic collapse when you work with a classical wave function. The collapse happens when you actually measure the quantum system, and that's not what we do in this case; here we use a classical approximation of the quantum system. That's what we're looking at. Okay. So I think we can now finish. There are a lot of questions, and so thank you so much, also for your presentations. I hope you enjoyed it, everyone, the participants and also the lecturers. So yeah, thanks. Next week we'll have the last lecture on machine learning in condensed matter, by Juan Carrasquilla. The topics are more or less similar. So yeah, see you all then and enjoy this week. Thanks. Thanks, everyone. Bye. Thank you. Bye bye. Bye.