Okay, so good morning everybody and thank you for being here. In the first lecture we've seen supervised learning, which as I was mentioning is conceptually very similar to fitting, so the easiest one. There I had a lot of samples and I also had labels; the labels, I remind you, are the values that the ideal, high-dimensional function takes on those samples. In the second lecture we've seen unsupervised learning, with the examples I was mentioning yesterday: there I only had samples and no labels, and the goal was to find the probability distribution according to which those samples are distributed. In the third case, which is where artificial intelligence really goes wild, you don't have samples, you don't have labels, you have nothing: you just have a goal you want to achieve. This is typically called reinforcement learning. So in this case you don't have samples, but at least you have to have something, and this something is the goal you want to achieve, right? For example, I want to solve quantum mechanics, or more precisely, I want to solve Schrodinger's equation. So I will show you a representation of quantum mechanics, if you want, which is amenable to machine learning purposes. This is the goal of this lecture, and I will also show you some codes that do that. But in general this paradigm of reinforcement learning, like all the others, has some practical applications, or at least some real-world applications, which are not in the context of physics. Since it's a Saturday morning I would like to entertain you a little bit. One application you can think of for reinforcement learning is playing games. In this case you don't have data, you just have the rules of the game: you know that a certain game has a score and you have to obey some rules in order to get the best score possible.
This is the case, for example, of this rather famous game, at least for people born in the 80s like me, not you: Breakout, where you have a ball and a slider and you have to make as many points as possible. So the Google team, DeepMind, which is based in London, devised a machine based on this paradigm of reinforcement learning which just sees the raw pixels of the picture that we have at the top, and the final score. The goal of this machine is then to maximize the score of the game. You can do this with a self-consistent training that I'm going to discuss later. The nice thing is what happens when the training lasts long enough. In the first case, on the left, the training is very short and you see the machine is not really winning. On the other side, where we have a much longer training, the machine gets really good and even finds a trick, this tunneling trick that you can see. So the machine is really able to understand what is the best strategy possible to win this game. And again, this paradigm is much more complex than the other two we have seen, simply because we do not know in advance what the winning moves are, or let's say the positions I should move the cursor to, to win the game; I just know the score. For example, if I want to drive a car, I just know the position of the car; I don't know the optimal trajectory from A to B, and it can change dynamically. So that's the goal of this branch of machine learning. Now, let's see how I can do quantum mechanics with that. First of all, what is the problem? The problem is that I would like to solve Schrodinger's equation, let's say for the ground state, okay?
Schrodinger's equation is an eigenvalue equation where I want to determine the eigenvalues and eigenstates, in particular the lowest one. This is certainly not a problem which is amenable as it is to machine learning purposes, because it's an eigenvalue problem. However, I can try to transform it into a machine learning problem if I manage to transform it into an optimization problem, right? So what is the tool that I have to use? Well, it's very simple. We know from the variational theorem that if I define an object that I call E of psi, which technically speaking is a functional of the state, as the expectation value of the Hamiltonian over the state, then this object is greater than or equal to the ground-state energy. Here I'm assuming that I've ordered my eigenstates so that the ground state comes first, right? So from the variational theorem, finding the ground state, and this can be generalized also to other properties, can be phrased as an optimization problem, which might be more amenable to machine learning: minimize over all possible states psi this energy functional E of psi, okay? So in principle, the task that I have to do to solve this complicated equation is just to do a search in the space of all possible normalizable quantum states psi, and eventually I will find the one that has the minimum energy. This is, if you want, the simplest thing we can do, and this idea of using the variational principle to solve a complicated many-body problem is of course not new. However, what was already realized in the 60s, more or less at the same time machine learning was invented, in a sense, is that the expectation value over a certain state is a quantity which is typically hard to compute if you have a many-body correlated state, okay?
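In formulas, the variational statement just described reads (a standard derivation, written here in my own notation):

```latex
E[\psi] \equiv \frac{\langle \psi | H | \psi \rangle}{\langle \psi | \psi \rangle}
= \frac{\sum_n |c_n|^2 E_n}{\sum_n |c_n|^2} \;\ge\; E_0 ,
\qquad
|\psi\rangle = \sum_n c_n |n\rangle, \quad H|n\rangle = E_n |n\rangle, \quad E_0 \le E_1 \le \dots
```

so the ground state is the solution of the optimization problem min over psi of E[psi].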
So unless psi is, for example, a simple product state, it is very hard to compute this quantity exactly, unless you have some nice analytical properties in psi. So unless you restrict yourself to very specific choices, in general this E of psi is rather hard to compute. But people in the 60s realized that this can be done in a stochastic way. What is the idea? Let's assume, for example, that my Hilbert space is spanned by a set of many-body kets |x>. You can imagine that these are, in a simple case, the projections of the spins for a spin one-half quantum system: I have N spins, so the quantum numbers for this system are just N values plus or minus one. These form a complete basis of the Hilbert space, so we can write the expectation value of the energy over a generic state by inserting a completeness relation: it becomes a double sum over x and x prime of <psi|x> <x|H|x prime> <x prime|psi>, divided by the normalization. Now, if we do a simple manipulation, multiplying and dividing each term by psi of x, we can rewrite this object as the sum over x of psi of x modulus squared, times the sum over x prime of <x|H|x prime> psi of x prime divided by psi of x; and at the denominator I have the same sum of modulus-squared amplitudes without this last factor. What is psi of x? By definition it is <x|psi>, so it's just the amplitude of my wave function. This is very general; it's valid for any wave function, basically. The advantage of doing so is that you can see immediately that the expectation value of the Hamiltonian can then be written as a statistical expectation value: you can consider psi squared as your unnormalized probability density.
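The manipulation just described, inserting the completeness relation and multiplying and dividing by psi of x, reads:

```latex
E[\psi]
= \frac{\sum_{x,x'} \langle \psi | x \rangle \langle x | H | x' \rangle \langle x' | \psi \rangle}
       {\sum_{x} |\psi(x)|^2}
= \frac{\sum_{x} |\psi(x)|^2 \left[ \sum_{x'} \langle x | H | x' \rangle \,\dfrac{\psi(x')}{\psi(x)} \right]}
       {\sum_{x} |\psi(x)|^2},
\qquad
\psi(x) \equiv \langle x | \psi \rangle .
```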
So psi squared is your unnormalized probability density, and the other factor is the quantity that you are estimating under this unnormalized probability density. This object takes, in the community of variational Monte Carlo methods, the name of local energy, E_loc of x. Actually, there is nothing local about it in the sense of quantum information, because the Hamiltonian can be highly non-local, but it's called local energy for historical reasons. So you can just write the energy as the expectation value of the local energy over this probability distribution, okay? Now, this is very important, because if I have a generic state which is correlated, which is not mean field, which is whatever, I can just do a Monte Carlo sampling, which is what I discussed yesterday in the context of, for example, unsupervised learning. What I do is that I generate a lot of samples, x1, x2, up to x_Ns, which are distributed according to the probability distribution psi of x squared. So imagine that I fix my state, and I draw a lot of samples, for example doing Metropolis, to generate those configurations, those many-body states. Then I can estimate this quantum expectation value as one over the number of samples times the sum, so as the simple average of the local energy over those x_i, okay? This is the fundamental connection that was made in the sixties, for example by McMillan, one of the pioneers of this field, who understood that quantum expectation values can be computed exactly as classical expectation values over a probability distribution, and that you can sample this probability distribution using Monte Carlo. Now, the interesting aspect is that we can compute not only the expectation value of the Hamiltonian but also that of any other operator: just substitute H with your favorite operator. So that's the first thing.
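This sampling step can be sketched in a few lines, assuming a spin-1/2 system and a single-spin-flip Metropolis update; the function names here are illustrative, not from any particular library:

```python
import numpy as np

def metropolis_sample(log_psi, n_spins, n_samples, n_sweeps=5, rng=None):
    """Draw spin configurations x in {-1,+1}^n distributed as |psi(x)|^2.

    log_psi: callable returning log psi(x) (possibly complex) for a configuration x.
    Single-spin-flip Metropolis; acceptance probability is |psi(x')/psi(x)|^2.
    """
    rng = np.random.default_rng(rng)
    x = rng.choice([-1, 1], size=n_spins)
    samples = []
    for _ in range(n_samples):
        for _ in range(n_sweeps * n_spins):   # n_sweeps full sweeps between samples
            i = rng.integers(n_spins)
            x_new = x.copy()
            x_new[i] *= -1
            # |psi(x')/psi(x)|^2 = exp(2 Re[log psi(x') - log psi(x)])
            ratio = np.exp(2.0 * np.real(log_psi(x_new) - log_psi(x)))
            if rng.random() < ratio:
                x = x_new
        samples.append(x.copy())
    return np.array(samples)
```

For a product state like log psi(x) = a * sum(x), each spin should then be up with probability exp(2a) / (exp(2a) + exp(-2a)), which is easy to check against the sampled magnetization.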
And the other thing is that we can also compute efficiently the derivatives of this energy. Imagine that I want to optimize this thing, and imagine that my state psi now depends on a set of parameters p1, ..., p_Np, where Np can be huge: hundreds of thousands, millions. In this context, I've shown you that the expectation value of H can be written as the statistical expectation value of the local energy. But it is also true that the gradient of this expectation value with respect to some parameter p_k, so the gradient of what I call E of psi, can also be written as an expectation value over this probability distribution pi of x. In particular you can show, and it's a quite straightforward calculation, that this is equal to the expectation value of the local energy times what I call D_k star, minus the expectation value of the local energy alone times the expectation value of D_k star, where star is the complex conjugate. All of those expectation values are taken again over this probability distribution pi, so psi squared. This D_k is what I also introduced yesterday in the context of unsupervised learning: D_k of x is defined as the derivative with respect to the parameter p_k of the log of psi of x, okay? So imagine that I have some form for my variational wave function, which depends on some parameters; then I can take the derivative of the log of psi with respect to those parameters, so that I know immediately what these D_k are. The local energy can also typically be computed efficiently; I will not go into this detail.
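For real variational parameters the formula just stated reduces to a covariance, which can be estimated on the same set of samples; a minimal sketch, assuming the local energies and derivatives have already been evaluated on the samples:

```python
import numpy as np

def energy_gradient(e_loc, d_k):
    """Estimate dE/dp_k from samples x_i distributed as |psi(x)|^2.

    e_loc : values E_loc(x_i) on the samples
    d_k   : values D_k(x_i) = d log psi(x_i) / d p_k on the same samples
    Implements 2 Re[ <E_loc D_k*> - <E_loc> <D_k*> ], the standard
    covariance expression for a real parameter p_k.
    """
    e_loc = np.asarray(e_loc)
    d_k = np.asarray(d_k)
    cov = np.mean(e_loc * np.conj(d_k)) - np.mean(e_loc) * np.mean(np.conj(d_k))
    return 2.0 * np.real(cov)
```

As a check, for the harmonic-oscillator ansatz psi(x) = exp(-alpha x^2) one has E_loc = alpha + x^2 (1/2 - 2 alpha^2) and D = -x^2; at alpha = 1/2 the local energy is constant, so the estimated gradient vanishes exactly, a first glimpse of the noise-reduction property discussed later in the lecture.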
And you can measure then, if you do this procedure, so if you sample a lot of configurations distributed according to pi, both the expectation value of the Hamiltonian and the gradient, right? This is then one of the fundamental things we needed to transform quantum mechanics into machine learning. The first one was to convert an eigenvalue problem into an optimization problem that I can somehow work with: this is the optimization problem I get, and this is how I would solve it if I have a generic state. Now, the connection that has somehow been missing since the birth of this field is how to identify psi: what psi should I take which is general enough to hope to converge to the exact ground state? This is what we've done in our recent work, published last February. The idea is to take as psi a neural network, an artificial neural network. From the way I've already written things here you can immediately realize where I'm going: the goal of my neural network now is not to translate a string of text, it's not to play a game, it's to compute the amplitudes of the wave function. So let me write this. What I want to do is to identify this psi of x, so the amplitude <x|psi>, with an artificial neural network: a function F which depends on the high-dimensional vector x, which can be for example the projections of your spins, and which depends on some parameters. I've shown you, for example, the case of the feed-forward network, where those parameters are the weights of the connections, and the case of the restricted Boltzmann machine, where the weights of the connections between the hidden and the visible units were the parameters, I remind you.
So for example, in the case of the RBM, the specific form that we took for this F_RBM was inspired by classical statistical physics. In that case, F_RBM, which depends on the spins sigma_1^z, ..., sigma_N^z, the inputs of this machine if you want, is the sum over all the possible values of the hidden units h_j of the exponential of an interaction between my physical spins and those hidden units: the sum over i and j of sigma_i^z h_j W_ij, plus the sum over j of h_j b_j, plus the sum over i of a_i sigma_i^z. The h_j are again plus or minus one variables. In this formulation, my parameters p are the weights W and the biases b and a, so the biases of the hidden and the visible units. Okay, so this is basically what you need to do machine learning for quantum mechanics: you need to identify the wave-function amplitudes with a suitable neural network. A rather reasonable choice, which actually achieves state-of-the-art results on some models in 1D and 2D, is to use this specific RBM form I just wrote. Here you fix the number of hidden units: these are your physical spins, and these are the artificial hidden neurons which somehow mediate the correlations and the interactions, if you want, between the physical spins. What I do is that I tune those parameters in order to best reproduce my ground state. That's the idea. And to do that, I need two ingredients. The first one is the ability to sample from this machine, right? I want to generate a lot of configurations, x1, x2, x3, ..., generated according to psi of x squared.
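Because each hidden unit enters independently, the sum over the h_j in this F_RBM can be carried out analytically, which is how one evaluates the amplitude in practice; a sketch with illustrative names:

```python
import numpy as np

def log_psi_rbm(sigma, W, a, b):
    """Log of the RBM wave-function amplitude.

    Tracing out the h_j = +/-1 hidden units in
        F_RBM(sigma) = sum_h exp( sum_ij sigma_i h_j W_ij
                                  + sum_j h_j b_j + sum_i a_i sigma_i )
    factorizes over j, giving
        psi(sigma) = exp(sum_i a_i sigma_i) * prod_j 2 cosh(b_j + sum_i sigma_i W_ij).
    sigma: array of +/-1 visible spins; W: (n_visible, n_hidden); a, b: biases.
    """
    theta = b + sigma @ W                        # effective field on each hidden unit
    return sigma @ a + np.sum(np.log(2.0 * np.cosh(theta)))
```

One can verify the traced-out form against a brute-force sum over all hidden configurations for a small machine.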
And the other one is that I need some stochastic gradient descent, or some other optimization algorithm, to minimize the energy as a function of those parameters. This can be done using again those samples, and using the gradient that I can estimate as an expectation value over those configurations. Now, sampling from the RBM is what we discussed yesterday. There is a subtlety in this case: in general, the wave function is complex valued, right? Yesterday we discussed only the case where those parameters are real valued. That case is not general enough to describe a wave function: if you want to identify psi with F, you need this function in general to be complex valued. So in the general case, you might want to take these W's and b's and a's complex valued. If you do that, you typically cannot use the Gibbs sampling strategy that I was introducing any longer; you just do standard Metropolis sampling, generate your configurations, and measure your local energy. It's all rather straightforward. An interesting property that I would like to discuss now, before showing you some actual numerical results live, is how this optimization can take advantage of some specific properties of quantum mechanics. In particular, you might remember that I discussed the connection between stochastic gradient descent and the Langevin equation, right? I showed you that there's a connection between the effective temperature of your classical system and the noise that you have on the gradient. Also in this case you have noise, simply because those expectation values are estimated over a finite number of samples.
So on those gradients you also have a noise: the variance sigma squared of the gradient estimate, let me call it G_k, will go like one over the number of samples that I have drawn from my probability distribution. So in principle there is always some intrinsic noise, which is good, as I was telling you yesterday, because it allows us to converge to the global minimum, in this case of the energy of the system, the actual physical energy. However, we also might want to control this noise, or at least be sure that the effective temperature we arrive at in the end is zero. And there is a nice property, which is easiest to show for the local energy: consider the local energy as a random variable. Its expectation value is the expectation value of the Hamiltonian; what then is its variance, so the expectation value of the local energy squared minus the square of its expectation value? These are just the statistical fluctuations of the local energy, and they have a very nice physical meaning, because you can show, I will not go through the derivation, that this is equal to the variance of the physical Hamiltonian on the state, okay? You can demonstrate that using a couple of tricks. Now, this is rather nice because it tells you that the fluctuations you have on the local energy tend to vanish when the state converges to the exact ground state. Can you see that? If your state is an exact eigenstate, then the expectation value of H squared over psi, this quantity here, is simply equal to E squared, with proper normalization, because psi is an eigenstate of H.
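The derivation that is skipped here is short; with pi(x) proportional to |psi(x)|^2:

```latex
\langle |E_{\mathrm{loc}}|^2 \rangle_\pi
= \frac{\sum_x |\psi(x)|^2 \left| \sum_{x'} \langle x|H|x'\rangle \,\dfrac{\psi(x')}{\psi(x)} \right|^2}{\sum_x |\psi(x)|^2}
= \frac{\sum_x |\langle x | H | \psi \rangle|^2}{\langle \psi | \psi \rangle}
= \frac{\langle \psi | H^2 | \psi \rangle}{\langle \psi | \psi \rangle},
```

so that

```latex
\mathrm{Var}(E_{\mathrm{loc}})
= \langle H^2 \rangle_\psi - \langle H \rangle_\psi^2 ,
```

which vanishes whenever psi is an eigenstate of H.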
And here we have the same quantity, E squared, so the variance of the Hamiltonian is zero if we are at an exact eigenstate of the Hamiltonian. The magic thing which happens is that the closer you get to an eigenstate of the Hamiltonian, the smaller the fluctuations on the gradient, and also on the energy, become. This means that the closer you approach the exact solution, the less noise you will have. This is something which is peculiar, I would say, to quantum mechanics, that you do not typically find in standard machine learning applications. In particular, this means that the learning rate I introduced yesterday, the rate at which you change your parameters, can typically just be taken fixed. You do not need to anneal it as a function of the number of iterations, simply because the temperature decreases by itself: the variance of your gradient goes down automatically. Now I can show you this at work for a simple example, numerically. The simple example I'm going to treat is the case in which the Hamiltonian is the harmonic oscillator; here I'm taking h-bar, omega and m all equal to one. And I take as an ansatz for the ground state, just to show this property, a simple Gaussian: psi of x equal to the exponential of minus alpha x squared, okay? So there is one variational parameter in this case, alpha, which I can play with and modify in order to converge to the ground state. This is just to show you what happens to the energy as a function of the iterations. Ah yes, those codes that I'm going to discuss, some of them are already on my GitLab repository; it's publicly accessible, so you can go there, download these codes and play with them.
There are also some other lecture notes from another school; there's a lot of material you might want to have a look at. So let me show you this example at work. Here, again, I'm optimizing alpha, the parameter in the wave function, and what I'm doing is generating samples drawn according to psi squared using Metropolis. You can have a look at the code in the repository. So let's look at the code, optimize harmonic. This is in Python; I hope you are fluent in Python. I start with a state which has alpha equal to one, so it's not the exact target state: you know that alpha equal to one half is the exact target state in this case. Then what I do is that I generate some samples, I compute the local energy and the gradient of the energy, and I use this information to change the parameter. What you can plot then is, for example, the expectation value of the energy as a function of the iteration counter; here I'm using a number of samples which is very small, I think a hundred. You can see the plot here, generated in real time. It is very noisy because I start with a very bad ansatz for the wave function: again, alpha equal to one, which is very far from one half. So at the beginning the energy is very large and very far from the exact energy, which in this case is one half.
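The script just run is not reproduced here verbatim, but its logic can be sketched in a few lines; this is my own minimal reimplementation, with |psi|^2 sampled directly as a Gaussian instead of via Metropolis, which changes nothing conceptually:

```python
import numpy as np

def local_energy(x, alpha):
    # E_loc = -psi''/(2 psi) + x^2/2 for psi(x) = exp(-alpha x^2)
    return alpha + x**2 * (0.5 - 2.0 * alpha**2)

def sample_psi2(alpha, n_samples, rng):
    # |psi|^2 ~ exp(-2 alpha x^2): a Gaussian with std 1/(2 sqrt(alpha)),
    # sampled directly here instead of with Metropolis, for brevity
    return rng.normal(0.0, 0.5 / np.sqrt(alpha), size=n_samples)

def optimize(alpha=1.0, n_iter=200, n_samples=100, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        x = sample_psi2(alpha, n_samples, rng)
        e_loc = local_energy(x, alpha)
        d = -x**2                                   # D = d log psi / d alpha
        grad = 2.0 * (np.mean(e_loc * d) - np.mean(e_loc) * np.mean(d))
        alpha -= lr * grad                          # plain SGD, fixed learning rate
    energy = np.mean(local_energy(sample_psi2(alpha, 10_000, rng), alpha))
    return alpha, energy
```

Running `optimize()` shows exactly the behaviour described: noisy early iterations with only a hundred samples, then convergence of alpha to one half and of the energy to 0.5, with the noise shrinking on its own as the eigenstate is approached.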
And you see also that I have wild fluctuations in this quantity, simply because at the beginning I'm not taking enough samples to have a good resolution, so I have a huge variance. But I keep the number of samples fixed and I do this optimization in the way I described. And you can see that at the end the energy approaches 0.5 with a precision which is remarkably high: if you look at the last iteration, the energy is 0.500031, and the variance of the energy is 10 to the minus eight, something extremely small, basically at the level of numerical precision. So you can see that because of this property of quantum mechanics, even if you are doing something stochastic, so something which is intrinsically dirty in a sense, because we have noise, we can converge to very high precision in the energy and also in other properties. Now, of course, this is interesting, but it's a toy model, it's just the harmonic oscillator; we want to do something more complicated. To do something more complicated we need the good stuff, so we need the RBM. And this is the other example that I'm going to put on this repository, or you can already find it there. In this case I consider a Hamiltonian which is a prototype for strongly interacting models: the transverse-field Ising model. I have a one-dimensional chain with periodic boundaries, a sigma-x operator on each site multiplied by the transverse field gamma, and a nearest-neighbour interaction of the form sigma_i^z sigma_{i+1}^z, right? So I have a lattice of spins, sigma_1^z, sigma_2^z, up to sigma_N^z, and I have periodic boundaries, so the interactions close into a ring.
And I also have a transverse field on each of those sites. This model has a nice property which allows me to simplify the presentation of this code a little bit: the ground state is positive definite; you can demonstrate that in 1D. In this case, then, I will do something a little bit different, just to simplify the code: I will identify psi squared, which is my probability distribution, psi squared of sigma_1^z, ..., sigma_N^z, directly with my F_RBM. So you see the difference: before I was setting psi equal to F; here I'm setting psi squared equal to F. The simplification is that now I can take just real parameters for those interactions, the W's, because psi squared is positive by definition, and I can use the Gibbs sampling approach that I discussed yesterday to draw samples from this psi squared. Again, those are details that I discuss in my lecture notes. So we can go here. This code is pretty general: you can use it for any Hamiltonian and basically any graph. Here I'm just considering the case of 20 spins. The first line defines the number of spins; the second defines the geometry, the graph on which the model is defined, which in this case is a hypercube of dimension one, so a line, with periodic boundaries. And then this defines the Hamiltonian. I'm assuming that all of you know C++, which is probably not a good assumption, but anyway: this Hamiltonian takes the graph as a template parameter.
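To make the structure concrete, here is a sketch of the local energy for this Hamiltonian, in Python rather than the C++ of the actual code. The sign convention H = -gamma sum_i sigma_i^x - sum_i sigma_i^z sigma_{i+1}^z is my assumption, since only the operator content was fixed above, and the function names are illustrative:

```python
import numpy as np

def tfi_local_energy(sigma, log_psi, gamma):
    """Local energy of the 1D transverse-field Ising model with periodic boundaries.

    Assumed convention: H = -gamma * sum_i sigma_i^x - sum_i sigma_i^z sigma_{i+1}^z.
    sigma^x flips a single spin, so E_loc(s) only needs wave-function ratios
    psi(s')/psi(s) on single-spin-flip configurations s'.
    """
    n = len(sigma)
    # diagonal part: - sum_i s_i s_{i+1}, with periodic boundaries
    e = -np.sum(sigma * np.roll(sigma, -1))
    # off-diagonal part: -gamma * sum_i psi(flip_i s) / psi(s)
    lp = log_psi(sigma)
    for i in range(n):
        flipped = sigma.copy()
        flipped[i] *= -1
        e -= gamma * np.exp(log_psi(flipped) - lp)
    return e
```

Only this function changes if you swap in another Hamiltonian, which is what makes the actual code general over models and graphs.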
So the graph is the thing on which this Hamiltonian lives, in this case a one-dimensional lattice. Then you have to specify the number of hidden variables, this number here: how many hidden units I want to put in my neural network. The more I put, the more expressive my neural network will be, and the longer it will take to optimize, of course; that's the trade-off. And then here I define my state, to do, for example, Gibbs sampling. Then I use a slightly more advanced variant of stochastic gradient descent, which is called AdaMax. Okay, so we can run this thing, just to show you. Typically you just do make and it's compiled; you just need a few standard libraries. It might take a while, though. So you run it, and we can also plot in real time what's going on. Here I'm plotting, as a function of the iteration count, the energy the network is finding while it learns the ground state, so the expectation value of the energy. And you can also see plotted the relative error with respect to the exact ground state: in this case the model is simple enough, we have 20 spins, so we can diagonalize it exactly. And you can see that I'm making progress, right? Already the relative error on the energy is pretty low, and then I can improve it and go as low as, I think in this case, 10 to the minus four or something. So this is the same principle that I was discussing before for the simple harmonic oscillator, but at work for a much more complicated system: a many-body system where you have interactions and which does not have a simple mean-field solution, at least.
But this can be generalized not only to 1D but also to 2D; you would find very similar results. You have to be a bit more careful about the sampling, it's not always that easy, but in general this is a relatively robust way of finding ground states. Okay, it's still going down, but I have to stop at some point; how far you go depends on how patient you are, or how many processors you have. So that was more or less what I wanted to tell you. I've shown you during this lecture how to transform quantum mechanics into a machine learning problem, through this stochastic mapping of quantum mechanics onto an optimization problem. And I've also shown you that a simple choice, in a sense the most straightforward choice you can make, is to take this RBM machine and identify it with the wave function of your system. After our work, people have started wondering why these states work, and what is the peculiarity, the characteristic, which makes them suitable to study quantum systems. I'll discuss some of those issues and ideas during my talk. But I can already tell you that one of the nice features of these states, of these new variational ansaetze if you want, is that if you have long-range weights, so if your effective classical Ising model has very long-range connections, then the amount of bipartite entanglement that you can put inside those quantum states can easily satisfy a volume law. So you don't need exponentially many parameters in the wave function to have a volume law. This is nice, for example, if you want to describe critical systems, especially in two dimensions, et cetera, whereas with other approaches you might be limited by entanglement.
The other thing that people realized is that with these states you can write down exactly a lot of interesting topological phases, like the ground states of interesting topological models, the Kitaev model and others. This is a nice result, and in some cases you cannot do that with other states. One of the results which is also particularly interesting, and which I believe can have far-reaching consequences, is that you can demonstrate mathematically that basically any reasonable physical quantum state can be written as a network with two hidden layers. In the classical case, I've shown you that the RBM is enough to describe any classical interaction, any classical energy: I gave you the example of the two-body interaction, and I told you that you can represent any classical Hamiltonian in that form. Now, in the quantum case, you can demonstrate that if you have two layers, so not only the hidden layer that we already have in the RBM, but also some extra connections to a second layer: physical units sigma_1, sigma_2, ..., hidden units h_1, h_2, ..., and then what we may call deep neurons. Then the wave function that results from identifying this object with the wave function can represent efficiently any quantum state, where efficiently means that the number of neurons that you need is polynomial in the number of spins; it depends on the model, but typically something like N squared for reasonable models. This is a very nice result which I believe will have consequences in the field, in the sense that if we manage to use those states efficiently, in principle we can describe any quantum system efficiently. Okay, so I leave you here; if you have questions, I will be very happy to answer now or later.
So thank you for your attention.