Our speaker this time is Roger Melko. He received his PhD from the University of California at Santa Barbara and spent two years as a Wigner Fellow at Oak Ridge National Laboratory afterwards. Today he is a professor of physics at the University of Waterloo in Canada and a faculty member at the Perimeter Institute. His research circles around a number of fundamental questions in condensed matter physics and the physics of many-body systems. In his work he uses simulations to characterize phase transitions, find new exotic states of matter, and elucidate the role of quantum effects. Today he will pick out one aspect from this very diverse set of topics and methods and will talk about machine learning in the context of quantum state tomography. Roger, please take it away.

Okay, thank you Phillip. I'm just going to share my screen now, so give me a thumbs up if this works. All right, thanks again Phillip and Peter for the introduction. Of course I wish I could be there in person; I missed tea time at All Souls College. Well, maybe next time. As Phillip mentioned, I'm a condensed matter theorist. However, we do a lot of work with this new generation of quantum information experimentalists who are building these noisy intermediate-scale, or near-term, quantum devices. So what I'm going to talk about today is how we use machine learning, and specifically unsupervised learning with generative models, to look at and to reconstruct the quantum states that are prepared by these experimentalists. As mentioned, I'm a professor at the University of Waterloo and I'm at the Perimeter Institute for Theoretical Physics. This picture here is the group of people, not last summer but the summer before the pandemic, who are working on machine learning in quantum information and quantum condensed matter, and I've just been adding photos to it as the pandemic rages on, so you can get a rough idea of the size of the effort at the Perimeter Institute. As mentioned, I work on condensed matter, but I'm going to generalize that for this audience and just use the lingo quantum many-body physics. There are many tasks, many interesting open problems, and all sorts of interesting research that people perform in quantum many-body research, but I want to talk about one aspect where we basically construct models of our universe, or the part of the universe that we're interested in, and then we try to solve those models to glean information about the microscopic ingredients that lead to certain macroscopic phenomena. That is, in some sense, the theme of condensed matter: what are the microscopic ingredients that go into something like high-temperature superconductivity or topological phases?
So I've used the word models. We construct models, often in collaboration with other theorists and experimentalists, and those models typically take the form of some interacting Hamiltonian, where we believe we've distilled the ingredients of a macroscopic system down into microscopic ingredients. I've shown two Hamiltonians which are typical for condensed matter physicists. One is the Hubbard model, which is studied by whole swaths of subfields in condensed matter looking to explain, quote-unquote, the mechanism underlying high-temperature superconductivity. Implicit in that is the belief that if we understand the microscopic quantum ingredients that go into current high-temperature superconductivity, we may be able to help in the design of future, perhaps room-temperature, superconductors or other materials and devices. So implicit in that is this idea of discovering something about a macroscopic phenomenon and then perhaps helping to design or predict what's occurring in an experiment; it's a very experimentally driven field in that sense. When we talk about building a model and constructing a Hamiltonian, it may sit on a lattice and of course it has different ingredients, like kinetic energy, hopping, interactions, topology, all these things that we bake into the model. Then the question is how do you solve that model, and what does it mean to solve it? If we're thinking about low-energy, low-temperature physics, solving it typically involves, number one, understanding the ground state and its properties; it's really the vacuum from which everything emerges, and of course you want things like correlation functions within that vacuum state. Then there are the elementary excitations, the low-lying spectrum, the fundamental particles of the condensed matter system, if you will, that emerge from that vacuum, and all sorts of nontrivial things like topological defects or topological invariants. And this is a hard problem, and it's hard in a very precise computational complexity sense, which I probably won't get into too much. Sorry, I'm just fooling around with my screen. Many of these tasks are difficult very fundamentally, in the sense that we would like to solve this Hamiltonian, or perhaps simulate it. Maybe we want to solve it by hand, and we can understand why that's hard, but we may also want to simulate it, and that can be a very difficult, even an exponentially hard, problem. I'm alluding to fermionic systems, maybe frustrated magnetic systems which live on lattices, and I'll show some examples of what we're interested in, but really we're stuck sometimes; some of these Hamiltonians are just difficult, even exponentially difficult, to solve or simulate. So, just a note: where does that difficulty come from? You can imagine taking a quantum system and diagonalizing the Hamiltonian to get the spectrum; that's clearly an exponentially difficult problem.
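To make that blow-up concrete, here is a tiny back-of-the-envelope sketch in Python; the qubit counts and memory estimates are my own illustration, not numbers from the slides.

```python
import numpy as np

# Storing the full state vector of n qubits takes 2**n complex amplitudes,
# and dense diagonalization of the Hamiltonian scales roughly as (2**n)**3.
for n in (10, 20, 30, 40, 50):
    dim = 2 ** n
    memory_gb = dim * 16 / 1e9   # one complex128 amplitude = 16 bytes
    print(f"n = {n:2d} qubits: {dim:.3e} amplitudes, ~{memory_gb:.3e} GB for the state vector")
```

Around 40 to 50 qubits the naive storage alone exceeds any classical memory, which is the sense in which even representing the state, let alone diagonalizing the Hamiltonian, becomes exponentially hard.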
There are, of course, many approximations one may make if you're trying to construct a numerical solution, and hardness, even exponential hardness, can creep into all sorts of different parts of this problem. For example, if I just wanted to store a representation of the wave function psi, you can easily convince yourself, and people who work on quantum many-body systems see this immediately, that the number of parameters required to represent that wave function naively grows exponentially: if you had a qubit system, or any system with two degrees of freedom per site, there would be two to the n of these coefficients, let me put it that way. Of course there are all sorts of approximations, non-interacting particles and so on, that go into trying to reduce this complexity, but in some cases in condensed matter, especially where interactions matter, like in that Hubbard model or in frustrated magnetism, it's not obvious how these hardnesses get boiled down to something simple. Another good example: you may have an efficient representation of a quantum system, if you're lucky, but you may have difficulty finding the optimal parameters, or optimizing over the space of parameters. This I think is very familiar to people who work on machine learning: you have some loss landscape, and the question is, are there barren plateaus, are we getting stuck in local minima, are we finding the true global minimum? That brings us into a whole other field of physics that relates to glassiness in these landscapes and to ergodicity problems. Ergodicity problems are very familiar to people like me who work on Monte Carlo methods, where we're sampling a configuration space; so it's not necessarily a loss landscape that you're trying to optimize but a configuration space, and the time required to get representative samples can be exponentially large. That's typically another way of saying we lose ergodicity. And even if we can produce samples, there are cases where producing expectation values of observables can be exponentially difficult, even in a quantum sense. So there are lots of different ways for things to be hard, let me put it that way. And just checking that my slides are progressing; it should be on quantum simulators now. Yes, it works, okay, thanks. So this experimental field of quantum simulators, or quantum emulators, is in some sense meant to get rid of one of the difficulties of this many-body problem. This goes all the way back to Feynman, who talked about quantum computers in 1982 or '83, when he suggested that if we really want to solve a quantum system, whatever that means, or simulate a quantum system, we really should have a quantum computer. It bypasses many of these exponential difficulties: for example, you don't need a classical representation of a wave function if you prepare it experimentally. That's just one simple way of imagining it. So quantum simulation is this field of physics that comes from AMO and condensed matter and other somewhat disparate fields, where Hamiltonians, let me put it that way, or interactions, may be implemented
experimentally, typically in highly controlled devices like cold atoms, trapped ions, or maybe superconducting circuits, where the individual qubit control and coupling turns into an emulation of a Hamiltonian of interest. Just as a simple picture that I've ripped off the internet somewhere: you can imagine that in that Hubbard model, where you have hopping or tunneling and interactions, you can emulate that with actual real atoms trapped in the standing-wave pattern of some optical lattice, which can be tuned. Tuning laser intensities and frequencies in these optical lattices can give you interactions which emulate almost directly what occurs, for example, in that solid-state or condensed matter Hubbard-model-type Hamiltonian. And the problem becomes a kind of experiment-theory collaboration, because you need to understand these simulators, you need to be able to extract data from them by measuring the state, and really help to verify and characterize what's going on. It becomes a different problem, and that problem is very much a data-driven problem. In some sense we've gotten rid of the task of solving or diagonalizing the Hamiltonian, because that thing is now implemented directly in the experiment, but now the task turns into: how do we extract information from that experiment, how do we perhaps control it, how do we give feedback into the control, how do we characterize, how do we verify, and how do we learn from it? And this data-driven problem, I'm arguing, and this is kind of the thesis of my talk, is something that's very suitable for modern machine learning methods. I believe the era we're embarking upon is really this feedback loop, if you will, or this collaboration, between experimental devices which produce a large amount of data and our state-of-the-art technology which can classify, process, and so on, that data. To get us into the machine learning side of things, I'm going to discuss the philosophy, or the strategy, behind learning a state from that simulator based on the data that it produces, and this should be something that's familiar to us in an unsupervised learning context. So let me just forget the quantum nature, and I will forget the quantum nature for most of this talk, and imagine that I'm looking at classical data. And what is classical data? It's something coming from a black box, which is my experiment, and it's really just a set of numbers; in fact I'm just going to look at binary vectors. So this is a single data vector, and if I had n qubits in my experiment it would be of length n. And I assume that I have access in this experiment to an amount of data that is roughly what we imagine training neural networks on in machine learning tasks. I like the number 10,000; it's usually something like a thousand to ten thousand. There's some budget for producing data, data is expensive in these simulators, and so there's some limit to how much data you can produce. But in
the classical setting, let me just imagine that that data is drawn from a probability distribution, and I'll get to the nuances, the difference between a classical distribution and a quantum wave function, later in the talk. But really, just imagine that inside this experimental black box is a probability distribution that we're interested in learning about, and the only access we have is through this data; these would be projective qubit measurements for the quantum physicists. So we want to use only that data to find the optimal parameters of some representation, and I'm going to abuse the word model: I'm switching gears now, and when I say model I mean a model like we would imagine in machine learning, like a generative model. And let me just say, as a note, why don't we just build the frequency representation, or approximation, of that distribution from the data? I think the underlying point is that we don't have access to enough data to give a good representation of the likelihood of every single element of that distribution. There's some data missing, and that would affect our generalization: if we produced data that wasn't in the training set, to use that lingo, we wouldn't generalize well using this frequency distribution. That's well known in machine learning and generative modeling circles. So the idea is that instead we parameterize a model, a model which, as I'll discuss, is some neural network with some parameters that I'm calling lambda, and that model in some sense smooths out, or interpolates, the missing information that this smallish data set doesn't give us access to. You can imagine that if you have a Gaussian, the mean and the standard deviation can be your two parameters: it's much easier to fit those two parameters, which constitute a model, from a limited data set than to reconstruct the entire Gaussian from the frequency distribution. So our goal is really to make this model, through parameter adjustment, as close as possible to what's in that black box, with access only to the data. We're going to use generative models, and that's what I'm going to talk about in this talk. Generative models, and I'll classify them in a minute, are in some sense models that we build of these underlying probability distributions, given a data set, that we can also sample from; that's the generative step. I'm going to talk about restricted Boltzmann machines, which are like a stochastic neural network with binary variables, and I'll introduce what I mean later. I'll also talk about recurrent neural networks, which are autoregressive models that are very powerful. But just as a note, and I should have put a citation here, there are many different possible generative models that can be substituted for what I'm talking about in this talk. A particular type of autoregressive model that's being pursued by Juan Carrasquilla and others at Vector is the transformer: if you know GPT-3, the underlying transformer technology there, this attention-based mechanism could also be used for what I'm talking about here.
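To make that Gaussian analogy concrete, here is a minimal sketch comparing a two-parameter fit against a raw frequency (histogram) estimate built from a limited data set; the sample sizes, bin counts, and parameter values are arbitrary placeholders for illustration, not anything from the experiments discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Black box": an unknown Gaussian, from which we only get a limited data set
mu_true, sigma_true = 1.3, 0.7
data = rng.normal(mu_true, sigma_true, size=1000)

# Two-parameter model: fit mean and standard deviation (maximum likelihood)
mu_hat, sigma_hat = data.mean(), data.std()

# Frequency-based alternative: a histogram estimate of the density
counts, edges = np.histogram(data, bins=30, density=True)

# Evaluate both on held-out points; the histogram assigns zero density outside
# the observed range, which is exactly the generalization failure mode
test = rng.normal(mu_true, sigma_true, size=5)
model_density = np.exp(-0.5 * ((test - mu_hat) / sigma_hat) ** 2) / (sigma_hat * np.sqrt(2 * np.pi))
bin_idx = np.clip(np.searchsorted(edges, test) - 1, 0, len(counts) - 1)
hist_density = np.where((test < edges[0]) | (test > edges[-1]), 0.0, counts[bin_idx])
print("parametric model:", np.round(model_density, 3))
print("frequency table :", np.round(hist_density, 3))
```

The same trade-off carries over to bitstring data: a lookup table of observed frequencies memorizes the training set, while a parameterized model interpolates the configurations that were never measured.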
But just to step back: generative models, again, have some parameters and you train them from data. Once they're trained, or even during training, we're really interested in the new data that they can produce, or perhaps the likelihoods for new measurements coming in from the device. I'll focus on the first strategy, where we have them produce new data samples and then we use those samples to calculate estimators, and I'll talk about that. There are different types of generative models, and I find this classification by Ian Goodfellow fairly useful; this is from his GAN paper back at NeurIPS in 2017. He classifies generative models under a maximum likelihood branch, and I'll explain what that means, but I think everything we're talking about today sits under this maximum likelihood branch. He then breaks generative models up into explicit and implicit density. Explicit density I think of as: the parameters in the model explicitly represent the probability distribution, whereas with an implicit density you don't have that representation. The GAN falls under this implicit density branch, because with a GAN you have this generator and discriminator and you're kind of making them fight, but you don't explicitly parameterize a probability distribution inside of it, for example. I'm going to stick to the explicit density branch here and talk about both the tractable and the approximate density cases, and there's an important difference between tractable and approximate densities when we reconstruct either the probability distributions or the wave functions that underlie these simulator experiments. I'm going to introduce these from a historical perspective, and the first one will be along this branch: explicit density, approximate density, and then Markov chain, and that's the restricted Boltzmann machine. The restricted Boltzmann machine is a generative model that has a number of parameters, and you can vary its expressiveness, its representational power, through a latent space, to use that language, and it's actually a relatively powerful piece of technology. Interestingly, John Hopfield, who I would call a condensed matter theorist (he was Bert Halperin's advisor, and I think Steve Berman's advisor), kind of started a lot of this field with what we now call Hopfield networks, which to the condensed matter or statistical physicist is really just an Ising model. The restricted Boltzmann machine is the modern variation of a Hopfield network, and I've illustrated it here. It's an Ising model, and there are Ising degrees of freedom, that's what these circles are, and we've split them into two layers: a visible layer and a hidden layer. The visible layer is the set of binary variables, the Ising variables (we actually use zeros and ones), and its size is the same as the number of qubits in the device, or the length of the data vector; I'm calling it n. What this generative model does is, you can imagine it taking in these binary numbers into the input layer and then learning some representation. The hidden layer is just a number of binary units, again Ising variables, and you can vary its size to give you different
representational power, or different expressiveness, of the neural network. The probability distribution is represented explicitly through a Boltzmann-like, or Gibbs-like, distribution, where you have a partition function. They're called energy-based because you construct this energy, and this is the Ising Hamiltonian, where you have weights, which are the Ising interactions (they only exist between the visible and hidden layers, and that's the restricted nature of an RBM), and then you have biases, or fields. So the point is you have an explicit representation of the probability distribution, but you do not know the partition function, or the normalization, and that's important: you can't get a tractable density out of this, you can't get a tractable likelihood, because you would need to know that partition function. I'm using lambda for the parameters, and I'm not going to talk a lot about the details of these RBMs, but just know that in order to actually represent the physical distribution, in order to obtain an approximation, you have to marginalize over all these hidden units. So there's this additional step in RBMs where you have to trace over all the hidden units, and what that does is give you this marginal distribution, which is the goal: you train all of these parameters, the weights and biases, to obtain p_lambda(x), and I'll show results on that. Any questions on the RBM? One note: RBMs used to be used for generative pre-training of deep neural networks before AlexNet came out, before convolutional neural networks basically started performing better, so they're not used that much in industry anymore.
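Just to make that concrete, here is a minimal sketch of the kind of RBM being described: binary visible and hidden units, an energy with weights and biases, the unnormalized marginal over the visible units, and block Gibbs sampling to generate new configurations. The sizes and the bare NumPy implementation are my own illustration, not the code actually used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

n_visible, n_hidden = 9, 16                              # e.g. 9 "qubits", 16 latent units
W = 0.01 * rng.standard_normal((n_visible, n_hidden))    # weights (Ising couplings)
b = np.zeros(n_visible)                                  # visible biases (fields)
c = np.zeros(n_hidden)                                   # hidden biases (fields)

def energy(v, h):
    """E(v, h) = -v.W.h - b.v - c.h, with v, h in {0, 1}."""
    return -v @ W @ h - b @ v - c @ h

def unnormalized_log_p(v):
    """log of the marginal p(v), up to the unknown log partition function:
    the sum over hidden units is done analytically."""
    return b @ v + np.sum(np.logaddexp(0.0, c + v @ W))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sample(v, n_steps=100):
    """Block Gibbs sampling: alternate h|v and v|h.  This is the Markov chain
    that makes RBM samples autocorrelated."""
    for _ in range(n_steps):
        h = (rng.random(n_hidden) < sigmoid(c + v @ W)).astype(float)
        v = (rng.random(n_visible) < sigmoid(b + W @ h)).astype(float)
    return v

v0 = rng.integers(0, 2, n_visible).astype(float)   # start from a random bitstring
print(gibbs_sample(v0))                             # a new "measurement-like" sample
```

Note that `unnormalized_log_p` is only defined up to the log partition function, which is exactly why the RBM sits on the approximate-density branch.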
So I'm going to go to the left side here, tractable density. What's a tractable density model? If you're familiar with natural language processing, any sequence-to-sequence mapping like English-to-French translation, or text completion like GPT-3 or Talk to Transformer (if you're bored right now, just google Talk to Transformer and start typing), that's on the tractable density side of this. So NADEs, PixelRNNs, and transformers I would put here. I'm going to talk about the recurrent neural network on this side, and we've actually started to adopt these RNNs for most of what I'm going to talk about in terms of generative modeling, for several reasons. First of all, they're a normalized distribution: so, just to go back, they're explicit density, but they're also normalized, so they're tractable, and that's one of the big points. Since it's a machine learning talk, let me quickly go over how an RNN works and what we use it for. We take that data, and I've called my visible or input data vectors x, and each element of x, so each qubit projective measurement, each one or zero, is fed into an RNN cell, and that cell is then unrolled, if you will, to correspond to the length of that vector. So what is an RNN? There's some initial value that's fed into an RNN cell, which is just some default set to zero, and there's a hidden vector which is passed between each one of these cells; or rather, remember this is recurrent, so what really happens is that the hidden vector wraps back around and feeds into itself. The output, if I'm starting with h0, is h1, which is given by this formula here: it's just some activation function, this is the simple, what do they call it, plain vanilla RNN, where you have a weight matrix acting on the hidden vector, another matrix acting on the input, and then some biases. So the parameters of an RNN are buried inside these expressions; it looks a little more complicated than the RBM, but you don't have to worry too much about the details: the weights W, U, V and the biases b, c are the parameters that you're going to train inside one of these RNNs. That's what happens when you feed in an input: this next iteration, or step, of the hidden vector gets processed by a softmax function in our case, to output what we call y, and y is a conditional probability. We interpret y as the probability of the next qubit in the chain, if you will, the next projective measurement, the next binary element, being either zero or one, conditioned on all of the previous elements of that vector. We then actually select the next element of the input vector probabilistically, and that's the input into the next iteration; that's how it's a generative model. If I go back to the RBM for a second: how is that a generative model? Because you train it, and then you can sample visible and hidden units, so you can produce new binary vectors of the visible layer; if you look at the marginal distribution, you can sample from it, but you don't know the partition function, let me put it that way. In the RNN this is also how you produce new samples of the input vector, new samples of the projective qubit measurements or whatever you're calling them, but it looks very different, and one of the points is that when you interpret the output of each cell probabilistically, as a conditional probability, you can form the product of all of these conditionals; I've written x less than i explicitly in this expression. This is the autoregressive nature of a recurrent neural network, and by the chain rule, if you take the product of all these conditionals together, you get the full normalized distribution, which is what you're looking for at the end. That's what's special about autoregressive models: it's not that you have an unknown partition function, you know the normalized distribution, and you can produce perfect, uncorrelated samples with no autocorrelation between them. Something I won't talk about, but which is also implicit in this structure, is that you can implement symmetries directly into an RNN, whereas if you look back at an RBM it's very difficult to imagine implementing symmetries there without basically messing with its structure. So let me know if you have any questions; this is the underlying technology that we're using, and I've simplified this one a lot; you have LSTMs, you have GRUs, all sorts of things that can occur inside these RNN cells. Just feel free to interrupt me if you have questions.
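Here is a minimal sketch of that autoregressive sampling and the product-of-conditionals probability, with a plain vanilla RNN cell written out in NumPy. The cell structure, sizes, and random (untrained) parameters are illustrative assumptions; in practice one would use LSTM or GRU cells inside a real training framework.

```python
import numpy as np

rng = np.random.default_rng(1)

n_qubits, n_hidden = 9, 20
# Vanilla RNN parameters (illustrative initialization, not trained values)
W = 0.1 * rng.standard_normal((n_hidden, n_hidden))   # hidden -> hidden
U = 0.1 * rng.standard_normal((n_hidden, 2))           # one-hot input -> hidden
b = np.zeros(n_hidden)
V = 0.1 * rng.standard_normal((2, n_hidden))           # hidden -> 2 logits (outcome 0 or 1)
c = np.zeros(2)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sample_and_logprob():
    """Sample one bitstring autoregressively and accumulate its exact
    normalized log-probability as a sum of log conditionals (chain rule)."""
    h = np.zeros(n_hidden)          # h0: default initial hidden state
    x_prev = np.array([1.0, 0.0])   # default "start" input (one-hot)
    bits, logp = [], 0.0
    for _ in range(n_qubits):
        h = np.tanh(W @ h + U @ x_prev + b)   # h_t = f(W h_{t-1} + U x_{t-1} + b)
        y = softmax(V @ h + c)                # conditional p(x_t | x_{<t})
        x_t = int(rng.random() < y[1])        # select the next "measurement" probabilistically
        logp += np.log(y[x_t])
        bits.append(x_t)
        x_prev = np.eye(2)[x_t]               # feed the sampled value back in, one-hot
    return bits, logp

bits, logp = sample_and_logprob()
print(bits, logp)   # one uncorrelated sample and its exact log-likelihood
```

The key contrast with the RBM sketch above is that `logp` here is already normalized, and each call produces an independent sample with no Markov chain and no autocorrelation.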
So, just to step back: if you forget the specific architecture, what is training of a generative model? Again, we take data vectors that are coming from that black box, and we want to somehow adjust the parameters of the model, these lambdas, so that whatever that explicit representation of the distribution is, it matches as closely as possible the p(x) which is in the black box; that's the red curve, and we want the blue curves and the red curves to lie perfectly on top of each other. One way of doing that is defining an optimization problem, where we want to optimize those parameters based on some cost function, and the KL divergence is a popular one. It's like a relative entropy: KL(p || p_lambda) = sum over x of p(x) log[ p(x) / p_lambda(x) ]. Because the parameters, which are what you want to optimize over, only exist in the denominator of this expression, you can throw away the p log p term, which is the entropy of the data set, and that turns into what you call a log-likelihood. So if you flip it on its head and get rid of that minus sign, the problem of training a generative model turns into the problem of maximizing the log-likelihood, and at the top of Ian Goodfellow's tree is maximum likelihood; this is where it comes from in all of these cases. The object here, log p_lambda, can be turned into a problem where you've sampled x from this probability distribution p, the black box; since you have x distributed according to these samples, the expectation value of log p_lambda is really the cost function, and that's what defines the loss landscape. That's a very simple way of seeing it, and then you do what everyone else does and do gradient descent on that loss landscape. Once it's trained, again, you can generate new instances of data vectors, so new projective measurements from qubits or whatever you want to call them, and presumably it's efficient to sample, so in some cases you can get many more. What I'm going to do is calculate physical estimators from those generated samples; again, we have a limited amount of training data, but we can produce unlimited generative data afterwards, and hopefully that generalizes well. I'll look at two estimators. One important point about the generative step: because RBMs are not normalized, you have to sample them with a Markov chain procedure, and that Markov chain gives you an autocorrelation function, the correlation between two adjacent elements in the Markov chain, and there's some autocorrelation time implicit in that which reduces the efficiency. RNNs, transformers, and other autoregressive models don't have this; they produce perfectly independent samples. This autocorrelation has all sorts of consequences for ergodicity and mode collapse and so on that aren't there for RNNs, so that's just another reason that we use these autoregressive models. Okay, let me know if there are any questions on the generative modeling aspect. So now let's turn to what we do with these things, and I'll maybe go through this a bit quicker.
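To show the maximum-likelihood training loop itself, here is a deliberately tiny sketch: the model is a product of independent Bernoulli factors rather than an RBM or RNN, so that the gradient can be written in two lines, but the objective (the negative log-likelihood, i.e. the KL divergence up to the data entropy) and the gradient descent loop are the same idea. All the numbers and the toy "experimental" data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "experimental" data: 3000 bitstrings over 9 sites from some unknown p(x)
n, n_samples = 9, 3000
true_p = rng.uniform(0.1, 0.9, n)
data = (rng.random((n_samples, n)) < true_p).astype(float)

# Tiny model: p_lambda(x) = prod_i q_i^{x_i} (1-q_i)^{1-x_i}, with q_i = sigmoid(lambda_i)
lam = np.zeros(n)

def neg_log_likelihood(lam, x):
    q = 1.0 / (1.0 + np.exp(-lam))
    return -np.mean(x @ np.log(q) + (1 - x) @ np.log(1 - q))

# Gradient descent on the negative log-likelihood
lr = 0.5
for step in range(200):
    q = 1.0 / (1.0 + np.exp(-lam))
    grad = q - data.mean(axis=0)      # d(NLL)/d(lambda) for this particular model
    lam -= lr * grad

print("final NLL:", neg_log_likelihood(lam, data))
print("fitted q :", np.round(1.0 / (1.0 + np.exp(-lam)), 2))
print("true   p :", np.round(true_p, 2))
```

For an RBM or RNN the gradient is of course more involved (contrastive divergence or backpropagation through the unrolled cells), but the cost function being descended is exactly this expectation of -log p_lambda over the training samples.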
So now we have a way of representing a probability distribution; what does that have to do with quantum systems? I keep saying things like projective measurements. Well, this procedure can essentially be generalized to learn a wave function, and you can see that most simply if I have what I'm calling a classical wave function. That's probably bad terminology, but I mean a wave function that has no complex or negative amplitudes; it's really just the square root of a probability distribution. In that case, the probability distribution that you're learning by the procedure I just described is exactly what you need to represent the wave function, and that's it: one set of data, one basis, that's all you need. If you have a phase in your wave function, then what I'm saying has to be generalized: you have to be able to learn that phase, and the biggest problem is that you need measurements in other bases in order to populate that phase, and you also need to be able to represent that phase somehow in the structure of the generative model. There are different ways of doing it, you could have complex weights, but I won't talk too much about that. And then, for the aficionados out there, of course you're often dealing with mixed states experimentally, and so you really have to think about how you structure this as a density matrix. I'm going to focus on the class of wave functions on the left-hand side here: wave functions where, in some basis which I'll call the computational basis, the coefficients are all real and positive. That's actually a fairly large class of models, models in the first sense of the word, or Hamiltonians, that we call stoquastic. What stoquastic means is that all off-diagonal matrix elements of the Hamiltonian are non-positive, and if that's true then by the Perron-Frobenius theorem the ground state wave function has this structure, with real, non-negative coefficients. So that's just one class of quantum Hamiltonians, but it's an important class, and it's very relevant for a certain type of simulator that's coming out experimentally. The data that I'm going to show comes from a Rydberg atom simulator, a many-body Rydberg atom simulator, and the Hamiltonian that's emulated by the device is this one here. It was first studied theoretically by your colleague Paul Fendley, who I hope is on this call, and it was studied, or at least proposed, in the AMO setting by Cirac and Zoller and others, and that really has to do with these long-range interactions. This is a very interesting Hamiltonian: it describes the interactions between Rydberg atoms, which are atoms in highly excited states. These can be loaded into optical lattices and so on, and there's some transition amplitude, related to the Rabi frequency, of going between the ground state, where the atom is literally in its ground state, and this highly excited Rydberg state. What's interesting about Hamiltonians of this type, as pointed out by Paul and others, is that there's a blockade mechanism which occurs between highly excited states. Here's an illustration: if I have this interaction potential between two Rydberg states, it can preclude two highly excited states being within a certain radius of each other,
and in the experiment there's really some decay of this interaction, one over r to the sixth, so what it means is that if two atoms are too close to each other, they can't both be excited into the Rydberg state, and that interaction mechanism gives all sorts of interesting lattice-dependent, geometry-dependent phenomena. If you don't believe me, there are two papers, and this may be more for the condensed matter theorists, two papers that came out last week, one by Sachdev's group and one by Ashvin Vishwanath's group, who, on different lattices, I'll just go through this real quick, the kagome and, what's this one called, the ruby lattice, claim to see Z2 spin liquids. Maybe this is for the experts, but Z2 spin liquids are exotic quantum phases that have no conventional order parameter, and their low-lying excitations are topological in nature and can be used for things like topologically protected qubits. So it's very interesting, from both the condensed matter and the quantum information perspective, to try to find phases of matter that could be used for topologically protected quantum computing, and these two papers, which came out at basically the same time (they claim they didn't know about each other, which is fun, but Misha Lukin is also on both papers, so that gave me a chuckle), claim that there's a pretty robust part of the phase diagram on each of these two lattices, with that Rydberg Hamiltonian, that has this phase of interest. Again, for the condensed matter people, one of the signatures of a Z2 spin liquid is the topological entanglement entropy, which is a subleading correction to the area law and gives you a firm signature of Z2 spin liquidity, and you can see they have some DMRG simulations. Okay, so there's interesting physics. Here's the experiment; this is where we get the data from, and this is from a paper from, when I first noticed it, 2017, and this is I think a 51- or 53-atom simulation, so 53 qubits if you want to call them that. The black dots are projective measurements of atoms in the ground state, so the fluorescing case, and the absence of a black dot is the Rydberg state. So here's what our data actually looks like: this is a data vector, x1, x2, x3, x4, and again this comes from stabilizing Rydberg atoms in an optical lattice. Technically what happens is that the atoms are prepared in the ground state and then adiabatically evolved, which is equivalent to evolving the detuning parameter, and the detuning parameter in the Hamiltonian is this piece here; the Rabi frequency is typically held almost fixed, and you can traverse cuts in the phase diagram. Here's a one-dimensional phase diagram where you have different ordered states, and that Z2-ordered state, not to be confused with the Z2 spin liquid, sorry, is really just an antiferromagnet. Here's the density, and here's something like 13 ions, sorry, 13 atoms, time evolving into that Z2-ordered state, and you see light, dark, light, dark, light, dark; that's the data. So what we did is we worked with Manuel Endres and Misha Lukin; Manuel built a lot of this experiment before he went to Caltech, and we produced data for the generative models. In the generative modeling context we did it by taking advantage of this adiabatic evolution.
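To make the kind of Hamiltonian and the exact-diagonalization benchmark that appears below concrete, here is a minimal sketch of a small one-dimensional Rydberg chain built and diagonalized exactly in NumPy. The parameter values for the Rabi frequency, detuning, and interaction strength are placeholders for illustration, not the experimental ones, and the chain is kept tiny so that full diagonalization is feasible.

```python
import numpy as np
from functools import reduce

N = 8                                 # small chain, so exact diagonalization is feasible
Omega, delta, V = 1.0, 1.2, 5.0       # Rabi frequency, detuning, interaction scale (placeholders)

I2 = np.eye(2)
sx = np.array([[0., 1.], [1., 0.]])   # sigma^x: couples ground <-> Rydberg state
n_op = np.array([[0., 0.], [0., 1.]]) # Rydberg occupation |r><r|

def site_op(op, i):
    """Embed a single-site operator op at site i of the N-site chain."""
    ops = [I2] * N
    ops[i] = op
    return reduce(np.kron, ops)

dim = 2 ** N
H = np.zeros((dim, dim))
for i in range(N):
    H += 0.5 * Omega * site_op(sx, i)     # (Omega/2) sigma^x_i : Rabi drive
    H -= delta * site_op(n_op, i)         # -delta n_i          : detuning
for i in range(N):
    for j in range(i + 1, N):
        H += V / (j - i) ** 6 * site_op(n_op, i) @ site_op(n_op, j)   # van der Waals ~ 1/r^6

evals, evecs = np.linalg.eigh(H)
psi0 = evecs[:, 0]                         # ground state
print("ground-state energy  :", evals[0])
print("mean Rydberg density :", sum(psi0 @ site_op(n_op, i) @ psi0 for i in range(N)) / N)
```

The point of the sketch is just that for eight or nine atoms this brute-force route is still available, which is what makes the exact-diagonalization curves in the comparisons below possible; past a few tens of atoms it is not.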
So really, here are the Rabi and the detuning parameters, but the Hamiltonian is changing as a function of time as it adiabatically evolves, and you can stop the experiment and take a number of projective measurements at each one of these dots. You can see the detuning changing, so it's cutting through that phase diagram between what they call disordered, which is everything in the ground state, and the Z2-ordered, or antiferromagnetic, state. At each one of these dots the experiment can really produce about 3,000 measurements, and that number of measurements is roughly commensurate with what we believe can be used to train these standard machine learning methods, so everything kind of checks out. We make some assumptions on the wave function: we assume that it's pure, that we're not looking at a density matrix, and we assume that we have no phase in the wave function, so there are a lot of assumptions that go into this. Then we take the data at each one of these stopping points, each one of these dots, and we train a generative model, but with different parameters for each stopping point, let me put it that way. After training, sampling the model can produce all sorts of estimators, and here's why we do this. First off, if you have any observable quantity that's diagonal in that computational basis, really what you're doing is just looking at the original data, in some sense. So I have some operator, I've called it A here, that I want to measure; here's an operator which, in the spin language, is one minus two times the Rydberg occupation (I don't know why I did this). If that's your operator and it's diagonal, you're really just counting the numbers of light and dark atoms and taking an average, and hopefully you're drawing data from a probability distribution that's fairly accurate in your model. So here is the result: the experimental measurement of the average number of ground-state spins, if you will, of ground-state atoms, is the black line. ED is exact diagonalization, which is actually a diagonalization of this Hamiltonian; that can be done in this case because we stuck with eight or nine atoms for this data, so you can actually do the exponentially difficult task of solving for the ground state. And the RBM, which is the generative model we used in this case, was trained on the experimental data, so we train the restricted Boltzmann machine, then produce new samples of data from that model and calculate the same expectation value of this operator, and that's plotted here. Why would you do that is kind of the question, but it tells us that our generative model is training well: we have enough data, we're training that model, we're generating new data, and that's these triangles, and everything checks out. So why do we do it? The answer is that certain observables aren't immediately accessible from the experimental measurements, and in the quantum case those are really off-diagonal operators, if you will. So here's sigma x, which was defined; it's like the raising or lowering operator, let me put it
that way: it's the raising plus lowering operator into the Rydberg state, so that is not the basis that we perform the measurements in, and you don't have immediate access in the experiment to that sigma x. However, if you have a generative model which represents the wave function, then we know how to produce the expectation value of that off-diagonal operator with knowledge of that wave function. I just have the sigma x matrix elements between x and x prime here, and that can be turned into something that is numerically efficient to calculate, assuming that the thing we call the local estimator here is sparse, and that's the case for us. So this off-diagonal expectation value, which is important to characterize the experiment, can be calculated directly from the generative model. Here, again, exact diagonalization is the target, and here is the reconstructed sigma x expectation value from the restricted Boltzmann machine, and it shows a discrepancy, and that discrepancy is kind of the interesting thing. We didn't have a discrepancy in the diagonal case, but there is one here, and that really is feedback to the experimentalists to say that whatever you're doing in the experiment isn't exactly what you think you're doing based on the naive measurements.
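To make that off-diagonal estimator concrete, here is a minimal sketch for a positive wave function psi(x) = sqrt(p(x)): the matrix elements of sigma^x_i connect x only to the single configuration with site i flipped, so the local estimator is just a ratio of wave function amplitudes. The `model_log_p` and `model_sample` helpers are hypothetical stand-ins for a trained RBM or RNN, included only so the sketch runs.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 9

# Placeholder for a trained generative model: in practice this would be the
# RBM/RNN marginal log p_lambda(x).  Here it is a stand-in so the sketch runs.
q = rng.uniform(0.2, 0.8, n)
def model_log_p(x):
    return float(x @ np.log(q) + (1 - x) @ np.log(1 - q))

def model_sample(num):
    return (rng.random((num, n)) < q).astype(float)

def sigma_x_local(x, i):
    """Local estimator for sigma^x_i with psi(x) = sqrt(p(x)):
    <x|sigma^x_i|x'> is nonzero only for x' equal to x with site i flipped,
    so the estimator reduces to psi(x_flipped) / psi(x)."""
    x_flip = x.copy()
    x_flip[i] = 1.0 - x_flip[i]
    return np.exp(0.5 * (model_log_p(x_flip) - model_log_p(x)))

samples = model_sample(5000)    # generated (not experimental) samples
est = np.mean([sigma_x_local(x, i) for x in samples for i in range(n)])
print("estimated <sigma^x> per site:", est)
```

The average of the local estimator over model samples is the reconstructed expectation value; nothing about it requires measuring in the sigma^x basis, which is the whole point.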
But this can go further, so we can do all sorts of things, and again maybe I'll gloss over the details here, but when you have a model of the wave function in one of these generative models, it's very powerful. One thing we can get out of it is the entanglement entropy, again something that's not accessible directly from the experimental measurements, but if you have a model of the wave function then you can form, and I've done it here, a sort of reduced density matrix, and this is my two-line proof of the second Rényi entropy algorithm we use; if you know Penrose notation, fine, if not, ignore this. Hendrik, I just got a notification that you have a question.

Yeah, just a question: before you generate new samples, you could just take the samples that they took in the experiment, right, and take the matrix element of the sigma x between those? What would you get, which curve would you get, for the off-diagonal?

Oh, if you did it here? I don't have that plot, but what happens is that if you do that with the experimental data past, I think, eight sites in this case, then it gives a wrong answer, because it's not generalizing well. Sorry, I have a figure for that, but I just didn't put it in this talk. So it comes back to the generalization issue.

Okay, I see. So the data from the experiment would lie closer to the RBM curve than the ED curve?

I actually don't have that data; I have other data for a different off-diagonal matrix element, so I can't answer that directly, and I don't know which one it would lie closer to, but I know it's off from the exact value that you would expect, and again that has to do with the number of measurements. If you only have three thousand projective measurements, what happens is you're losing your ability to generalize, which means that's not enough to reconstruct the off-diagonal observables. But yeah, it's a good point, I should put that in this talk. And once you get past, I really think it's like eight to ten Rydberg atoms, and if you're only doing three to maybe ten thousand measurements, you're really losing this ability to generalize; I'd have to figure out exactly how it manifests here, but you can definitely see it.

I see, thanks.

Yeah, that's a good point. So again, this is all about generalization, or generalizability, or whatever the word is, with limited and expensive data. So this is the same thing, just more complicated, and I'm not going to go through it, but what we do is typically replicate the wave function. So here's a replicated bra and a replicated ket, and you perform a swap operator, which turns out to give you the trace of the reduced density matrix squared, and that gives you the second Rényi entropy. And here's the second Rényi entropy, again reconstructed from the restricted Boltzmann machine and compared to exact diagonalization, and we have no direct access to this with the experiment. Again, just for the condensed matter people, one thing that's interesting, and hopefully we'll be able to do this, is that when the experiments can tune into these interesting regions for the Z2 spin liquid, we get these measurements out, we train these models, and we get these second Rényi entropies. We're talking about, hopefully, experiments in the regime of 400 and 500 atoms; I know Misha Lukin has at least a 20 by 20 array, and that should be big enough to do all sorts of complicated entanglement geometries. This would be difficult to do experimentally: it has been demonstrated that you can replicate the experiment itself to get this estimator out, Rajibul Islam did that, but I think he only demonstrated it for maybe four or six atoms, whereas here it's clear to me that we can do this for hundreds of atoms. So we can really extract something like the topological entanglement entropy from this data. But okay, that's for the experts.
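Going back to the second Rényi entropy for a moment, here is a minimal sketch of the swap (replica) estimator just described, evaluated on a stand-in positive wave function psi(x) = sqrt(p(x)). The placeholder model, the choice of region A, and the pair count are illustrative assumptions, not the actual reconstruction pipeline.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
region_A = np.arange(4)        # subsystem A = first half of the chain

# Placeholder trained model with psi(x) = sqrt(p(x)); in practice p would come
# from the trained RBM/RNN, not this uncorrelated stand-in.
q = rng.uniform(0.2, 0.8, n)
def log_p(x):
    return float(x @ np.log(q) + (1 - x) @ np.log(1 - q))

def sample(num):
    return (rng.random((num, n)) < q).astype(float)

def renyi2(num_pairs=20000):
    """Swap estimator on two replicas:
    Tr[rho_A^2] = E_{x,x'~p}[ psi(sw) psi(sw') / (psi(x) psi(x')) ],
    where sw, sw' are x, x' with their region-A entries exchanged.
    Then S2 = -log Tr[rho_A^2]."""
    x, xp = sample(num_pairs), sample(num_pairs)
    sw, swp = x.copy(), xp.copy()
    sw[:, region_A], swp[:, region_A] = xp[:, region_A], x[:, region_A]
    log_ratio = 0.5 * (np.array([log_p(a) for a in sw]) + np.array([log_p(a) for a in swp])
                       - np.array([log_p(a) for a in x]) - np.array([log_p(a) for a in xp]))
    return -np.log(np.mean(np.exp(log_ratio)))

print("second Renyi entropy of region A:", renyi2())   # ~0 for this product-state stand-in
```

Because the samples come from the trained model rather than from a physically replicated experiment, the same estimator keeps working at system sizes where building two experimental copies would be out of reach.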
Just a couple of slides on more quantumness. If you don't have this nice feature of the Rydberg Hamiltonian, that it's stoquastic, then you have to worry about the phase. You're still okay in principle: you can parameterize both the amplitude and the phase through different sets of parameters, so here's my visible layer, here's the amplitude, here's a phase, that's fine, and there are lots of different interesting ways to do that. But what you need, fundamentally, is data that comes in in different bases, and that isn't all that easy with the Rydberg setups; something like a trapped ion setup can do that, so we actually do this more with the trapped ion people, and I have data for that, but I don't think it's that interesting for this talk. Just one note: you have to define a loss landscape, and you can define a maximum likelihood loss landscape for every basis. So here are n qubits, here's XXZZZZXXZ, say; each one of these is a different basis, you have some number of bases, and you essentially add up the loss function, or the KL divergence, for every one of those bases, where the unitary that rotates to the other computational basis goes in there. That's how you do it, and it works. If you want a density matrix, here's another visible layer: here's a physical amplitude and phase and then an ancillary amplitude and phase, and that's a purification. So you can take the same thing and just purify it and you get a density matrix, but I think at that point there are better ways of doing density matrices. For example, Juan Carrasquilla especially has some really nice POVM-based density matrix reconstruction methods where you can use any generative model: there's a generative model stage, all the entanglement is in some sense captured in the generative model, and then you have non-interacting POVM operators, or matrices, here, and that's a really nice way of breaking up a density matrix. The POVMs are these operators, and we use all sorts of different generative models there: an RNN works really well, or even these non-explicit VAEs, GANs, and transformers. This is where I really think the field is going: density matrix reconstruction using generative models as one step, and then some sort of less naive parameterization of mixing and signs than I've illustrated here. And just to illustrate how that works: if you do time evolution of any of this data, then you need to have complex phases at least. So here's the Rydberg Hamiltonian starting in the Z2 up-down-up-down antiferromagnet, quenched and time evolving, and each one of these dots is a reconstruction of the time evolution with a generative model. In that case the data is synthetic, so we actually produced it with other types of simulations, just as a proof of principle; this doesn't come from experiment, because we need these other basis measurements, and actually two n plus one different bases went into reconstructing this. And here's the evolution of the entanglement entropy as you time evolve the system. So this is what's possible, and we're ahead of the experimental data, but barely, let me put it that way. So, just to summarize: generative models, I think, are very suitable for reconstructing quantum states given data; limited amounts of data are fine, as long as you have a good generative model, and really what you're aiming for is good generalization. And again, for us this is very different from working with Hamiltonians: the Hamiltonian is in the black box, the Hamiltonian is used to prepare the ground state, or whatever state you're looking for, and in some sense you're doing tomography on what's in that black box. As I showed, we've done proofs of principle with experimental data, but I really think when we get bigger simulators that have more access to different bases, these generative models, and this is my prediction, you're going to see them in every experiment; I really think they're a powerful tool. And fundamentally, I'm a theorist, and I think it's interesting, when you look at this reconstruction from data versus solving a Hamiltonian, to ask the questions from those first two slides about what makes a problem hard. You can ask those questions here: are the problems that we consider hard when we're solving a Hamiltonian the same as the problems that we will encounter when we try to reconstruct from data? So is it as hard to reconstruct a ground state wave function from data as it is to solve a Hamiltonian?
Let me put it that way. And I think it's interesting because the loss landscapes are very different: you can have the same parameterization of a wave function, and you could try to optimize that with the variational energy, which gives you one loss landscape, or you can have data coming in from this black box and look at the maximum likelihood, or the KL divergence, which gives you a different loss landscape. In some cases we believe that these loss landscapes are glassy, or non-ergodic; is that true in the complementary case? I think that's a really interesting theoretical question. And look at that, it's 11 o'clock exactly; I'm going to say that was a success of timing. Thank you, thank you everyone for your attention, and hopefully we have some time for questions.

Wonderful, well, thank you for this great talk, very interesting. Okay, questions? Anybody have a question?

Yeah, is there any concern that the generative model has the same issue that you're trying to avoid, that it's operating at a lower complexity than the Hamiltonian that you're trying to model, and that the transformers, or whatever generative technique, can't go to whatever dimensionality you need?

Yeah, that's a good question. The RNNs, if you think of everything we've learned from DMRG, really are one-dimensional sequence-to-sequence mappings in some sense, so you can really ask the question: if you have a two-dimensional system, like I was talking about for a while there, is an RNN sufficiently powerful, in a representational sense, to capture those correlations, or that entanglement, in the two-dimensional Hamiltonian? And that's a real concern, and that's one reason why we go to the transformers, because the transformers have this attention mechanism, and that attention mechanism isn't a one-dimensional sequence. So theoretically, this is what we do: we essentially study these generative models with synthetic data, which I didn't really show; we bombard them with data from all sorts of one-, two-, and three-dimensional wave functions and we try to figure out how exactly the correlations are manifest and represented inside there.

Thanks, Neil.

Thanks, Roger, can you hear me?

I can.

You mentioned the RBM and then you mentioned the recurrent network, right, and you said that you had the loss of efficiency with the RBM because of the correlation time, right? Is there any advantage of the RBM?

I mean, I love the RBM, so I'm trying to think what's not an advantage. In some sense the RBM is very simple; can you see the RBM here? Yeah. So the RBM is very simple to interpret, and also it's fully connected, and there's a bit of debate about this, but it's almost related to Neil's question: if you have full connectivity between visible and hidden, then in some sense these things can represent any entanglement, or any correlation, regardless of dimension or anything like that. So if you have some sort of long-range correlation between the most physically distant qubits, that's easy to mediate through two weights, for example. And I have a lot of work on the weight structure, interpreting how these weights look after training and relating that to correlations, so I think there's an interpretability aspect to this too. When you look at the RNN, again, this is really a one-dimensional sequence here, and
so if you have correlations between the farthest, most distant spins or qubits or whatever, you'd better hope that the mechanism inside here, which is kind of opaque, has a strong enough representational capacity to capture those correlations, and it's much harder to interpret; it's much harder to interpret than just seeing a large weight in an RBM. So there are pros and cons. I think, and maybe it's not universally true, that in many cases when you increase performance you kind of lose things like interpretability; maybe that's too general a statement, but there are definitely advantages to the RBM. If you're looking for pure performance, though, I think we just see better performance with the RNNs, to be honest.

All right, thanks. And maybe a quick follow-up on that, also related to Neil's question and Hendrik's question before. You spoke about having very general models that are expressive enough to capture all the possible entanglement and so forth; is there any value in this game in actually restricting the class of models that you're using? Like, if you know something about the Hamiltonian that's governing the system you're looking at, is there any value in encoding part of that knowledge into the class of model that you're using to study the state?

Yeah, so we do that already in some sense; let me try to find the slide here. You're totally right, and we do that in this case here already, when we throw away a whole bunch of things. If you think about it, at the end of the talk I talked about all sorts of extensions of what we do, but if you really make no assumptions on your state, you should assume that it's a mixed-state density matrix, and we strip all that away, right, because we're making a whole bunch of assumptions and we boil it down to this. So that's one example of where assumptions come in, and that's what makes us different from quantum state tomography, where I think they try to have as few assumptions as possible. Here's another example: here's an RBM, and I didn't have time to show this, but here we have some notion of locality. You see there's a whole bunch of weights and biases, but you can actually prune a lot of those weights away. So here's an RBM after training, and I don't know if any of you have looked at these lottery ticket hypothesis papers, or these pruning papers, but you can actually prune a whole bunch of those weights away, and you see locality emerge in the RBM. So why not assume that locality to begin with, right? That would be another assumption that you could bake into these models to gain a bunch of performance. But on that note, over-parameterization sometimes helps exploration of the loss landscape, so it's not immediately obvious; this is very heuristic. You really want to get good performance and you want to reduce the number of parameters, but sometimes it helps to have a few extra parameters, so this is a tricky question.

I see, thanks.

Okay, Will, you have a question?

Yes, it was on pitfalls for training these generative models. I've always found in the past that generative models can be somewhat of a black art; do you have any common problems that you found while training these, so that we can possibly avoid them?

I am a wizard of training these things, or rather my students are. So, I mean, there are a lot of
Okay, Will, you have a question? It was about pitfalls in training these generative models. I've always found in the past that generative models can be somewhat of a black art. Do you have any common problems that you've found while training these, so that we can avoid them, possibly?

I am a wizard at training these things, or rather my students are. There are a lot of heuristics in this, absolutely, and the last one I mentioned, the pruning, is a really good example. We spent years training fully connected RBMs like this, and only recently, again this is unpublished, did we realize that you can prune. Let me actually go through this real quick. You can look at the weights of your RBM; here are the ordered weights, on a log scale, so this is the largest weight (this is a critical Ising model, but it doesn't matter), and you can see there's some decay in the magnitudes of the weights in an RBM. If I want to reach a certain energy or fidelity or accuracy, typically I have to keep increasing the number of weights. But we looked at these papers on the lottery ticket hypothesis and pruning and so on, and realized that we should be able to chop off a bunch of these weights. When you do, the performance gets worse; but if you start from the original model, pick a threshold, chop off a bunch of the weights, and then run more iterations, more epochs of training, you actually get a better energy with fewer weights. That is a pure heuristic that really comes from the generative modeling literature; there's no theoretical guidance for it, necessarily, that I've found, but it definitely helps us out. So my answer to you is: yeah, we're deep in the black arts here. Okay, thank you very much. Thanks indeed. Any more questions?
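The recipe just described, prune below a threshold and then keep training the survivors, is easy to caricature in code. The sketch below is only that, a caricature under assumed names: `train_step` stands in for whatever (hypothetical) contrastive-divergence or gradient epoch the real pipeline would run, and the toy usage only shows the structure of the loop.

```python
# A sketch of the prune-then-retrain heuristic: zero out small weights, freeze
# that sparsity pattern, and keep training the remaining weights for extra epochs.
import numpy as np

def prune_and_retrain(W, train_step, threshold, extra_epochs):
    """Zero weights below `threshold`, then keep training with the mask enforced.

    W            -- trained weight matrix of the model
    train_step   -- callable W -> W performing one (hypothetical) epoch of training
    threshold    -- magnitude below which weights are removed
    extra_epochs -- number of further epochs to run on the pruned model
    """
    mask = (np.abs(W) >= threshold).astype(W.dtype)
    W = W * mask
    for _ in range(extra_epochs):
        W = train_step(W) * mask   # re-apply mask so pruned weights stay zero
    return W, mask

# Toy usage with a stand-in "training" step (real code would update the RBM here).
rng = np.random.default_rng(2)
W0 = rng.standard_normal((8, 4))
dummy_step = lambda W: W + 0.01 * rng.standard_normal(W.shape)
W_pruned, mask = prune_and_retrain(W0, dummy_step, threshold=0.5, extra_epochs=10)
print(f"kept {int(mask.sum())} of {mask.size} weights")
```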
Well, I can ask a physics question then. Go ahead. Hey Paul. Hey Roger, a physics question. One of the reasons people got so excited about these Rydberg atoms was not the ground-state stuff but the excited-state stuff, this whole story that goes by the name of quantum scars, which is one theoretical explanation for what's going on. Can your RBMs say anything interesting about that?

Yeah, I mean, you can train, I've lost my slide here, you can do dynamical reconstructions with this stuff, and in some sense that's what I was trying to get at here: every single one of these data points in this time evolution is a different RBM. If you want to train an RBM to learn about the spectrum, or something like that, with one set of weights, that's not something I know how to do. If you want to learn about an excited state, you have to take data from that state and probably train a separate set of parameters. And that's one point: you don't really know, in the experiment, what the excited states are; you just see these revivals, this oscillation, and then the proposed explanation, which is pretty believable, is that this has something to do with very special excited states, and there's no good way of getting at that. In any case it's not clear to me. I think what happens is that you reconstruct whatever you have data for, but interpreting that is another matter: even if you have the ground state and you want to know the elementary excitations out of it, I don't see immediately how you would do that with this type of generative modeling. If you have data for the ground state, the thing you reconstruct is the ground state, that's it. If you give it data from an excited state, or from the time evolution, then that's what you get: it's this time evolution, it's not some ensemble. Okay.

So those are not simple questions, I think, and without interpretability, unless you can interpret the weights, you're stuck. If I could really interpret the weights of an RBM, let me just show that again here, so maybe you get something like this, which looks like a big mess, but then again if I could get something like this, where I can say each one of these hidden units mediates a bond or two and I can understand how the lattice emerges and all that, then maybe you have a chance of doing what you want to do. Without that type of interpretability, though, you're totally stuck. And let me just say that these things are not easily interpretable; typically you get something that looks like this. You can argue, just to get into the weeds, that you have a Hubbard-Stratonovich transformation here: you've decoupled the interacting degrees of freedom and coupled them to these auxiliary fields, but that coupling is insane; it's nothing you can back a workable Hubbard-Stratonovich-type model out of. So this is the question: can we interpret machine learning, can we explain it, can we get explainability? If we can't, well, we're screwed; but maybe that's the point with many-body systems, maybe certain things just aren't interpretable. Great way to end, thanks. Yeah, I couldn't agree more, great way to end.

So if there are no more questions, let me just thank you very much, Roger, for this fantastic talk and for the discussion afterwards. Thanks a lot to everybody who asked questions. I think it's a bit too early to wish everybody merry Christmas, but since this is the last seminar of the year, let me just wish you a nice end of the year and hope that some of you will join again next year. That's it, thanks a lot, thanks for