Yeah, so let me start by thanking the organizers for the opportunity to present our work here. It was done in part by my postdoc Sourav Nandy, who just moved from my group to PKS, together with Markus Schmitt, one of the organizers here, and Marin Bukov, whom you probably also know and who will, I think, talk about his own work tomorrow. In these two studies we used neural networks to compress quantum states and then tried to extract some effective description, the features we typically look for in such states. We applied this principle, for example, to detect different stages of non-equilibrium dynamics, to tell whether the simulator we are dealing with is a bit noisy, or even to learn the Hamiltonian behind the states we actually have at our disposal. Throughout, we assume that all we can operate with are measurements of local operators performed on those states, and from these we try to extract everything I just named. So in this talk I will first put forward a new measure of local complexity based on autoencoders, and then upgrade it for Hamiltonian reconstruction.

We all come from the non-equilibrium-dynamics community, so our motivation was whether neural networks can be used to study such systems. What we had in mind was the following: suppose you prepare your system in some rather simple state, maybe a product state, and then run the evolution with a Hamiltonian that is interacting, entangling, and so on. What happens to the complexity of this state? Of course, under such an evolution the complexity grows. But if we pretend, or maybe that is actually the reality of your quantum simulator, that we only have access to local measurements, i.e. to expectation values of local operators in the time-evolved state, then we know that after a long time, once the state gets very scrambled, most of the information about the initial state is gone, and the only things that still matter are the energy and possibly other conserved quantities. So at late times the local complexity, what we can probe with local measurements, should become rather simple again. This is what thermalization means: for whatever we are probing locally, the rest of the system serves as a bath, so the local measurements should look like they come from a thermal ensemble, a statistical description that is no longer very complex but parameterized by a single parameter, say the temperature, or by a few additional parameters for additional conserved quantities. So we asked ourselves: can we use neural networks to detect these different stages of non-equilibrium evolution, this growth of complexity and its later decay? The architecture we used for this task was the autoencoder, because it is the traditional tool for obtaining compressed representations.
So it is a neural network whose task is to take some input data and reconstruct it after passing through a bottleneck. We train it so that this reconstruction is as good as possible, but of course it can only succeed once the intrinsic dimension of the data is lower than or equal to the number of neurons in the bottleneck. In that sense, by checking when the reconstruction is successful and when it is not, we can use it to determine the intrinsic dimension of our data set. Okay, that is very general, but what is our input? In our case, each data element contains measurements of local operators. Say we are dealing with spins: measurements of x, y, z, xx, and so on, that is, all Pauli strings up to some support s. The dimension of one input element is simply the number of operators we measure. We then check: if we perform these measurements in a certain state, is that state parameterizable with some smaller number of parameters, is it in that sense locally compressible? As I already said, the network is first trained on a subset of the measurement realizations. Maybe I forgot to mention: one data element is the set of measurements of all those operators in one state, and different elements are measurements in different states of supposedly similar complexity. We train on part of those states and their measurements, and then test on the unseen part. The test error is essentially this reconstruction loss: how far the reconstructed measurements are from the input measurements. This tells us whether the bottleneck is wide enough.

Through this first part I will guide you through the different stages, but we first benchmark the approach on the simplest stage, the late stage when we expect the system to be describable by some ensemble, meaning just a few Lagrange parameters, or just the temperature. We will not start with a real time evolution; we first just benchmark. We take a Hamiltonian, say the transverse-field Ising model, prepare thermal states with respect to it at randomly sampled temperatures, measure all these different observables in such states, and check whether the autoencoder figures out that these measurements are actually parameterized by a single parameter, the temperature. What we do first is look at the test error: we compare how good the reconstruction of the input measurements is for different numbers of neurons in the latent space. We see that the test error drops already when there is at least one neuron in the bottleneck, and then it essentially flattens off, maybe improving a little bit. This tells us that successful reconstruction happens as long as we have at least one neuron in the bottleneck, in agreement with the fact that this data was indeed parameterized by a single parameter. So the autoencoder is doing the job we expected.
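As an aside, a minimal sketch of such a bottleneck scan, assuming a fully connected autoencoder in PyTorch, could look as follows; the layer widths, optimizer, and training length are illustrative choices, not the settings actually used in the study.

```python
# Minimal sketch of the autoencoder-based intrinsic-dimension probe.
# Architecture, optimizer and training length are illustrative assumptions.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_obs: int, n_latent: int, hidden: int = 128):
        super().__init__()
        # n_obs: number of measured local operators (input dimension)
        # n_latent: number of bottleneck neurons being scanned
        self.encoder = nn.Sequential(
            nn.Linear(n_obs, hidden), nn.ReLU(),
            nn.Linear(hidden, n_latent),
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, hidden), nn.ReLU(),
            nn.Linear(hidden, n_obs),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def test_error(train_x, test_x, n_latent, epochs=2000, lr=1e-3):
    """Train on part of the measurement realizations and return the
    reconstruction (MSE) error on the unseen part."""
    model = Autoencoder(train_x.shape[1], n_latent)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(train_x), train_x)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return loss_fn(model(test_x), test_x).item()

if __name__ == "__main__":
    # Placeholder data; each row would be the expectation values of all
    # measured Pauli strings in one state.
    data = torch.randn(200, 64)
    train_x, test_x = data[:150], data[150:]
    # Scan the bottleneck width: the error should drop once n_latent reaches
    # the intrinsic dimension of the data and then flatten off.
    for n_latent in (1, 2, 3, 4):
        print(n_latent, test_error(train_x, test_x, n_latent, epochs=200))
```

One then plots the test error against the bottleneck width and reads off where it drops and flattens.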
It is also nice to look at where this input data, which is a high-dimensional vector, gets mapped in the latent space. This is what I show here: one set of measurements at a given temperature gets mapped here, another one there. When I look at this latent representation of my data set, I see that it is one-dimensional, again in agreement with the fact that what the autoencoder sees are just thermal measurements at different temperatures. I have color coded the points according to the temperature, or equivalently the energy, of the state from which the data comes, and we see that the points are indeed ordered monotonically with respect to the corresponding energy.

[Answering a question about the input] Yes, we input all strings of local Pauli operators up to support three. No, many of them: x, y, z, and so on, 64 of them for support three, actually a bit fewer because we remove those with identities at the edges, but roughly 4^s operators. So the input is really high-dimensional, and it is mapped to this lower-dimensional representation. These are measured with respect to your state, so I take the trace of the operator with rho. The first input component is, say, x, the second one y, and so on. All of these are inputs, but they are not stochastic: for a given state, which can be pure or, as I will also show for open systems, mixed or thermal, it is just the expectation value in that state.

[On what the error means when nothing passes through the bottleneck] Yeah, in that case it is as if the network were disconnected, and what it then essentially learns is the average of the data. That is also why the error is not that bad even when it knows nothing: a lot of the input expectation values are quite small, so the average of the measurements is more or less the best you could output if nothing were connected.

We can also check how the value of this latent neuron relates to the temperature: does it pick out exactly the temperature of my states? The answer is no. Here in red is the value on the single latent neuron, and you see that it is an invertible function of the temperature, plotted in black, but not exactly the temperature itself, which makes sense: it only needs to be bijectively related to it, not equal.
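For concreteness, the synthetic benchmark data described above, namely expectation values of local Pauli strings in thermal states of a small transverse-field Ising chain at randomly sampled temperatures, could be generated along these lines; the system size, couplings, and the restricted observable set are illustrative assumptions.

```python
# Sketch of the synthetic benchmark data: expectation values of local Pauli
# strings in thermal states of a small transverse-field Ising chain at
# randomly sampled temperatures. L, couplings and the observable set below
# (only support <= 2, for brevity) are illustrative assumptions.
import numpy as np
from functools import reduce

I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def op_on_sites(ops, sites, L):
    """Tensor product placing the given single-site operators on `sites`."""
    full = [I2] * L
    for o, s in zip(ops, sites):
        full[s] = o
    return reduce(np.kron, full)

def tfim_hamiltonian(L, J=1.0, g=0.5):
    H = np.zeros((2**L, 2**L), dtype=complex)
    for i in range(L - 1):
        H -= J * op_on_sites([sz, sz], [i, i + 1], L)
    for i in range(L):
        H -= g * op_on_sites([sx], [i], L)
    return H

def thermal_expectations(H, observables, beta):
    """Expectation values Tr(rho O) in the Gibbs state rho ~ exp(-beta H)."""
    evals, evecs = np.linalg.eigh(H)
    w = np.exp(-beta * (evals - evals.min()))
    rho = (evecs * (w / w.sum())) @ evecs.conj().T
    return np.array([np.real(np.trace(rho @ O)) for O in observables])

if __name__ == "__main__":
    L = 6
    H = tfim_hamiltonian(L)
    obs = [op_on_sites([p], [i], L) for p in (sx, sy, sz) for i in range(L)]
    obs += [op_on_sites([sz, sz], [i, i + 1], L) for i in range(L - 1)]
    betas = np.random.uniform(0.1, 2.0, size=50)  # random inverse temperatures
    data = np.stack([thermal_expectations(H, obs, b) for b in betas])
    print(data.shape)  # (n_states, n_observables), the autoencoder input
```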
Then we went on and asked what happens if we have not just a thermal state but add an additional conserved quantity. That is super easy in the model we consider, because the transverse-field Ising model is integrable, so it has macroscopically many conserved quantities; we could have added macroscopically many additional Lagrange multipliers, with all those C_i, the conserved quantities of the model, commuting with each other. But here we added just one: this was the Hamiltonian, and we added one more conserved quantity, just to check whether the autoencoder figures out that the state is now parameterized by two parameters. And that is indeed what happens: the test error now drops once I have at least two neurons in the latent space and then levels off. If we look at the latent representation in that case, it is maybe a bit hard to say that it looks two-dimensional, it looks a bit one-dimensional-ish, but according to the test error it should be two-dimensional. Still, the message is that when we look at how the ordering is arranged, one direction is spanned by the Hamiltonian expectation value and the perpendicular one by the second conserved quantity. So the message is that it works well.

But this was really benchmarking: we were feeding in measurements with respect to synthetically prepared states with randomly sampled lambda_1 and lambda_2. Now let's go to a more realistic situation, which is the following. We could have done a quench, but since I am quite fond of open systems, we did a variant of it. We performed evolution with respect to some fixed Hamiltonian and then added a little bit of coupling to baths, which, if you are familiar with it, can be described by so-called Lindblad operators, these L_k, such that the evolution of the mixed density matrix follows this Liouvillian equation. What we know from earlier works, including my own, is that if this coupling to the baths is very weak, the steady states are exactly of the form we just described: if the Hamiltonian evolution is chaotic, we relax to a thermal state plus some corrections, controlled by how weak the coupling to the bath is, while if the Hamiltonian is integrable, like our transverse-field Ising model, it relaxes to a generalized Gibbs ensemble described by additional Lagrange multipliers for the additional conserved quantities of the integrable Hamiltonian, again plus some corrections. So what we did was indeed perform time evolution with a fixed Hamiltonian and prepare different states by taking different weak couplings to baths, encoded in different, essentially randomly rotated, single-site and two-site Lindblad operators, and then analyze the physical density matrix corresponding to the steady state. And this is what we got. We first look at the test error, which should tell us how many parameters we need to describe the state. If the Hamiltonian was chaotic, so that we should relax close to a thermal state, we found that we needed essentially one or two latent variables to describe, statistically, all of these different steady states obtained from slightly different Lindblad operators. If the Hamiltonian was integrable, we needed more. But because we were measuring only operators up to support three, it turned out we did not need a crazy number of latent variables, maybe only around four, which means that even though we know the state should relax to a GGE parameterized by macroscopically many Lagrange parameters, if we measure only on up to three sites we do not see the importance of the higher, more complicated conserved quantities, but only roughly the importance of the most local ones.
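For reference, the weakly driven-dissipative setup referred to here is the standard Lindblad (GKLS) evolution; writing the weak bath coupling as an explicit prefactor epsilon, and summarizing the statements above about the steady states,

$$
\frac{d\rho}{dt} = -i[H,\rho] + \varepsilon \sum_k \Big( L_k \rho L_k^\dagger - \tfrac{1}{2}\{ L_k^\dagger L_k,\, \rho \} \Big),
$$

$$
\rho_{\mathrm{ss}} \approx \frac{e^{-\beta H}}{Z} + \mathcal{O}(\varepsilon) \quad (\text{chaotic } H),
\qquad
\rho_{\mathrm{ss}} \approx \frac{e^{-\sum_i \lambda_i C_i}}{Z} + \mathcal{O}(\varepsilon) \quad (\text{integrable } H,\ [C_i,H]=0).
$$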
If we then look at the latent representation, for the chaotic Hamiltonian, which should lead to a nearly thermal state, the data was once again approximately one-dimensionally ordered, and the ordering was once again with respect to the energy. While for the other situation, with the integrable Hamiltonian, where we said the data lives on roughly a four-dimensional manifold, if we mapped it to 2D with t-SNE we saw representations where, once again, the dominant primary direction was spanned by the expectation value of the Hamiltonian and the next one by some linear combination of the Hamiltonian and the most local parity-even conserved quantity. So we nicely saw that it is really these conserved quantities that span the latent representation.

Now, going back a little to the chaotic example, where does this spread come from? It must come from the corrections on top of the simple thermal state. In this case the different steady states were produced by coupling to baths represented by two-site and single-site Lindblads, which were a randomly rotated sigma_x times a projector down and a randomly rotated projector up. It turns out that in this case the autoencoder figured out that the only direction which is not rotated is sigma_z, so the spread was actually given by the z correlators. This tells you that if you had a simulator, did many runs, and saw that it is not perfect, that it has some noise, then such an analysis could tell you what kind of correlations this noise is promoting. We then pushed this a bit further and looked at a coupling to the bath that promotes a certain correlation more obviously: for example, choosing Lindblads that flip the spin on site i up if the spin on site i+1 is down, plus all the other symmetric combinations with some random weights, in which case the bath was really promoting anti-ferromagnetic correlations. And we nicely saw that the secondary direction was spanned exactly by these correlations. So in that sense, we could at least get a feeling for what kind of noise might be polluting our data.

[Answering a question about the robustness of this picture] Yeah, the points would then lie elsewhere, but features like this one, that one direction is spanned by, say, the Hamiltonian and the second one by your second conserved quantity, should remain. The cloud would change, but the fact that these different directions are related to your most important, second most important, and so on, latent neurons should still be visible. True, and these shapes I was showing were obtained the same way, so this star would change into something else. Of course there are regions where it looks similar, but I still think that unless you do something crazy, this ordering, or these interpretations, can still be made. We also started making this a little more precise by going towards variational autoencoders, where this ordering was more aligned, but we never really pushed it to publication quality. If we did just a simple PCA on the latent space, then yes, once you already have the compressed representation, the primary direction should probably pop out there as well.
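A minimal sketch of this kind of post-processing, embedding the latent codes in 2D with t-SNE and extracting a primary direction with PCA, assuming standard scikit-learn; the arrays here are placeholders for the encoder output and the per-state energies.

```python
# Sketch: embed the latent codes in 2D with t-SNE and color by an observable.
# `latent` stands for the encoder output and `energy` for the per-state
# Hamiltonian expectation value; here both are random placeholders.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 4))            # e.g. 4 latent neurons
energy = latent[:, 0] + 0.1 * rng.normal(size=200)

embedding = TSNE(n_components=2, perplexity=30).fit_transform(latent)
# PCA on the latent codes gives the dominant ("primary") direction directly.
primary_direction = PCA(n_components=1).fit(latent).components_[0]
print("primary direction:", primary_direction)

plt.scatter(embedding[:, 0], embedding[:, 1], c=energy, cmap="viridis")
plt.colorbar(label="energy")
plt.show()
```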
Okay, and what I wanted to say is that if I crank up this coupling to the bath to be very strong, so that no emergent simple description of those states can form, then we saw that we would need a much larger latent representation, and there was no nice latent ordering of the data anymore, telling us that these states are really strongly dissipative.

Okay, and then finally we went for the time evolution: if we start from an initial state, can we use this approach to detect the growth and decay of complexity? Here I should be honest, I cheated a little bit. If we really prepared the state in a product state and then just evolved it with some fixed Hamiltonian, the autoencoder would know that our states were prepared on, say, a two-dimensional manifold, and that Hamiltonian just maps it to something else, so we would never see the growth of complexity. This is only possible once we introduce some randomness, for example by propagating the state with random unitaries that differ from time step to time step. But we still imposed one conservation law, magnetization conservation, such that at long times we still flow to a non-trivial Gibbs-type ensemble characterized by a chemical potential for the magnetization. And here is what we got. Because we prepared the system in different product states parameterized by two angles, the autoencoder nicely saw that we need two latent variables to represent the early times. At long times, as I said, we run into a state that is locally described by an ensemble parameterized by one chemical potential, so we need just one variable to represent that Lagrange multiplier. And at intermediate times, what I am plotting here is the test error, color coded, as a function of time and of the number of latent variables, we saw this growth and decay of complexity. Again, this is for observables on support three. What it told us is that this could help identify on which time scales the so-called hydrodynamic regime, for which we can build an effective classical description, starts to appear, and it helped us extract this information from the bunch of observables we had.

[Question] Yes, the initial states are different product states which are translation invariant, parameterized by the two angles of the spin. Yeah, it is this effect that we always see, that the error always drifts a bit down, but the major drop in that case is at two; on this plot it is a bit hard to see. But you do see that here you need two, here you need one, and in between you have this information bottleneck where the state really gets hard to represent, and where people have been trying to put forward numerical techniques to overcome this barrier that emerges in subsystem dynamics only. So far we did not manage to put forward such a technique ourselves; we only put forward a way to detect it, and one can then possibly ask whether the effective equations of motion can be reconstructed.
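As a side remark, the magnetization-conserving random unitaries used for this circuit evolution can be realized as two-site gates that are block-diagonal in the S^z sectors; here is a minimal sketch, with the particular gate distribution being an illustrative assumption rather than the one used in the study.

```python
# Sketch: a random two-site unitary that conserves total magnetization.
# In the basis |uu>, |ud>, |du>, |dd> it is block-diagonal in the S^z
# sectors {|uu>}, {|ud>, |du>}, {|dd>}. The gate distribution is an
# illustrative choice, not necessarily the one used in the study.
import numpy as np
from scipy.stats import unitary_group

def random_u1_gate(rng=np.random):
    U = np.zeros((4, 4), dtype=complex)
    U[0, 0] = np.exp(2j * np.pi * rng.rand())       # S^z = +1 sector
    U[1:3, 1:3] = unitary_group.rvs(2)              # S^z = 0 sector
    U[3, 3] = np.exp(2j * np.pi * rng.rand())       # S^z = -1 sector
    return U

if __name__ == "__main__":
    U = random_u1_gate()
    sz = np.diag([1.0, -1.0])
    Sz_total = np.kron(sz, np.eye(2)) + np.kron(np.eye(2), sz)
    print(np.allclose(U @ Sz_total, Sz_total @ U))   # conserves total S^z
    print(np.allclose(U.conj().T @ U, np.eye(4)))    # unitary
```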
But then we went further: can we harness this to do some sort of Hamiltonian reconstruction, given just a bunch of observables? At least in the case where the autoencoder tells us that these observables come from different thermal, or at least nearly thermal, states, i.e. when the latent space is ordered with respect to the energy or the Hamiltonian expectation value, we can use that. Even if we do not know the Hamiltonian that produced these expectation values, it will hold that not only the Hamiltonian but also the other observables it contains will have a strong variation along this latent representation. So what we did is essentially the following: we look at the average gradient of the observables we have along this manifold, pick the ones with the largest gradient, and take those as trial terms for my unknown Hamiltonian. Then we fix their weights by comparing our trial thermal states, because we know these measurements come from thermal states since the representation is one-dimensional, to the actual measurements. If we have in fact measured all observables, i.e. all terms of the Hamiltonian, then in this way we are done; we get it, up to a prefactor, which can be fixed as I will tell you later. If we are actually dealing with a long-range Hamiltonian but are only measuring part of its terms and are trying to propose some approximate short-range description of it, it turns out that some ghost terms may creep in, and we can get rid of those by looking at the variation of the fitted coefficients across different measurements, remove them, and repeat.

We then applied this approach to a few cases. As I said, if I measure all terms that actually appear in my Hamiltonian, this approach works well and I essentially reconstruct it successfully. If I am dealing with a long-range Hamiltonian but measure just the most local observables, so that I am essentially approximating it with a local one, it will of course only be an approximation, but we can still check how bad it is. Say the Hamiltonian I am after is an XY model with power-law decaying interactions. What I find is that, looking at the relative error of the coefficients with support l, so one, two, three and so on, the error is pretty small for the dominant terms and gets larger for the terms with larger support; but because those have smaller weights, it turns out not to matter much in practice. So if I compare the evolution generated by the actual Hamiltonian and by my reconstructed one, it is very good at short times and then starts to deviate, but these deviations are, in our case where we do exact diagonalization, amplified by finite-size effects. So I expect that I would otherwise just see a plateau here, and all these fluctuations are more or less finite-size fluctuations.

[Question] Yeah, thank you, it depends very much on the exponent delta: the larger delta is, the more suppressed the long-range contributions are, and the smaller the errors; and likewise the more access we have, so measuring everything up to support four is worse than up to support six. The more information we have from measurements, the better. [On the dynamics plot] I think it was support two, but it was one of those two.
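To make the two steps of this procedure concrete, here is a minimal sketch: rank candidate operators by their average gradient along the one-dimensional latent coordinate, keep the largest ones as trial Hamiltonian terms, and fit their couplings so that trial thermal states reproduce the measured expectation values. For simplicity the sketch takes the inverse temperatures of the data as given, whereas in the actual procedure only an overall prefactor remains to be fixed, as mentioned above; the function names and the least-squares fit are illustrative assumptions, and `thermal_expectations` is the same helper as in the earlier data-generation sketch.

```python
# Sketch of the two reconstruction steps: (1) rank candidate operators by the
# average gradient of their expectation value along the 1D latent coordinate
# and keep the largest ones as trial Hamiltonian terms; (2) fit their
# couplings so that trial thermal states reproduce the measurements.
import numpy as np
from scipy.optimize import least_squares

def thermal_expectations(H, observables, beta):
    # same helper as in the earlier data-generation sketch
    evals, evecs = np.linalg.eigh(H)
    w = np.exp(-beta * (evals - evals.min()))
    rho = (evecs * (w / w.sum())) @ evecs.conj().T
    return np.array([np.real(np.trace(rho @ O)) for O in observables])

def select_trial_terms(latent, measurements, n_terms):
    """latent: (n_states,) latent coordinate; measurements: (n_states, n_ops).
    Return indices of the operators with the largest average gradient of
    their expectation value along the latent coordinate."""
    order = np.argsort(latent)
    grads = np.gradient(measurements[order], latent[order], axis=0)
    score = np.abs(grads).mean(axis=0)
    return np.argsort(score)[::-1][:n_terms]

def fit_couplings(candidate_ops, trial_idx, measurements, betas):
    """Find couplings g_i of H = sum_i g_i O_i such that its thermal states
    reproduce the measured expectation values of all candidate operators."""
    def residual(g):
        H = sum(gi * candidate_ops[i] for gi, i in zip(g, trial_idx))
        pred = np.stack([thermal_expectations(H, candidate_ops, b) for b in betas])
        return (pred - measurements).ravel()
    g0 = np.ones(len(trial_idx))
    return least_squares(residual, g0).x
```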
Right, but why bother? So we thought, okay, let's do something interesting with it, and one case where we sometimes really do not know the effective model is periodic driving, when, for example, your Hamiltonian, or your time propagator, is periodic with some period T. What we know is that if the frequency of the drive is very high, the system has a hard time absorbing these large energy quanta that we try to put in. So on a pretty long window, up to times that are exponentially long in this high frequency, the system looks thermal with respect to the so-called Floquet Hamiltonian, which in this high-frequency case can be obtained with perturbative approaches such as the Magnus expansion in one over the frequency. At later times the system eventually starts to absorb energy and heats up towards infinite temperature, and much less is known about what is happening in this regime, where the perturbative expansions break down. That is what we have been interested in. So we first, again, benchmarked our approach on an example with a clear thermal plateau. For example, the propagation over one period T was done with H_1 for half a period and H_2 for the other half, and for the parameters we chose there is a clear plateau, shown here in blue for the exact evolution. Here we first compared a BCH expansion, one of these expansions up to third order, with our reconstruction including operators up to support three, which is the same order that we considered. What we see is that neither our reconstruction nor the BCH result is perfect; both deviate slightly, because in both cases we are reconstructing a long-range Hamiltonian with a short-range one, and in addition we are doing the reconstruction from data that is polluted by finite-size effects and by these wiggles in time, which I guess makes our method maybe even slightly worse than BCH, but roughly they are comparable: if I look at the weights of the different terms, they are quite similar. So we were at least roughly happy with what we got.
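For orientation, for such a two-step drive the lowest orders of the BCH (or Magnus-type) expansion of the Floquet Hamiltonian read, up to the ordering convention of the two half-steps,

$$
U(T) = e^{-iH_2 T/2}\, e^{-iH_1 T/2} \equiv e^{-iH_F T},
\qquad
H_F = \frac{H_1 + H_2}{2} - \frac{iT}{8}\,[H_2, H_1] + \mathcal{O}(T^2),
$$

with the higher orders built from nested commutators, which generate increasingly long-range terms; this is what both the third-order BCH and the support-three reconstruction above are truncating.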
So then we could go into the wild and try to reconstruct an effective Hamiltonian, or see what is happening, once we are away from this simple thermal plateau. What was already proposed earlier by Marin and his collaborators is that also in this heating regime the state should remain thermal, but with respect to some unknown Hamiltonian. So we first wanted to test that. We were now feeding into our autoencoder measurements taken at these different times in the heating regime, here is the actual data we measured from, and first checked what happens with the test error. Except maybe for the short times, which still remember the initial state, which was some randomly rotated thermal one, at later times we see a clear one-dimensional latent representation again, which also really looks like that in the latent space. This first of all told us that the data is really still thermal in the heating regime.

All right, and now, under this knowledge that it is a thermal state, we went on to reconstruct this Hamiltonian, and what we saw from that analysis is that as time progresses, this Hamiltonian becomes less and less local. Here I am comparing the average coefficient at support two, three, four and five to the dominant one, which is at support one, and I see that as time progresses these less local terms pick up in importance. This gave us the intuition that the heating happens through the effective Hamiltonian becoming less and less local, up to the point where I guess it becomes non-local and we enter the infinite-temperature state. I am running a bit late, but we repeated the same procedure also for so-called multipolar drives, where one randomly combines sequences of the propagators U_+ and U_-, in one order or the other; here the type of effective state is really completely unknown, but our analysis at least showed that in this heating regime the same thing is happening for this other type of driving.

And with that, I am down to the conclusions. We used neural networks to obtain a compressed representation of measurements in quantum states, so as to extract typical non-equilibrium-dynamics features, but potentially also the noise polluting our simulator or the effective Hamiltonian describing these states. Let me also highlight the poster of Gianluca Lagnese, who presents more of what we do now, namely using a POVM representation of the density matrix to capture steady states of light-matter coupled open systems. And let me say that in my group we have PhD and postdoc positions, so if you are interested in joining, just let me know. And with that, I conclude.

[Chair] Okay, thanks, Zala, for the interesting talk. Are there further questions?

[On whether the signal can be made sharper] Yeah, I think measuring even more operators could make it sharper. We were also playing a little with the number of samples and with the number of neurons per layer; the first scan went from two to 400 neurons in a layer, which made it a little sharper, but not too much, and the same with the number of samples, where it is also not so clear that it helps. So probably, if you really wanted a sharper variation, variational autoencoders are the better choice: there you look at how strong the variation is per neuron, and you should see that on the meaningful neuron it is low and on the unused neurons it is essentially very high. So that is maybe a better way to make it sharper.

[Question] Would it be possible to generalize this to the case where the data set is not expectation values but rather snapshots of a quantum state? Yeah, we have been thinking a bit along this direction; I have not done anything yet, but I believe it should work. There too, going for variational autoencoders would probably be the way, because their probabilistic nature might help you overcome this difference and, if you find that your measurements are thermal, maybe even use that further, even from snapshots; maybe Markus, you can comment on that.
[On whether the reconstruction error itself carries information] In that sense, if it is a measure of the variance of your measurements, it can, I guess, be related to that. If you were to sample from hotter states, where the density matrix is nearly the identity, then all expectation values are pretty small anyway, so I guess the error you would get would also be smaller than if you sample from something where they vary more strongly. So in that sense, maybe yes. Or, depending on your Hamiltonian, how many operators actually have a finite thermal expectation value can also tell you a bit about the Hamiltonian, because which operators are finite in a thermal state depends on your Hamiltonian.

[Chair] Any other questions? No? Then let's thank Zala again.